git: merge branch 'main' into dev-guide

Rongxin 1 year ago
parent
commit
cf4ac8db6d

+ 6 - 3
README.md

@@ -35,7 +35,8 @@ Feel free to provide feedback in [GitHub Issues](https://github.com/Byaidu/PDFMa
 
 <h2 id="updates">Updates</h2>
 
-- [Nov. 20 2024] GUI now supports specifying Ollama models *(by [@IuvenisSapiens](https://github.com/IuvenisSapiens))*  
+- [Nov. 21 2024] GUI now supports downloading dual-document *(by [@reycn](https://github.com/reycn))*  
+- [Nov. 20 2024] GUI now supports specifying Ollama and OpenAI models *(by [@IuvenisSapiens](https://github.com/IuvenisSapiens), [@Byaidu](https://github.com/Byaidu))*  
 - [Nov. 20 2024] 🌟 [Demo](#demo)  online! *(by [@reycn](https://github.com/reycn))*  
 - [Nov. 20 2024] Supports [Docker](#docker) *(by [@Byaidu](https://github.com/Byaidu))*  
 - [Nov. 20 2024] Supports [multiple-threads translation](#threads) *(by [@Byaidu](https://github.com/Byaidu))*  
@@ -55,7 +56,7 @@ Note that the computing resources of the demo are limited, so please avoid abusi
 
 <h2 id="install">Installation and Usage</h2>
 
-We provide three methods for using this project: [commanline](#cmd), [GUI](#gui), and [Docker](#docker).
+We provide three methods for using this project: [Commandline](#cmd), [GUI](#gui), and [Docker](#docker).
 
 <h3 id="cmd">Method I. Commandline</h3>
 
@@ -266,12 +267,14 @@ pdf2zh example.pdf -t 1
   <img src="https://opencollective.com/PDFMathTranslate/contributors.svg?width=890&button=false" />
 </a>
 
+![Alt](https://repobeats.axiom.co/api/embed/dfa7583da5332a11468d686fbd29b92320a6a869.svg "Repobeats analytics image")
+
 <h2 id="star_hist">Star History</h2>
 
 <a href="https://star-history.com/#Byaidu/PDFMathTranslate&Date">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Byaidu/PDFMathTranslate&type=Date&theme=dark" />
    <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Byaidu/PDFMathTranslate&type=Date" />
-   <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Byaidu/PDFMathTranslate&type=Date" width="70%"/>
+   <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Byaidu/PDFMathTranslate&type=Date"/>
  </picture>
 </a>

+ 5 - 2
README_zh-CN.md

@@ -35,7 +35,8 @@
 
 <h2 id="updates">近期更新</h2>
 
-- [Nov. 20 2024] 图形用户界面现在支持指定 Ollama 各模型 *(by [@IuvenisSapiens](https://github.com/IuvenisSapiens))*  
+- [Nov. 21 2024] 图形用户界面现在支持下载双语文档 *(by [@reycn](https://github.com/reycn))*  
+- [Nov. 20 2024] 图形用户界面现在支持指定 Ollama 和 OpenAI 的模型 *(by [@IuvenisSapiens](https://github.com/IuvenisSapiens), [@Byaidu](https://github.com/Byaidu))*  
 - [Nov. 20 2024] 🌟 提供了 [在线演示](#demo)! *(by [@reycn](https://github.com/reycn))*  
 - [Nov. 20 2024] 支持 [容器化部署](#docker) *(by [@Byaidu](https://github.com/Byaidu))*  
 - [Nov. 20 2024] 支持速度更快的 [多线程翻译](#threads) *(by [@Byaidu](https://github.com/Byaidu))*  
@@ -271,12 +272,14 @@ pdf2zh example.pdf -t 1
   <img src="https://opencollective.com/PDFMathTranslate/contributors.svg?width=890&button=false" />
 </a>
 
+![Alt](https://repobeats.axiom.co/api/embed/dfa7583da5332a11468d686fbd29b92320a6a869.svg "Repobeats analytics image")
+
 <h2 id="star_hist">星标历史</h2>
 
 <a href="https://star-history.com/#Byaidu/PDFMathTranslate&Date">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Byaidu/PDFMathTranslate&type=Date&theme=dark" />
    <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Byaidu/PDFMathTranslate&type=Date" />
-   <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Byaidu/PDFMathTranslate&type=Date" width="70%"/>
+   <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Byaidu/PDFMathTranslate&type=Date"/>
  </picture>
 </a>

BIN
docs/images/banner.png


BIN
docs/images/gui.gif


+ 22 - 0
docs/licenses/LICENSE.pdfminer.six

@@ -0,0 +1,22 @@
+Copyright (c) 2004-2016  Yusuke Shinyama <yusuke at shinyama dot jp>
+
+Permission is hereby granted, free of charge, to any person
+obtaining a copy of this software and associated documentation
+files (the "Software"), to deal in the Software without
+restriction, including without limitation the rights to use,
+copy, modify, merge, publish, distribute, sublicense, and/or
+sell copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following
+conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
+KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
+WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
+PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

+ 23 - 0
docs/licenses/LICENSE.pyHanko

@@ -0,0 +1,23 @@
+This package contains various elements based on code from the pyHanko project, of which we reproduce the license below.
+
+MIT License
+
+Copyright (c) 2020 Matthias Valvekens 
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

+ 1 - 1
pdf2zh/__init__.py

@@ -1,2 +1,2 @@
-__version__ = "1.7.7"
+__version__ = "1.7.8"
 __author__ = "Byaidu"

+ 56 - 1
pdf2zh/gui.py

@@ -141,7 +141,7 @@ def translate(
         print(f"Command completed with return code: {return_code}")
 
         # Check if translation was successful
-        translated_file = temp_path / f"input-{lang_to}.pdf"
+        translated_file = temp_path / "input-zh.pdf" # <= Do not change filename
         dual_file = temp_path / "input-dual.pdf"
         print(f"Files after translation: {os.listdir(temp_path)}")
 
@@ -187,13 +187,67 @@ def translate(
 
 
 # Global setup
+custom_blue = gr.themes.Color(
+    c50="#E8F3FF",
+    c100="#BEDAFF",
+    c200="#94BFFF",
+    c300="#6AA1FF",
+    c400="#4080FF",
+    c500="#165DFF",  # Primary color
+    c600="#0E42D2",
+    c700="#0A2BA6",
+    c800="#061D79",
+    c900="#03114D",
+    c950="#020B33",
+)
+
 with gr.Blocks(
     title="PDFMathTranslate - PDF Translation with preserved formats",
+    theme=gr.themes.Default(
+        primary_hue=custom_blue, spacing_size="md", radius_size="lg"
+    ),
     css="""
     .secondary-text {color: #999 !important;}
     footer {visibility: hidden}
     .env-warning {color: #dd5500 !important;}
     .env-success {color: #559900 !important;}
+    
+    @keyframes pulse-background {
+        0% { background-color: #FFFFFF; }
+        25% { background-color: #FFFFFF; }
+        50% { background-color: #E8F3FF; }
+        75% { background-color: #FFFFFF; }
+        100% { background-color: #FFFFFF; }
+    }
+    
+    /* Add dashed border to input-file class */
+    .input-file {
+        border: 1.2px dashed #165DFF !important;
+        border-radius: 6px !important;
+        /* background-color: #ffffff !important; */
+        animation: pulse-background 2s ease-in-out;
+        transition: background-color 0.4s ease-out;
+    }
+
+    .input-file:hover {
+        border: 1.2px dashed #165DFF !important;
+        border-radius: 6px !important;
+        color: #165DFF !important;
+        background-color: #E8F3FF !important;
+        transition: background-color 0.2s ease-in;
+    }
+    /* .input-file label {
+        color: #165DFF !important;
+        border: 1.2px dashed #165DFF !important;
+        border-left: none !important;
+        border-top: none !important;
+    }
+    .input-file .wrap {
+        color: #165DFF !important;
+    }
+    .input-file .or {
+        color: #165DFF !important;
+    } */
     """,
 ) as demo:
     gr.Markdown("# PDFMathTranslate")
@@ -206,6 +260,7 @@ with gr.Blocks(
                 file_count="single",
                 file_types=[".pdf"],
                 type="filepath",
+                elem_classes=["input-file"],
             )
             gr.Markdown("## Option")
             service = gr.Dropdown(
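
The `translated_file` change in this file pins the GUI to the names the command-line path writes. A minimal sketch of that naming, assuming (as the `pdf2zh/pdf2zh.py` hunks in this commit show) that the outputs are always `{stem}-zh.pdf` and `{stem}-dual.pdf` whatever the target language is. `expected_outputs` and its `out_dir` parameter are hypothetical, not part of pdf2zh:

```python
import os
from pathlib import Path


def expected_outputs(input_pdf: str, out_dir: str = ".") -> dict:
    """Hypothetical helper: predict the files written for one input PDF."""
    # Same stem computation as extract_text(): basename without its extension.
    stem = os.path.splitext(os.path.basename(input_pdf))[0]
    return {
        # The "-zh" suffix is fixed; it does not follow the target language.
        "mono": Path(out_dir) / f"{stem}-zh.pdf",
        "dual": Path(out_dir) / f"{stem}-dual.pdf",
    }


outputs = expected_outputs("/tmp/work/input.pdf", "/tmp/work")
print(outputs["mono"].name, outputs["dual"].name)
```

If the suffix is ever derived from the target language again, the GUI lookup and the writer would have to change together.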

+ 39 - 27
pdf2zh/pdf2zh.py

@@ -9,16 +9,17 @@ import argparse
 import logging
 import os
 import sys
+from typing import TYPE_CHECKING, Any, Container, Iterable, List, Optional
+
 import pymupdf
 from huggingface_hub import hf_hub_download
 
 from pdf2zh import __version__
 from pdf2zh.pdfexceptions import PDFValueError
-from typing import Any, Container, Iterable, List, Optional, TYPE_CHECKING
 
 if TYPE_CHECKING:
-    from pdf2zh.utils import AnyIO
     from pdf2zh.layout import LAParams
+    from pdf2zh.utils import AnyIO
 
 OUTPUT_TYPES = ((".htm", "html"), (".html", "html"), (".xml", "xml"), (".tag", "tag"))
 
@@ -69,6 +70,7 @@ def extract_text(
     **kwargs: Any,
 ) -> AnyIO:
     import doclayout_yolo
+
     import pdf2zh.high_level
 
     if not files:
@@ -84,55 +86,64 @@ def extract_text(
     # if not os.path.exists(pth):
     #     print('Downloading...')
     #     urllib.request.urlretrieve("http://huggingface.co/juliozhao/DocLayout-YOLO-DocStructBench/resolve/main/doclayout_yolo_docstructbench_imgsz1024.pt",pth)
-    pth = hf_hub_download(repo_id="juliozhao/DocLayout-YOLO-DocStructBench", filename="doclayout_yolo_docstructbench_imgsz1024.pt")
+    pth = hf_hub_download(
+        repo_id="juliozhao/DocLayout-YOLO-DocStructBench",
+        filename="doclayout_yolo_docstructbench_imgsz1024.pt",
+    )
     model = doclayout_yolo.YOLOv10(pth)
 
     for file in files:
-
         filename = os.path.splitext(os.path.basename(file))[0]
 
         doc_en = pymupdf.open(file)
-        page_count=doc_en.page_count
-        font_list=['china-ss','tiro']
-        font_id={}
+        page_count = doc_en.page_count
+        font_list = ["china-ss", "tiro"]
+        font_id = {}
         for page in doc_en:
             for font in font_list:
-                font_id[font]=page.insert_font(font)
+                font_id[font] = page.insert_font(font)
         xreflen = doc_en.xref_length()
         for xref in range(1, xreflen):
-            for label in ['Resources/','']: # the resources may belong to an xobj
-                try: # xref reads and writes may fail
-                    font_res=doc_en.xref_get_key(xref,f'{label}Font')
-                    if font_res[0]=='dict':
+            for label in ["Resources/", ""]:  # the resources may belong to an xobj
+                try:  # xref reads and writes may fail
+                    font_res = doc_en.xref_get_key(xref, f"{label}Font")
+                    if font_res[0] == "dict":
                         for font in font_list:
-                            font_exist=doc_en.xref_get_key(xref,f'{label}Font/{font}')
-                            if font_exist[0]=='null':
-                                doc_en.xref_set_key(xref,f'{label}Font/{font}',f'{font_id[font]} 0 R')
+                            font_exist = doc_en.xref_get_key(
+                                xref, f"{label}Font/{font}"
+                            )
+                            if font_exist[0] == "null":
+                                doc_en.xref_set_key(
+                                    xref, f"{label}Font/{font}", f"{font_id[font]} 0 R"
+                                )
                 except:
                     pass
-        doc_en.save(f'{filename}-en.pdf')
+        doc_en.save(f"{filename}-en.pdf")
 
-        with open(f'{filename}-en.pdf', "rb") as fp:
-            obj_patch:dict=pdf2zh.high_level.extract_text_to_fp(fp, **locals())
+        with open(f"{filename}-en.pdf", "rb") as fp:
+            obj_patch: dict = pdf2zh.high_level.extract_text_to_fp(fp, **locals())
 
-        for obj_id,ops_new in obj_patch.items():
+        for obj_id, ops_new in obj_patch.items():
             # ops_old=doc_en.xref_stream(obj_id)
             # print(obj_id)
             # print(ops_old)
             # print(ops_new.encode())
-            doc_en.update_stream(obj_id,ops_new.encode())
+            doc_en.update_stream(obj_id, ops_new.encode())
 
         doc_zh = doc_en
-        doc_dual = pymupdf.open(f'{filename}-en.pdf')
+        doc_dual = pymupdf.open(f"{filename}-en.pdf")
         doc_dual.insert_file(doc_zh)
         for id in range(page_count):
-            doc_dual.move_page(page_count+id,id*2+1)
-        doc_zh.save(f'{filename}-zh.pdf',deflate=1)
-        doc_dual.save(f'{filename}-dual.pdf',deflate=1)
+            doc_dual.move_page(page_count + id, id * 2 + 1)
+        doc_zh.save(f"{filename}-zh.pdf", deflate=1)
+        doc_dual.save(f"{filename}-dual.pdf", deflate=1)
         doc_zh.close()
         doc_dual.close()
-
-        os.remove(f'{filename}-en.pdf')
+        try:  # fix (main): permission error @ https://github.com/Byaidu/PDFMathTranslate/issues/84
+            os.remove(f"{filename}-en.pdf")
+        except Exception as e:
+            print(f"File removal failed due to occupation / not existing, pass.\n{e}")
+            pass
 
     return
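
The `move_page` loop in the hunk above is what turns "all original pages, then all translated pages" into an alternating dual document. A rough model of that shuffle using a plain list instead of a PyMuPDF document, assuming `move_page(src, dst)` places page `src` in front of index `dst`. The function name and the `en`/`zh` labels are illustrative only:

```python
def interleave_like_move_page(page_count: int) -> list:
    """Model the dual-document page shuffle with a plain list."""
    # State after insert_file(): every original page, then every translated page.
    pages = [f"en{i}" for i in range(page_count)]
    pages += [f"zh{i}" for i in range(page_count)]
    for i in range(page_count):
        src, dst = page_count + i, i * 2 + 1
        # Assumed move_page semantics: put page src in front of index dst.
        # src >= dst always holds here, so pop-then-insert keeps indices valid.
        pages.insert(dst, pages.pop(src))
    return pages


print(interleave_like_move_page(3))
```

Each translated page ends up directly after its source page, which is the order a side-by-side reader expects.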
 
@@ -249,12 +260,13 @@ def main(args: Optional[List[str]] = None) -> int:
 
     missing_files = check_files(parsed_args.files)
     if missing_files:
-        print(f"The following files do not exist:", file=sys.stderr)
+        print("The following files do not exist:", file=sys.stderr)
         for file in missing_files:
             print(f"  {file}", file=sys.stderr)
         return -1
     if parsed_args.interactive:
         from pdf2zh.gui import setup_gui
+
         setup_gui()
         return 0
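
The cleanup added to `extract_text` in this commit tolerates a failed `os.remove` on the intermediate `-en.pdf` file (see the linked issue 84). One possible narrower variant is sketched below. It catches only `OSError`, which covers both a missing file and a permission error, and it logs instead of printing. `remove_quietly` is a hypothetical helper, not code from this repository:

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def remove_quietly(path: str) -> bool:
    """Hypothetical helper: delete path, returning False instead of raising."""
    try:
        os.remove(path)
        return True
    except OSError as exc:  # FileNotFoundError and PermissionError are subclasses
        logger.warning("Could not remove %s: %s", path, exc)
        return False


# Demonstration on a throwaway file standing in for "<name>-en.pdf".
fd, tmp = tempfile.mkstemp(suffix="-en.pdf")
os.close(fd)
first = remove_quietly(tmp)   # the file exists, so it is removed
second = remove_quietly(tmp)  # already gone, so the error is swallowed
```

Catching `OSError` rather than `Exception` keeps unrelated bugs visible while still letting translation finish when the temporary file is locked.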