
Merge branch 'main' of https://github.com/Byaidu/PDFMathTranslate

Byaidu 1 year ago
Parent
Commit
5431800d4c
3 changed files with 27 additions and 9 deletions
  1. .github/FUNDING.yml (+15 -0)
  2. README.md (+4 -2)
  3. pdf2zh/translator.py (+8 -7)

+ 15 - 0
.github/FUNDING.yml

@@ -0,0 +1,15 @@
+# These are supported funding model platforms
+
+github: [Byaidu, reycn, Wybxc, hellofinch] # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
+patreon: # Replace with a single Patreon username
+open_collective: # Replace with a single Open Collective username
+ko_fi: # Replace with a single Ko-fi username
+tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
+community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
+liberapay: # Replace with a single Liberapay username
+issuehunt: # Replace with a single IssueHunt username
+lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
+polar: # Replace with a single Polar username
+buy_me_a_coffee: # Replace with a single Buy Me a Coffee username
+thanks_dev: # Replace with a single thanks.dev username
+custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']

+ 4 - 2
README.md

@@ -277,7 +277,7 @@ pdf2zh example.pdf -t 1
 
 <h2 id="todo">TODO</h2>
 
-- [ ] Parse layout with [PaddleX](https://github.com/PaddlePaddle/PaddleX/blob/17cc27ac3842e7880ca4aad92358d3ef8555429a/paddlex/repo_apis/PaddleDetection_api/object_det/official_categories.py#L81), [PaperMage](https://github.com/allenai/papermage/blob/9cd4bb48cbedab45d0f7a455711438f1632abebe/README.md?plain=1#L102), [SAM2](https://github.com/facebookresearch/sam2)
+- [ ] Parse layout with DocLayNet based models, [PaddleX](https://github.com/PaddlePaddle/PaddleX/blob/17cc27ac3842e7880ca4aad92358d3ef8555429a/paddlex/repo_apis/PaddleDetection_api/object_det/official_categories.py#L81), [PaperMage](https://github.com/allenai/papermage/blob/9cd4bb48cbedab45d0f7a455711438f1632abebe/README.md?plain=1#L102), [SAM2](https://github.com/facebookresearch/sam2)
 
 - [ ] Fix page rotation, table of contents, format of list
 
@@ -285,7 +285,9 @@ pdf2zh example.pdf -t 1
 
 - [ ] Support multiple language with [Noto Font](https://fonts.google.com/noto), [Ubuntu Font](https://design.ubuntu.com/font)
 
-- [ ] Retry except KeyboardInterrupt
+- [ ] Async retry except KeyboardInterrupt
+
+- [ ] Knuth–Plass algorithm for western languages
 
 <h2 id="acknowledgement">Acknowledgements</h2>
 

+ 8 - 7
pdf2zh/translator.py

@@ -5,7 +5,8 @@ import logging
 import os
 import re
 import time
-from datetime import UTC, datetime
+from datetime import timezone, datetime
+
 from json import dumps, loads
 import unicodedata
 
@@ -111,7 +112,7 @@ class TencentTranslator(BaseTranslator):
         )
 
         timestamp = int(time.time())
-        date = datetime.fromtimestamp(timestamp, UTC).strftime("%Y-%m-%d")
+        date = datetime.fromtimestamp(timestamp, timezone.utc).strftime("%Y-%m-%d")
         credential_scope = date + "/tmt/tc3_request"
         hashed_canonical_request = hashlib.sha256(
             canonical_request.encode("utf-8")
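
Note on the hunk above: `datetime.UTC` is an alias that was added in Python 3.11, while `timezone.utc` is also available on earlier Python 3 releases. The replacement presumably targets older interpreters; that motive is an assumption, since the commit message does not state it. A minimal sketch showing that both spellings give the same date string (the timestamp value is arbitrary and not taken from the commit):

```python
import sys
from datetime import datetime, timezone

timestamp = 1700000000  # arbitrary example value, not from the commit

# Portable spelling used after this change.
date = datetime.fromtimestamp(timestamp, timezone.utc).strftime("%Y-%m-%d")
print(date)

# datetime.UTC exists only on Python 3.11+, where it aliases timezone.utc.
if sys.version_info >= (3, 11):
    from datetime import UTC
    assert UTC is timezone.utc
    assert datetime.fromtimestamp(timestamp, UTC).strftime("%Y-%m-%d") == date
```
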
@@ -170,13 +171,13 @@ class TencentTranslator(BaseTranslator):
         # 2. Result test
         try:
             result = result["Response"]["TargetText"]
-            return result
+            # return result
         except KeyError:
             result = ""
-            raise ValueError("No valid key in Tencent's response")
-        # 3. Result length check
-        if len(result) == 0:
-            raise ValueError("Empty translation result")
+        #     raise ValueError("No valid key in Tencent's response")
+        # # 3. Result length check
+        # if len(result) == 0:
+        #     raise ValueError("Empty translation result")
         return result
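
Note on the last hunk: with the `raise` statements commented out, a response that lacks `Response.TargetText` no longer raises an error. The method falls through to the final `return` and yields an empty string. A minimal sketch of that control flow, using a hypothetical standalone function `extract_target_text` and made-up response dictionaries (neither appears in the repository):

```python
def extract_target_text(result: dict) -> str:
    """Mirror the post-commit result check (hypothetical helper, not in pdf2zh)."""
    try:
        text = result["Response"]["TargetText"]
    except KeyError:
        # The ValueError is commented out in the commit, so the failure is swallowed.
        text = ""
    # The length check is also commented out, so "" is returned as-is.
    return text


# Made-up payloads for illustration only.
ok = extract_target_text({"Response": {"TargetText": "你好"}})
missing = extract_target_text({"Response": {"Error": {"Code": "hypothetical"}}})
print(repr(ok), repr(missing))
```

After this change a caller cannot tell a failed request from a genuinely empty translation unless it checks for `""` itself.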