PDF scientific paper translation and bilingual comparison.
Feel free to provide feedback via GitHub Issues, our Telegram group, or our QQ group.
Compatibility mode (`-cp`) was contributed by @reycn.
You can try our application out using either of the following demos:
Note that the computing resources of the demo are limited, so please avoid abusing them.
For different use cases, we provide four distinct methods to use our program:
```bash
# Translate document.pdf with the default settings
pdf2zh document.pdf
```
See [documentation for GUI](./docs/README_GUI.md) for more details.
This program needs an AI model (`wybxc/DocLayout-YOLO-DocStructBench-onnx`) to work, and some users cannot download it because of network issues. If you have trouble downloading this model, you can use the following environment variable as a workaround:
`set HF_ENDPOINT=https://hf-mirror.com` (Windows Command Prompt)
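The `set` command above is Windows Command Prompt syntax. In a Linux or macOS shell, the equivalent is `export`, using the same mirror URL. A minimal sketch:

```bash
# Point Hugging Face model downloads at the mirror for this shell session.
export HF_ENDPOINT=https://hf-mirror.com

# Alternatively, set it for a single command only:
#   HF_ENDPOINT=https://hf-mirror.com pdf2zh document.pdf

# Show the value that child processes such as pdf2zh will inherit.
echo "HF_ENDPOINT=$HF_ENDPOINT"
```

Because the variable is exported, it is visible to any program started from the same shell afterwards.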
If this solution does not work for you, or you encounter other issues, please refer to the frequently asked questions.
Run the translation command on the command line to generate the translated document `example-mono.pdf` and the bilingual document `example-dual.pdf` in the current working directory. Google is used as the default translation service.
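The file names above suggest a `<name>-mono.pdf` / `<name>-dual.pdf` pattern for an input such as `example.pdf`. The helper below is a hypothetical sketch (the name `pdf2zh_outputs` is ours) that prints where to look for the results, assuming that pattern holds for every input and that `-o` only changes the directory:

```bash
# Hypothetical helper: print the output paths expected for an input PDF,
# assuming the "<name>-mono.pdf" / "<name>-dual.pdf" naming seen for example.pdf.
pdf2zh_outputs() {
  input=$1
  outdir=${2:-.}                  # default: current working directory
  name=$(basename "$input" .pdf)  # strip the directory and the .pdf extension
  echo "$outdir/$name-mono.pdf"   # translated document
  echo "$outdir/$name-dual.pdf"   # bilingual document
}
```

For example, `pdf2zh_outputs example.pdf` prints `./example-mono.pdf` and `./example-dual.pdf`.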

In the following table, we list all advanced options for reference:
| Option | Function | Example |
|---|---|---|
| files | Local files | `pdf2zh ~/local.pdf` |
| links | Online files | `pdf2zh http://arxiv.org/paper.pdf` |
| `-i` | Launch the GUI | `pdf2zh -i` |
| `-p` | Partial document translation | `pdf2zh example.pdf -p 1` |
| `-li` | Source language | `pdf2zh example.pdf -li en` |
| `-lo` | Target language | `pdf2zh example.pdf -lo zh` |
| `-s` | Translation service | `pdf2zh example.pdf -s deepl` |
| `-t` | Multi-threaded translation | `pdf2zh example.pdf -t 1` |
| `-o` | Output directory | `pdf2zh example.pdf -o output` |
| `-f`, `-c` | Exceptions | `pdf2zh example.pdf -f "(MS.*)"` |
| `-cp` | Compatibility mode | `pdf2zh example.pdf --compatible` |
| `--share` | Public link | `pdf2zh -i --share` |
| `--authorized` | Authorization | `pdf2zh -i --authorized users.txt [auth.html]` |
| `--prompt` | Custom prompt | `pdf2zh --prompt [prompt.txt]` |
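Options from the table can be combined in one call. The following is a dry-run sketch that prints, rather than runs, the command it would use for every PDF in a directory. The function name `translate_dir`, the `DRY_RUN` switch, and the option values are illustrative assumptions, not part of pdf2zh; the flags themselves are the ones listed above.

```bash
# Hypothetical batch helper: apply the same pdf2zh options to every PDF in a directory.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to actually call pdf2zh.
translate_dir() {
  indir=$1
  outdir=$2
  for f in "$indir"/*.pdf; do
    [ -e "$f" ] || continue  # skip when the glob matched no files
    # -li/-lo: source/target language, -t: threads, -o: output directory,
    # -f: exceptions regex from the table, quoted so the shell leaves "(" alone
    set -- "$f" -li en -lo zh -t 4 -o "$outdir" -f "(MS.*)"
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "pdf2zh $*"
    else
      pdf2zh "$@"
    fi
  done
}
```

Building the arguments with `set --` and passing `"$@"` keeps each argument intact, so file names with spaces and the quoted regex reach pdf2zh unchanged.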
For a detailed explanation of each option, please refer to our Advanced Usage document.
For downstream applications, please refer to our API Details document for further information.
- [ ] Parse layout with DocLayNet based models, PaddleX, PaperMage, SAM2
- [ ] Fix page rotation, table of contents, format of lists
- [ ] Fix pixel formula in old papers
- [ ] Async retry except KeyboardInterrupt
- [ ] Knuth–Plass algorithm for western languages
- [ ] Support non-PDF/A files
- Document merging: PyMuPDF
- Document parsing: Pdfminer.six
- Document extraction: MinerU
- Document preview: Gradio PDF
- Multi-threaded translation: MathTranslate
- Layout parsing: DocLayout-YOLO
- Document standard: PDF Explained, PDF Cheat Sheets
- Multilingual font: Go Noto Universal