Нема описа

Byaidu cd364b455f Update README.md пре 1 година
pdf2zh f458a0c742 fix lang_space пре 1 година
.gitignore f7d3e72bea Initial commit пре 1 година
LICENSE 04e1dedd8e Update LICENSE пре 1 година
README.md cd364b455f Update README.md пре 1 година
setup.py cef7512295 add ollama пре 1 година

README.md

PDFMathTranslate

PDF scientific paper translation and bilingual comparison based on font rules and deep learning, preserving formula and figure layout.

  • Retain formulas and charts.

  • Preserve table of contents.

  • Support multiple translation services.

image

image

Installation

pip install pdf2zh

Usage

Execute the translation command in the command line to generate the translated document example-zh.pdf and the bilingual document example-dual.pdf in the current directory.

Translate the entire document

pdf2zh example.pdf

Translate part of the document

pdf2zh example.pdf -p 1-3,5

Translate with the specified language

See Languages Codes.

pdf2zh example.pdf -li en -lo ja

Translate with Ollama

pdf2zh example.pdf -s gemma2

Use regex to specify formula fonts and characters that need to be preserved

pdf2zh BDA3.pdf -f "(CM[^RT].*|MS.*|XY.*|MT.*|BL.*|.*0700|.*0500|.*Italic)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"

Acknowledgement

Document merging: PyMuPDF

Document parsing: Pdfminer.six

Document extraction: MinerU

Multi-threaded translation: MathTranslate

Layout parsing: DocLayout-YOLO

Star History

Star History Chart