
PDFMathTranslate

PDF scientific paper translation and bilingual comparison.

  • 📊 Retain formulas and charts.

  • 📄 Preserve table of contents.

  • 🌐 Support multiple translation services.

Installation

Requires Python >=3.8, <=3.11.

pip install -U "pdf2zh>=1.5.3"
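The version requirement above can be checked before installing. The following is a minimal sketch, not part of pdf2zh: `python_supported` is a hypothetical helper, and it assumes that "<=3.11" covers every 3.11.x patch release.

```python
import sys
from typing import Sequence

# Supported range as stated in this README (assumed to be inclusive of all 3.11.x releases).
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def python_supported(version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True if the (major, minor) version falls inside the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    if python_supported():
        print("This interpreter is within the supported range.")
    else:
        print("This interpreter is outside the supported range (3.8 to 3.11).")
```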

Usage

Run the translation command from the command line. It generates the translated document example-zh.pdf and the bilingual document example-dual.pdf in the current directory.
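To make the naming of the two output files concrete, here is a small sketch that derives them from the input name. `expected_outputs` is a hypothetical helper written for illustration; it only mirrors the example-zh.pdf / example-dual.pdf pattern described above and is not taken from the pdf2zh source.

```python
from pathlib import Path
from typing import Tuple


def expected_outputs(input_pdf: str) -> Tuple[Path, Path]:
    """Return the (translated, bilingual) file names expected in the current directory."""
    stem = Path(input_pdf).stem  # "example.pdf" -> "example"
    return Path(f"{stem}-zh.pdf"), Path(f"{stem}-dual.pdf")


translated, bilingual = expected_outputs("example.pdf")
print(translated)  # example-zh.pdf
print(bilingual)   # example-dual.pdf
```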

Translate the entire document

pdf2zh example.pdf

Translate part of the document

pdf2zh example.pdf -p 1-3,5
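The page specification `1-3,5` reads as "pages 1 through 3, plus page 5". The sketch below shows one way such a string could be expanded. `parse_page_spec` is a hypothetical helper, and the 1-based, inclusive interpretation is an assumption rather than a statement about how pdf2zh parses `-p` internally.

```python
from typing import List


def parse_page_spec(spec: str) -> List[int]:
    """Expand a spec such as "1-3,5" into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are assumed inclusive
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_page_spec("1-3,5"))  # [1, 2, 3, 5]
```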

Translate with the specified language

See Language Codes.

pdf2zh example.pdf -li en -lo ja

Translate with Ollama

See Ollama.

pdf2zh example.pdf -s gemma2

Translate with DeepLX

See DeepLX.

  1. Set environment variables to construct an endpoint of the form {DEEPLX_URL}/{DEEPLX_TOKEN}/translate:

    • DEEPLX_URL, e.g., export DEEPLX_URL=https://api.deeplx.org
    • DEEPLX_TOKEN, e.g., export DEEPLX_TOKEN=ABCDEFG
  2. Run:

    pdf2zh example.pdf -s deeplx
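The endpoint pattern from step 1 can be sketched as follows. `deeplx_endpoint` is a hypothetical helper written for illustration; stripping a trailing slash from the URL is an assumption, and the token value is the placeholder from the example above.

```python
import os
from typing import Mapping, Optional


def deeplx_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Build {DEEPLX_URL}/{DEEPLX_TOKEN}/translate from environment variables."""
    env = os.environ if env is None else env
    url = env["DEEPLX_URL"].rstrip("/")  # assumption: avoid a double slash
    token = env["DEEPLX_TOKEN"]
    return f"{url}/{token}/translate"


example_env = {"DEEPLX_URL": "https://api.deeplx.org", "DEEPLX_TOKEN": "ABCDEFG"}
print(deeplx_endpoint(example_env))  # https://api.deeplx.org/ABCDEFG/translate
```

Passing a mapping explicitly keeps the sketch testable without touching the real environment; calling `deeplx_endpoint()` with no argument reads `os.environ` instead.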
    

Use regular expressions to specify the formula fonts and characters to preserve

pdf2zh BDA3.pdf -f "(CM[^RT].*|MS.*|XY.*|MT.*|BL.*|.*0700|.*0500|.*Italic)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
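To see what the two patterns in the command above select, they can be tried against sample font names and characters with Python's `re` module. This is only a sketch: the helper names are hypothetical, the sample font names are illustrative, and matching from the start of the string with `re.match` is an assumption about how the patterns are applied.

```python
import re

# Patterns copied from the -f (fonts) and -c (characters) options above.
FONT_PATTERN = r"(CM[^RT].*|MS.*|XY.*|MT.*|BL.*|.*0700|.*0500|.*Italic)"
CHAR_PATTERN = r"(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"


def is_formula_font(font_name: str) -> bool:
    """True if the font name matches the formula-font pattern."""
    return re.match(FONT_PATTERN, font_name) is not None


def is_formula_char(char: str) -> bool:
    """True if the character matches the preserved-character pattern."""
    return re.match(CHAR_PATTERN, char) is not None


for name in ["CMMI10", "CMR10", "MSBM10", "Times-Italic", "Times-Roman"]:
    print(name, is_formula_font(name))

for ch in ["=", "(", "3", "α", "x"]:
    print(repr(ch), is_formula_char(ch))
```

Under these assumptions, `CMMI10`, `MSBM10`, and `Times-Italic` are treated as formula fonts, while `CMR10` and `Times-Roman` are not. The characters `=`, `(`, `3`, and `α` match the character pattern, and the plain letter `x` does not.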


Acknowledgement

Document merging: PyMuPDF

Document parsing: Pdfminer.six

Document extraction: MinerU

Multi-threaded translation: MathTranslate

Layout parsing: DocLayout-YOLO
