English | [简体中文](README_zh-CN.md)

# PDFMathTranslate

PDF scientific paper translation and bilingual comparison.

  • 📊 Retain formulas and charts.

  • 📄 Preserve table of contents.

  • 🌐 Support multiple translation services.

Feel free to provide feedback in the issues or in the user group.

Installation

Requires Python version >=3.8, <=3.12.

pip install pdf2zh

Usage

Run the translation command from the command line to generate the translated document example-zh.pdf and the bilingual document example-dual.pdf in the current directory. Google is the default translation service.

Translate the entire document

pdf2zh example.pdf

Translate part of the document

pdf2zh example.pdf -p 1-3,5
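The page specification `1-3,5` mixes ranges and single pages. As a rough illustration of how such a string expands, here is a minimal sketch. `expand_page_spec` is a hypothetical helper and not part of pdf2zh; it assumes pages are 1-based and ranges are inclusive, which this README does not state.

```python
from typing import List


def expand_page_spec(spec: str) -> List[int]:
    """Expand a spec such as "1-3,5" into a list of page numbers.

    Hypothetical helper for illustration only.
    Assumption: pages are 1-based and ranges include both ends.
    """
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


print(expand_page_spec("1-3,5"))  # [1, 2, 3, 5]
```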

Translate with the specified language

See Google Language Codes and DeepL Language Codes.

pdf2zh example.pdf -li en -lo ja

Translate with DeepL/DeepLX

See DeepLX.

Set the following environment variables to construct an endpoint like: {DEEPL_SERVER_URL}/{DEEPL_AUTH_KEY}/translate

  • DEEPL_SERVER_URL (Optional), e.g., export DEEPL_SERVER_URL=https://api.deepl.com
  • DEEPL_AUTH_KEY, e.g., export DEEPL_AUTH_KEY=xxx

    pdf2zh example.pdf -s deepl
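To make the endpoint template concrete, the sketch below assembles `{DEEPL_SERVER_URL}/{DEEPL_AUTH_KEY}/translate` from the environment. `deepl_endpoint` is a hypothetical function, not pdf2zh's own code, and the fallback to `https://api.deepl.com` when `DEEPL_SERVER_URL` is unset is an assumption taken from the example value above.

```python
import os


def deepl_endpoint(environ=os.environ) -> str:
    """Build {DEEPL_SERVER_URL}/{DEEPL_AUTH_KEY}/translate (illustrative only)."""
    # Assumption: use the README's example URL when the variable is unset.
    server = environ.get("DEEPL_SERVER_URL", "https://api.deepl.com").rstrip("/")
    key = environ.get("DEEPL_AUTH_KEY")
    if not key:
        raise RuntimeError("DEEPL_AUTH_KEY is not set")
    return f"{server}/{key}/translate"


# Pass a plain dict instead of os.environ to try it without exporting anything.
print(deepl_endpoint({"DEEPL_AUTH_KEY": "xxx"}))
# https://api.deepl.com/xxx/translate
```

Stripping a trailing slash from the server URL avoids a doubled `//` when the variable is set as `https://host/`.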
    

Translate with Ollama

See Ollama.

Set the following environment variable to construct an endpoint like: {OLLAMA_HOST}/api/chat

  • OLLAMA_HOST (Optional), e.g., export OLLAMA_HOST=http://localhost:11434

    pdf2zh example.pdf -s ollama:gemma2
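The `-s` value combines a service name and, for some services, a model name separated by a colon. The following sketch shows one way such a value could be split. `split_service` is a hypothetical helper and how pdf2zh itself parses the option is not shown in this README.

```python
from typing import Optional, Tuple


def split_service(spec: str) -> Tuple[str, Optional[str]]:
    """Split "service:model" into (service, model); model is None if absent.

    Hypothetical helper. Splitting on the first colon only keeps model
    tags that themselves contain a colon intact.
    """
    name, sep, model = spec.partition(":")
    return name, (model if sep and model else None)


print(split_service("ollama:gemma2"))  # ('ollama', 'gemma2')
print(split_service("deepl"))          # ('deepl', None)
```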
    

Translate with OpenAI/SiliconCloud

See OpenAI.

Set the following environment variables to construct an endpoint like: {OPENAI_BASE_URL}/chat/completions

  • OPENAI_BASE_URL (Optional), e.g., export OPENAI_BASE_URL=https://api.openai.com/v1
  • OPENAI_API_KEY, e.g., export OPENAI_API_KEY=xxx

    pdf2zh example.pdf -s openai:gpt-4o
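Since each service depends on different environment variables, it can help to check them before starting a long translation. The sketch below encodes only the variables that this README does not mark as optional; `REQUIRED_ENV` and `missing_env` are hypothetical names, and pdf2zh may report missing variables differently.

```python
import os
from typing import List

# Variables not marked "(Optional)" in this README, per service.
REQUIRED_ENV = {
    "deepl": ["DEEPL_AUTH_KEY"],
    "ollama": [],
    "openai": ["OPENAI_API_KEY"],
}


def missing_env(service: str, environ=os.environ) -> List[str]:
    """Return the required variables that are unset or empty for a service."""
    return [name for name in REQUIRED_ENV.get(service, []) if not environ.get(name)]


# Pass a dict to test without touching the real environment.
print(missing_env("openai", {}))                          # ['OPENAI_API_KEY']
print(missing_env("openai", {"OPENAI_API_KEY": "xxx"}))   # []
```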
    

Use regex to specify formula fonts and characters that need to be preserved

pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
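To see what the two example patterns select, they can be tried against a few font names and characters with Python's `re` module. This is a sketch only: the use of `re.match` (anchored at the start of the string) is an assumption about how the patterns are applied, and the sample font names are illustrative.

```python
import re

# The two patterns from the command above, as Python raw strings.
FONT_PATTERN = re.compile(r"(CM[^RT].*|MS.*|.*Ital)")
CHAR_PATTERN = re.compile(r"(\(|\||\)|\+|=|\d|[\u0080-\ufaff])")


def is_formula_font(font: str) -> bool:
    # Assumption: matching is anchored at the start of the font name.
    return FONT_PATTERN.match(font) is not None


def is_formula_char(char: str) -> bool:
    return CHAR_PATTERN.match(char) is not None


for font in ["CMMI10", "CMR10", "MSBM10", "Times-Italic", "Helvetica"]:
    print(font, is_formula_font(font))

for char in ["(", "+", "7", "α", "a"]:
    print(repr(char), is_formula_char(char))
```

With these patterns, Computer Modern fonts other than the roman (`CMR`) and typewriter (`CMT`) families, fonts starting with `MS`, and names containing `Ital` are treated as formula fonts, while brackets, `|`, `+`, `=`, digits, and characters in the range U+0080 to U+FAFF are treated as formula characters.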

Acknowledgement

Document merging: PyMuPDF

Document parsing: Pdfminer.six

Document extraction: MinerU

Multi-threaded translation: MathTranslate

Layout parsing: DocLayout-YOLO

Document standard: PDF Explained, PDF Cheat Sheets
