English | [简体中文](README_zh-CN.md)

# PDFMathTranslate

PDF scientific paper translation and bilingual comparison.

  • 📊 Retain formulas and charts.

  • 📄 Preserve table of contents.

  • 🌐 Support multiple translation services.

Feel free to provide feedback in issues or in the user group.

Installation

Requires Python version >=3.8, <=3.12

pip install pdf2zh

Usage

Run the translation command on the command line to generate the translated document example-zh.pdf and the bilingual document example-dual.pdf in the current directory. Google is the default translation service.

Please refer to ChatGPT for how to set environment variables.

Translate the entire document

pdf2zh example.pdf

Translate part of the document

pdf2zh example.pdf -p 1-3,5
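
To make the page specification concrete, here is a minimal Python sketch of how a string such as 1-3,5 could be expanded into individual page numbers. It assumes 1-based, inclusive ranges; `parse_pages` is a hypothetical helper written for illustration and is not taken from pdf2zh itself.

```python
from typing import List


def parse_pages(spec: str) -> List[int]:
    """Expand a page spec such as "1-3,5" into a sorted list of page numbers.

    Assumption: pages are 1-based and ranges are inclusive.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_pages("1-3,5"))  # [1, 2, 3, 5]
```

Under these assumptions, -p 1-3,5 would select pages 1, 2, 3, and 5.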

Translate with the specified language

See Google Language Codes and DeepL Language Codes

pdf2zh example.pdf -li en -lo ja

Translate with DeepL/DeepLX

See DeepLX

Set ENVs to construct an endpoint like: {DEEPL_SERVER_URL}/translate

  • DEEPL_SERVER_URL (Optional), e.g., export DEEPL_SERVER_URL=https://api.deepl.com
  • DEEPL_AUTH_KEY, e.g., export DEEPL_AUTH_KEY=xxx

    pdf2zh example.pdf -s deepl
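
The endpoint pattern {DEEPL_SERVER_URL}/translate can be illustrated with a short Python sketch. The helper `build_endpoint` is hypothetical, and the fallback base URL is an assumption based on the example value above, since the tool's actual default is not stated here.

```python
import os
from typing import Mapping


def build_endpoint(
    var: str,
    path: str,
    default_base: str,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Join the base URL found in `env[var]` (or a fallback) with `path`."""
    base = env.get(var) or default_base
    # Normalise slashes so "https://host/" + "/translate" has exactly one.
    return base.rstrip("/") + "/" + path.lstrip("/")


# Assumed fallback: the example value shown for DEEPL_SERVER_URL.
url = build_endpoint(
    "DEEPL_SERVER_URL",
    "translate",
    "https://api.deepl.com",
    env={"DEEPL_SERVER_URL": "https://api.deepl.com/"},
)
print(url)  # https://api.deepl.com/translate
```

The same joining logic would apply to any service whose endpoint is described as a base URL from an environment variable plus a fixed path.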
    

Translate with Ollama

See Ollama

Set ENVs to construct an endpoint like: {OLLAMA_HOST}/api/chat

  • OLLAMA_HOST (Optional), e.g., export OLLAMA_HOST=http://localhost:11434

    pdf2zh example.pdf -s ollama:gemma2
    

Translate with OpenAI/SiliconCloud/Zhipu

See SiliconCloud, Zhipu

Set ENVs to construct an endpoint like: {OPENAI_BASE_URL}/chat/completions

  • OPENAI_BASE_URL (Optional), e.g., export OPENAI_BASE_URL=https://api.openai.com/v1
  • OPENAI_API_KEY, e.g., export OPENAI_API_KEY=xxx

    pdf2zh example.pdf -s openai:gpt-4o
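
The -s values used in this README (deepl, ollama:gemma2, openai:gpt-4o) look like a service name optionally followed by a colon and a model name. The Python sketch below shows one way such a value could be split; `split_service` is a hypothetical helper, and the service[:model] reading is an assumption drawn from the examples, not from the tool's source.

```python
from typing import Optional, Tuple


def split_service(spec: str) -> Tuple[str, Optional[str]]:
    """Split "service[:model]" into (service, model or None)."""
    name, sep, model = spec.partition(":")
    # Treat a missing or empty model part as "no model given".
    return name, (model if sep and model else None)


for spec in ("deepl", "ollama:gemma2", "openai:gpt-4o"):
    print(spec, "->", split_service(spec))
```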
    

Translate with Azure Text Translation

See What is Azure Text Translation?

The following ENVs are required.

  • AZURE_APIKEY, e.g., export AZURE_APIKEY=xxx
  • AZURE_ENDPOINT, e.g., export AZURE_ENDPOINT=https://api.translator.azure.cn/
  • AZURE_REGION, e.g., export AZURE_REGION=chinaeast2

    pdf2zh example.pdf -s azure
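
Because all three Azure variables are required, it can help to check them before starting a long translation run. The following Python sketch is an illustrative pre-flight check; `missing_azure_vars` is a hypothetical helper and not part of pdf2zh.

```python
import os
from typing import List, Mapping

# Variable names as listed in this README.
REQUIRED_AZURE_VARS = ("AZURE_APIKEY", "AZURE_ENDPOINT", "AZURE_REGION")


def missing_azure_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required Azure variables that are unset or empty."""
    return [name for name in REQUIRED_AZURE_VARS if not env.get(name)]


missing = missing_azure_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Azure variables are set.")
```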
    

Use regex to specify formula fonts and characters that need to be preserved

pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
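
To see what the two example patterns would select, the Python sketch below tests them against a few font names and characters. It assumes the patterns are applied with `re.match` (anchored at the start of the string), which may differ from how pdf2zh applies them; the font names are common TeX and PDF font names chosen for illustration.

```python
import re

# Patterns copied from the command above.
FONT_PATTERN = r"(CM[^RT].*|MS.*|.*Ital)"
CHAR_PATTERN = r"(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"


def is_formula_font(name: str) -> bool:
    """True if the font name matches the formula-font pattern."""
    return re.match(FONT_PATTERN, name) is not None


def is_formula_char(ch: str) -> bool:
    """True if the character matches the preserved-character pattern."""
    return re.match(CHAR_PATTERN, ch) is not None


for font in ("CMMI10", "CMR10", "MSBM10", "Times-Italic", "Helvetica"):
    print(font, is_formula_font(font))

for ch in ("(", "=", "7", "\u03b1", "a"):
    print(repr(ch), is_formula_char(ch))
```

Under this assumption, Computer Modern math fonts such as CMMI10 match while the roman text font CMR10 does not, and characters such as parentheses, digits, and Greek letters are treated as formula content.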

Acknowledgement

Document merging: PyMuPDF

Document parsing: Pdfminer.six

Document extraction: MinerU

Multi-threaded translation: MathTranslate

Layout parsing: DocLayout-YOLO

Document standard: PDF Explained, PDF Cheat Sheets
