English | [简体中文](README_zh-CN.md)

# PDFMathTranslate
PDF scientific paper translation and bilingual comparison.

- Retain formulas and charts.
- Preserve table of contents.
- Support multiple translation services.

## Installation

Requires Python >=3.8, <=3.11.

```bash
pip install -U "pdf2zh>=1.5.3"
```

## Usage

Run the translation command in the command line. It generates the translated document `example-zh.pdf` and the bilingual document `example-dual.pdf` in the current directory.

### Translate the entire document

```bash
pdf2zh example.pdf
```

### Translate part of the document

Pass page ranges to `-p`. For example, to translate pages 1 to 3 and page 5:

```bash
pdf2zh example.pdf -p 1-3,5
```

### Translate with the specified language

Use `-li` for the input language and `-lo` for the output language. See [Google Language Codes](https://developers.google.com/admin-sdk/directory/v1/languages) and [DeepL Language Codes](https://developers.deepl.com/docs/resources/supported-languages).

```bash
pdf2zh example.pdf -li en -lo ja
```

### Translate with DeepL/DeepLX

See [DeepLX](https://github.com/OwO-Network/DeepLX).

Set ENVs to construct an endpoint like: `{DEEPL_SERVER_URL}/{DEEPL_AUTH_KEY}/translate`

- `DEEPL_SERVER_URL` (Optional), e.g., `export DEEPL_SERVER_URL=https://api.deepl.com`
- `DEEPL_AUTH_KEY`, e.g., `export DEEPL_AUTH_KEY=xxx`

```bash
pdf2zh example.pdf -s deepl
```

### Translate with Ollama

See [Ollama](https://github.com/ollama/ollama).

Set ENVs to construct an endpoint like: `{OLLAMA_HOST}/api/chat`

- `OLLAMA_HOST` (Optional), e.g., `export OLLAMA_HOST=http://localhost:11434`

```bash
pdf2zh example.pdf -s ollama:gemma2
```

### Translate with OpenAI/SiliconCloud

See [OpenAI](https://platform.openai.com/docs/overview).
Set ENVs to construct an endpoint like: `{OPENAI_BASE_URL}/chat/completions`

- `OPENAI_BASE_URL` (Optional), e.g., `export OPENAI_BASE_URL=https://api.openai.com/v1`
- `OPENAI_API_KEY`, e.g., `export OPENAI_API_KEY=xxx`

```bash
pdf2zh example.pdf -s openai:gpt-4o
```

### Use regex to specify formula fonts and characters that need to be preserved

```bash
pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
```

## Preview

## Acknowledgement

- Document merging: [PyMuPDF](https://github.com/pymupdf/PyMuPDF)
- Document parsing: [Pdfminer.six](https://github.com/pdfminer/pdfminer.six)
- Document extraction: [MinerU](https://github.com/opendatalab/MinerU)
- Multi-threaded translation: [MathTranslate](https://github.com/SUSYUSTC/MathTranslate)
- Layout parsing: [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
- Document standard: [PDF Explained](https://zxyle.github.io/PDF-Explained/), [PDF Cheat Sheets](https://pdfa.org/resource/pdf-cheat-sheets/)

## Contributors