English | [简体中文](README_zh-CN.md)
# PDFMathTranslate
PDF scientific paper translation and bilingual comparison.
- Retain formulas and charts.
- Preserve table of contents.
- Support multiple translation services.
## Installation
Requires Python >=3.8, <=3.11.
```bash
pip install -U "pdf2zh>=1.5.3"
```
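To check the Python requirement before installing, here is a minimal sketch. `py_version_supported` is a hypothetical helper, not part of pdf2zh, and it assumes that any 3.11.x patch release counts as 3.11.

```bash
# Hypothetical helper: succeeds if a MAJOR.MINOR[.PATCH] version falls in the
# supported range (>=3.8, <=3.11). Assumption: any 3.11.x patch counts as 3.11.
py_version_supported() {
  major=${1%%.*}          # text before the first dot
  rest=${1#*.}            # text after the first dot
  minor=${rest%%.*}       # drop any patch component
  [ "$major" -eq 3 ] && [ "$minor" -ge 8 ] && [ "$minor" -le 11 ]
}
```

For example, `py_version_supported "$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')" && echo "Python OK"` checks the interpreter on your `PATH`.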
## Usage
Run the translation command in a terminal. It generates the translated document `example-zh.pdf` and the bilingual document `example-dual.pdf` in the current directory.
### Translate the entire document
```bash
pdf2zh example.pdf
```
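When scripting around the command, it can help to know which files to expect. The sketch below uses a hypothetical helper, `pdf2zh_outputs`, and assumes that the `example-zh.pdf` / `example-dual.pdf` naming shown for `example.pdf` carries over to any `<name>.pdf`.

```bash
# Hypothetical helper: print the file names pdf2zh is expected to write into the
# current directory for a given input. Assumption: the example-zh.pdf /
# example-dual.pdf naming from example.pdf generalizes to any <name>.pdf.
pdf2zh_outputs() {
  name=$(basename "$1" .pdf)              # strip directory and .pdf suffix
  printf '%s\n' "${name}-zh.pdf" "${name}-dual.pdf"
}

pdf2zh_outputs papers/example.pdf   # prints example-zh.pdf and example-dual.pdf
```

In a batch script, you could skip an input when the first name printed by `pdf2zh_outputs "$f"` already exists.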
### Translate part of the document
```bash
pdf2zh example.pdf -p 1-3,5
```
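To see which pages a `-p` value selects, this sketch expands it into a list. `expand_pages` is a hypothetical helper, and it assumes page numbers are 1-based and that ranges include both endpoints.

```bash
# Hypothetical helper: list the pages a -p value such as "1-3,5" selects.
# Assumption: page numbers are 1-based and ranges include both endpoints.
expand_pages() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r first last; do
    [ -n "$first" ] || continue           # ignore empty items, e.g. "1,,3"
    seq "$first" "${last:-$first}"        # a single page is a range of one
  done
}

expand_pages 1-3,5   # prints 1, 2, 3 and 5, one per line
```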
### Translate with the specified language
`-li` sets the input (source) language and `-lo` the output (target) language. For valid codes, see [Google Language Codes](https://developers.google.com/admin-sdk/directory/v1/languages) and [DeepL Language Codes](https://developers.deepl.com/docs/resources/supported-languages).
```bash
pdf2zh example.pdf -li en -lo ja
```
### Translate with DeepL/DeepLX
See [DeepLX](https://github.com/OwO-Network/DeepLX).
Set the following environment variables, which form an endpoint like `{DEEPL_SERVER_URL}/{DEEPL_AUTH_KEY}/translate`:
- `DEEPL_SERVER_URL` (Optional), e.g., `export DEEPL_SERVER_URL=https://api.deepl.com`
- `DEEPL_AUTH_KEY`, e.g., `export DEEPL_AUTH_KEY=xxx`
```bash
pdf2zh example.pdf -s deepl
```
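Before running, you may want to confirm the endpoint your variables produce. This is a minimal sketch with a hypothetical helper, `deepl_endpoint`. Because the default used when `DEEPL_SERVER_URL` is unset is not stated here, the helper requires it to be set explicitly.

```bash
# Hypothetical helper: preview the endpoint built from the pattern
# {DEEPL_SERVER_URL}/{DEEPL_AUTH_KEY}/translate. Sketch only: DEEPL_SERVER_URL is
# optional for pdf2zh, but its default is not stated here, so it is required.
deepl_endpoint() {
  if [ -z "${DEEPL_AUTH_KEY:-}" ] || [ -z "${DEEPL_SERVER_URL:-}" ]; then
    echo "set DEEPL_SERVER_URL and DEEPL_AUTH_KEY first" >&2
    return 1
  fi
  printf '%s/%s/translate\n' "$DEEPL_SERVER_URL" "$DEEPL_AUTH_KEY"
}
```

With `export DEEPL_SERVER_URL=https://api.deepl.com DEEPL_AUTH_KEY=xxx`, running `deepl_endpoint` prints `https://api.deepl.com/xxx/translate`.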
### Translate with Ollama
See [Ollama](https://github.com/ollama/ollama).
Set the following environment variable, which forms an endpoint like `{OLLAMA_HOST}/api/chat`:
- `OLLAMA_HOST` (Optional), e.g., `export OLLAMA_HOST=http://localhost:11434`
```bash
pdf2zh example.pdf -s ollama:gemma2
```
### Translate with OpenAI/SiliconCloud
See [OpenAI](https://platform.openai.com/docs/overview).
Set the following environment variables, which form an endpoint like `{OPENAI_BASE_URL}/chat/completions`:
- `OPENAI_BASE_URL` (Optional), e.g., `export OPENAI_BASE_URL=https://api.openai.com/v1`
- `OPENAI_API_KEY`, e.g., `export OPENAI_API_KEY=xxx`
```bash
pdf2zh example.pdf -s openai:gpt-4o
```
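The Ollama and OpenAI examples pass `-s` as `SERVICE:MODEL`, while DeepL uses a bare service name. The sketch below illustrates that split with a hypothetical helper, `split_service`. It assumes that everything after the first colon is the model name; pdf2zh's own parsing may differ.

```bash
# Hypothetical helper: split a -s value into its service and model parts.
# Assumption: everything after the first colon is the model name, and a value
# without a colon (such as "deepl") names no model.
split_service() {
  service=${1%%:*}
  case "$1" in
    *:*) model=${1#*:} ;;
    *)   model= ;;
  esac
  printf 'service=%s model=%s\n' "$service" "$model"
}

split_service openai:gpt-4o    # prints service=openai model=gpt-4o
split_service ollama:gemma2    # prints service=ollama model=gemma2
```

With Ollama, the model part must already be available on the server, for example via `ollama pull gemma2`.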
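### Use regex to specify formula fonts and characters that need to be preserved
Text in fonts whose names match the `-f` regex, and characters that match the `-c` regex, are treated as formula content and left untranslated.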
```bash
pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
```
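You can try out a `-f` pattern on sample font names before translating. This is a sketch under stated assumptions: `formula_fonts` is a hypothetical helper, and `grep -E` with a leading `^` only approximates pdf2zh's matching, which is assumed to start at the beginning of the name (like Python's `re.match`). It covers the `-f` font pattern only, because the `-c` pattern uses Python-only escapes such as `\d` and `\u0080`.

```bash
# Hypothetical helper: print the font names (one per line on stdin) that the -f
# pattern from the example above would select. Assumption: matching starts at
# the beginning of the name; grep -E only approximates pdf2zh's regex engine.
FONT_RE='(CM[^RT].*|MS.*|.*Ital)'
formula_fonts() {
  grep -E "^${FONT_RE}"
}

printf '%s\n' CMMI10 CMR10 CMTI10 MSBM10 Times-Italic Arial | formula_fonts
# prints CMMI10, MSBM10 and Times-Italic
```

Font names in a real PDF can be listed with `pdffonts` from poppler-utils. Embedded subsets carry a prefix such as `ABCDEF+`, which this sketch does not strip.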
## Acknowledgement
- Document merging: [PyMuPDF](https://github.com/pymupdf/PyMuPDF)
- Document parsing: [Pdfminer.six](https://github.com/pdfminer/pdfminer.six)
- Document extraction: [MinerU](https://github.com/opendatalab/MinerU)
- Multi-threaded translation: [MathTranslate](https://github.com/SUSYUSTC/MathTranslate)
- Layout parsing: [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
- Document standard: [PDF Explained](https://zxyle.github.io/PDF-Explained/), [PDF Cheat Sheets](https://pdfa.org/resource/pdf-cheat-sheets/)
## Contributors