PDF scientific paper translation and bilingual comparison.
Feel free to provide feedback via GitHub Issues or the Telegram group.
Updates
- [Nov. 20 2024] Support Docker
- [Nov. 20 2024] Support multiple threads
- [Nov. 19 2024] Provide a graphical user interface
- [Nov. 18 2024] Support DeepL, DeepLX, and Azure
Installation
Requires Python >=3.8, <=3.12
pip install pdf2zh
Usage
Run the translation command from the command line to generate the translated document `example-zh.pdf` and the bilingual document `example-dual.pdf` in the current directory. Google is used as the default translation service.
Please refer to ChatGPT for how to set environment variables.
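The output naming described above can be sketched in Python. This is only an illustration of the convention stated in this README (`<name>-zh.pdf` and `<name>-dual.pdf`, written to the current directory); the helper `expected_outputs` is hypothetical and is not part of pdf2zh.

```python
from pathlib import Path
from typing import Tuple


def expected_outputs(input_pdf: str) -> Tuple[str, str]:
    """Return the (translated, bilingual) file names for an input PDF.

    Hypothetical helper: it mirrors the naming described in this README,
    not pdf2zh's internal code.
    """
    stem = Path(input_pdf).stem  # "papers/example.pdf" -> "example"
    return f"{stem}-zh.pdf", f"{stem}-dual.pdf"


print(expected_outputs("example.pdf"))
```

A helper like this can be handy in scripts that need to pick up the generated files after a batch run.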
Full / partial document translation
- Entire document
```bash
pdf2zh example.pdf
```
- Part of the document
```bash
pdf2zh example.pdf -p 1-3,5
```
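To make the page specification concrete, here is a minimal Python sketch of how a value such as `1-3,5` can be read. It assumes that ranges are inclusive and comma-separated; the function `parse_pages` is a hypothetical illustration, not pdf2zh's own parser.

```python
from typing import List


def parse_pages(spec: str) -> List[int]:
    """Expand a page spec such as "1-3,5" into a list of page numbers.

    Assumes ranges are inclusive and written as "start-end"; this is an
    illustrative reading of the -p argument, not pdf2zh's own parser.
    """
    pages: List[int] = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            pages.extend(range(int(start), int(end) + 1))
        else:
            pages.append(int(part))
    return pages


print(parse_pages("1-3,5"))  # [1, 2, 3, 5]
```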
Specify source and target languages
See [Google Language Codes](https://developers.google.com/admin-sdk/directory/v1/languages) and [DeepL Language Codes](https://developers.deepl.com/docs/resources/supported-languages) for the supported codes.
```bash
pdf2zh example.pdf -li en -lo ja
```
Translate with Different Services
- **DeepL**
See [DeepL](https://support.deepl.com/hc/en-us/articles/360020695820-API-Key-for-DeepL-s-API)
Set ENVs to construct an endpoint like: `{DEEPL_SERVER_URL}/translate`
- `DEEPL_SERVER_URL` (Optional), e.g., `export DEEPL_SERVER_URL=https://api.deepl.com`
- `DEEPL_AUTH_KEY`, e.g., `export DEEPL_AUTH_KEY=xxx`
```bash
pdf2zh example.pdf -s deepl
```
- **DeepLX**
See [DeepLX](https://github.com/OwO-Network/DeepLX)
Set ENVs to construct an endpoint like: `{DEEPLX_SERVER_URL}/translate`
- `DEEPLX_SERVER_URL` (Optional), e.g., `export DEEPLX_SERVER_URL=https://api.deeplx.org`
- `DEEPLX_AUTH_KEY`, e.g., `export DEEPLX_AUTH_KEY=xxx`
```bash
pdf2zh example.pdf -s deeplx
```
- **Ollama**
See [Ollama](https://github.com/ollama/ollama)
Set ENVs to construct an endpoint like: `{OLLAMA_HOST}/api/chat`
- `OLLAMA_HOST` (Optional), e.g., `export OLLAMA_HOST=http://localhost:11434`
```bash
pdf2zh example.pdf -s ollama:gemma2
```
- **LLM with OpenAI compatible schemas (OpenAI / SiliconCloud / Zhipu)**
See [SiliconCloud](https://docs.siliconflow.cn/quickstart), [Zhipu](https://open.bigmodel.cn/dev/api/thirdparty-frame/openai-sdk)
Set ENVs to construct an endpoint like: `{OPENAI_BASE_URL}/chat/completions`
- `OPENAI_BASE_URL` (Optional), e.g., `export OPENAI_BASE_URL=https://api.openai.com/v1`
- `OPENAI_API_KEY`, e.g., `export OPENAI_API_KEY=xxx`
```bash
pdf2zh example.pdf -s openai:gpt-4o
```
- **Azure**
See [Azure Text Translation](https://docs.azure.cn/en-us/ai-services/translator/text-translation-overview)
The following ENVs are required:
- `AZURE_APIKEY`, e.g., `export AZURE_APIKEY=xxx`
- `AZURE_ENDPOINT`, e.g., `export AZURE_ENDPOINT=https://api.translator.azure.cn/`
- `AZURE_REGION`, e.g., `export AZURE_REGION=chinaeast2`
```bash
pdf2zh example.pdf -s azure
```
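Several of the services above build their request URL from an optional base-URL variable plus a fixed path. The Python sketch below shows that pattern. The helper `endpoint` and the `SERVICES` table are hypothetical, and the fallback URLs are example values for illustration, not necessarily pdf2zh's built-in defaults.

```python
import os


def endpoint(env_var: str, default: str, path: str) -> str:
    """Join a base URL from the environment (or a fallback) with an API path."""
    # A trailing slash on the base URL is dropped so the result has a single "/".
    base = os.environ.get(env_var, default).rstrip("/")
    return f"{base}{path}"


# Example base URLs for illustration only; they are assumptions, not
# necessarily the defaults that pdf2zh itself uses.
SERVICES = {
    "deepl": ("DEEPL_SERVER_URL", "https://api.deepl.com", "/translate"),
    "deeplx": ("DEEPLX_SERVER_URL", "https://api.deeplx.org", "/translate"),
    "ollama": ("OLLAMA_HOST", "http://localhost:11434", "/api/chat"),
    "openai": ("OPENAI_BASE_URL", "https://api.openai.com/v1", "/chat/completions"),
}

for name, (env_var, default, path) in SERVICES.items():
    print(name, "->", endpoint(env_var, default, path))
```

Printing the resolved URLs before a long translation run is a quick way to confirm that the exported variables point where you expect.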
Translate with exceptions
Use regex to specify formula fonts and characters that need to be preserved.
```bash
pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
```
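Before running a long translation, it can help to check what the two patterns actually match. The sketch below tests them with Python's `re` module, assuming the patterns are interpreted with Python regex semantics and matched from the start of the font name or character. The helper names are hypothetical, and the sample font names are only examples.

```python
import re

# Patterns copied from the command above.
FONT_PATTERN = r"(CM[^RT].*|MS.*|.*Ital)"
CHAR_PATTERN = r"(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"


def is_formula_font(font_name: str) -> bool:
    """True if the font name matches the formula-font pattern."""
    return re.match(FONT_PATTERN, font_name) is not None


def is_formula_char(char: str) -> bool:
    """True if the character matches the preserved-character pattern."""
    return re.match(CHAR_PATTERN, char) is not None


for font in ["CMMI10", "CMR10", "MSBM10", "Times-Italic"]:
    print(font, is_formula_font(font))

for char in ["(", "=", "7", "α", "a"]:
    print(repr(char), is_formula_char(char))
```

With these patterns, `CMR10` is not matched because `[^RT]` excludes Computer Modern names whose third letter is `R` or `T`. The other three sample fonts are matched. Of the sample characters, only the plain letter `a` falls outside the character pattern.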
Interact with GUI

```bash
pdf2zh -i
```
See [documentation for GUI](./docs/README_GUI.md) for more details.
Docker
1. Pull and run:
```bash
docker pull byaidu/pdf2zh
docker run -p 7860:7860 byaidu/pdf2zh
```
2. Open in browser:
```
http://localhost:7860/
```