Advanced Usage
Entire document
pdf2zh example.pdf
Part of the document
pdf2zh example.pdf -p 1-3,5
See the Google language codes and DeepL language codes.
pdf2zh example.pdf -li en -lo ja
The table below lists the environment variables required by each translation service. Make sure to set them before using the respective service.
| Translator | Service | Environment Variables | Default Values | Notes |
|---|---|---|---|---|
| Google (Default) | google | None | N/A | None |
| Bing | bing | None | N/A | None |
| DeepL | deepl | DEEPL_AUTH_KEY | [Your Key] | See DeepL |
| DeepLX | deeplx | DEEPLX_ENDPOINT | https://api.deepl.com/translate | See DeepLX |
| Ollama | ollama | OLLAMA_HOST, OLLAMA_MODEL | http://127.0.0.1:11434, gemma2 | See Ollama |
| OpenAI | openai | OPENAI_BASE_URL, OPENAI_API_KEY, OPENAI_MODEL | https://api.openai.com/v1, [Your Key], gpt-4o-mini | See OpenAI |
| AzureOpenAI | azure-openai | AZURE_OPENAI_BASE_URL, AZURE_OPENAI_API_KEY, AZURE_OPENAI_MODEL | [Your Endpoint], [Your Key], gpt-4o-mini | See Azure OpenAI |
| Zhipu | zhipu | ZHIPU_API_KEY, ZHIPU_MODEL | [Your Key], glm-4-flash | See Zhipu |
| ModelScope | ModelScope | MODELSCOPE_API_KEY, MODELSCOPE_MODEL | [Your Key], Qwen/Qwen2.5-Coder-32B-Instruct | See ModelScope |
| Silicon | silicon | SILICON_API_KEY, SILICON_MODEL | [Your Key], Qwen/Qwen2.5-7B-Instruct | See SiliconCloud |
| Gemini | gemini | GEMINI_API_KEY, GEMINI_MODEL | [Your Key], gemini-1.5-flash | See Gemini |
| Azure | azure | AZURE_ENDPOINT, AZURE_API_KEY | https://api.translator.azure.cn, [Your Key] | See Azure |
| Tencent | tencent | TENCENTCLOUD_SECRET_ID, TENCENTCLOUD_SECRET_KEY | [Your ID], [Your Key] | See Tencent |
| Dify | dify | DIFY_API_URL, DIFY_API_KEY | [Your DIFY URL], [Your Key] | See Dify. Three variables, lang_out, lang_in, and text, need to be defined in Dify's workflow input. |
| AnythingLLM | anythingllm | AnythingLLM_URL, AnythingLLM_APIKEY | [Your AnythingLLM URL], [Your Key] | See anything-llm |
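For example, to use the DeepL backend you would set its key before running pdf2zh (a sketch; the key value below is a placeholder, not a real credential):

```shell
# Replace the placeholder with your actual DeepL API key.
export DEEPL_AUTH_KEY="your-key-here"
pdf2zh example.pdf -s deepl
```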
Use -s service or -s service:model to specify the translation service:
pdf2zh example.pdf -s openai:gpt-4o-mini
Or specify the model with environment variables:
set OPENAI_MODEL=gpt-4o-mini
pdf2zh example.pdf -s openai
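The set syntax above is for the Windows command prompt; on Linux or macOS the equivalent uses export:

```shell
export OPENAI_MODEL=gpt-4o-mini
pdf2zh example.pdf -s openai
```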
Use regex to specify formula fonts and characters that need to be preserved:
pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
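As an illustration (not part of pdf2zh), you can test which font names a pattern like the -f example above matches by running it through grep; the font names below are hypothetical samples:

```shell
# CMMI10 matches CM[^RT].* and MSAM10 matches MS.*, while CMR10 and
# NimbusRoman match none of the alternatives in the pattern.
matched=$(printf '%s\n' CMMI10 MSAM10 CMR10 NimbusRoman \
  | grep -E '(CM[^RT].*|MS.*|.*Ital)')
echo "$matched"
```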
By default, LaTeX, Mono, Code, Italic, Symbol, and Math fonts are preserved:
pdf2zh example.pdf -f "(CM[^R]|(MS|XY|MT|BL|RM|EU|LA|RS)[A-Z]|LINE|LCIRCLE|TeX-|rsfs|txsy|wasy|stmary|.*Mono|.*Code|.*Ital|.*Sym|.*Math)"
Use -t to specify how many threads to use in translation:
pdf2zh example.pdf -t 1
Use --prompt (-pr) to specify the prompt used by LLM-based translators:
pdf2zh example.pdf -pr prompt.txt
Example prompt.txt:
[
    {
        "role": "system",
        "content": "You are a professional, authentic machine translation engine.",
    },
    {
        "role": "user",
        "content": "Translate the following markdown source text to ${lang_out}. Keep the formula notation {{v*}} unchanged. Output translation directly without any additional text.\nSource Text: ${text}\nTranslated Text:",
    },
]
In a custom prompt file, three variables can be used.
| Variable | Comment |
|---|---|
| lang_in | the input language |
| lang_out | the output language |
| text | the text to be translated |
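To see how these variables are filled in, here is a plain-shell sketch of the substitution pdf2zh performs on the template (the sample values are hypothetical):

```shell
# Hypothetical values standing in for what pdf2zh supplies at run time.
lang_in="en"
lang_out="ja"
text="Hello, world."
prompt="Translate the following markdown source text to ${lang_out}.
Source Text: ${text}
Translated Text:"
echo "$prompt"
```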