
Steps:

  1. Prepare the model repository files.
  2. Download the ONNX model: git clone https://www.modelscope.cn/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx.git
  3. Export the lfr_cmvn_pe.onnx model, for example: python export_lfr_cmvn_pe_onnx.py
  4. After exporting to ONNX, ${MODEL_DIR} should contain the following files:

    ├── README.md
    └── model_repo_paraformer_large_online
        ├── cif_search
        │   ├── 1
        │   │   └── model.py
        │   └── config.pbtxt
        ├── decoder
        │   ├── 1
        │   │   └── decoder.onnx
        │   └── config.pbtxt
        ├── encoder
        │   ├── 1
        │   │   └── model.onnx
        │   └── config.pbtxt
        ├── feature_extractor
        │   ├── 1
        │   │   └── model.py
        │   ├── config.pbtxt
        │   └── config.yaml
        ├── lfr_cmvn_pe
        │   ├── 1
        │   │   └── lfr_cmvn_pe.onnx
        │   ├── am.mvn
        │   ├── config.pbtxt
        │   └── export_lfr_cmvn_pe_onnx.py
        └── streaming_paraformer
            ├── 1
            └── config.pbtxt
    
  5. Launch the Triton server as follows:

    # Build the image from Dockerfile/Dockerfile.server, then start a container with the model repository mounted
    docker build . -f Dockerfile/Dockerfile.server -t triton-paraformer:23.01
    docker run -it --rm --name "paraformer_triton_server" --gpus all -v <path_host/model_repo_paraformer_large_online>:/workspace/model_repo_paraformer_large_online --shm-size 1g --net host triton-paraformer:23.01
    
    # Inside the container, launch the service
    cd /workspace
    tritonserver --model-repository model_repo_paraformer_large_online \
             --pinned-memory-pool-byte-size=512000000 \
             --cuda-memory-pool-byte-size=0:1024000000
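A small pre-flight check can catch a misplaced model file or a server that never came up. The sketch below is a hypothetical helper, not part of the original repository: `EXPECTED_PATHS` is transcribed from the step 4 directory tree, and `triton_ready` assumes the server's HTTP endpoint is on Triton's default port 8000 (reachable on localhost because of `--net host`) and queries the standard KServe v2 readiness route `/v2/health/ready`.

```python
import sys
import urllib.request
from pathlib import Path

# Paths relative to model_repo_paraformer_large_online, transcribed from the
# step 4 directory tree. streaming_paraformer/1 is an (empty) version directory.
EXPECTED_PATHS = [
    "cif_search/1/model.py",
    "cif_search/config.pbtxt",
    "decoder/1/decoder.onnx",
    "decoder/config.pbtxt",
    "encoder/1/model.onnx",
    "encoder/config.pbtxt",
    "feature_extractor/1/model.py",
    "feature_extractor/config.pbtxt",
    "feature_extractor/config.yaml",
    "lfr_cmvn_pe/1/lfr_cmvn_pe.onnx",
    "lfr_cmvn_pe/am.mvn",
    "lfr_cmvn_pe/config.pbtxt",
    "lfr_cmvn_pe/export_lfr_cmvn_pe_onnx.py",
    "streaming_paraformer/1",
    "streaming_paraformer/config.pbtxt",
]


def missing_model_paths(repo_root):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


def triton_ready(base_url="http://localhost:8000", timeout=2.0):
    """Return True if GET <base_url>/v2/health/ready answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/v2/health/ready", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError/HTTPError (non-2xx), refused connections and timeouts
        # are all OSError subclasses, so any of them means "not ready".
        return False


if __name__ == "__main__":
    repo = sys.argv[1] if len(sys.argv) > 1 else "model_repo_paraformer_large_online"
    missing = missing_model_paths(repo)
    print("missing files:", ", ".join(missing) if missing else "none")
    print("server ready:", triton_ready())
```

Run it from the directory that contains model_repo_paraformer_large_online, or pass the repository path as the first argument. Sticking to the standard library avoids installing the tritonclient package just for a health check.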
    
    

Performance benchmark on a single NVIDIA A10 GPU

  • FP32, ONNX, Paraformer-large online model. The chunk size is 10 * 960 / 16000 = 0.6 s (10 × 960 samples at 16 kHz), so for a real-time application each chunk must be processed in under 0.6 s; read the latencies below against that bound.
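| Concurrency | Throughput | Latency_p50 (ms) | Latency_p90 (ms) | Latency_p95 (ms) | Latency_p99 (ms) |
| ----------- | ---------- | ---------------- | ---------------- | ---------------- | ---------------- |
| 20          | 309.252    | 56.913           | 76.267           | 85.598           | 138.462          |
| 40          | 391.058    | 97.911           | 145.509          | 150.545          | 185.399          |
| 60          | 426.269    | 138.244          | 185.855          | 201.016          | 236.528          |
| 80          | 431.781    | 170.991          | 227.983          | 252.453          | 412.273          |
| 100         | 473.351    | 206.205          | 262.612          | 288.964          | 463.337          |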
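The real-time bound from the bullet above can be checked against the table with a few lines of Python. This is an illustrative sketch: the latencies are copied from the table, and comparing a single latency percentile against one chunk duration is a simplifying assumption, not necessarily the criterion used for the original benchmark. The helper name `realtime_concurrencies` is hypothetical.

```python
# Chunk duration from the text: 10 * 960 samples at a 16 kHz sample rate.
CHUNK_STEPS = 10
SAMPLES_PER_STEP = 960
SAMPLE_RATE_HZ = 16000
CHUNK_MS = CHUNK_STEPS * SAMPLES_PER_STEP * 1000 / SAMPLE_RATE_HZ  # 600.0 ms

PERCENTILES = ("p50", "p90", "p95", "p99")

# (concurrency, p50, p90, p95, p99) latencies in ms, copied from the A10 table.
ROWS = [
    (20, 56.913, 76.267, 85.598, 138.462),
    (40, 97.911, 145.509, 150.545, 185.399),
    (60, 138.244, 185.855, 201.016, 236.528),
    (80, 170.991, 227.983, 252.453, 412.273),
    (100, 206.205, 262.612, 288.964, 463.337),
]


def realtime_concurrencies(rows, budget_ms=CHUNK_MS, percentile="p99"):
    """Return the concurrency levels whose latency at `percentile` is below budget_ms."""
    idx = 1 + PERCENTILES.index(percentile)
    return [row[0] for row in rows if row[idx] < budget_ms]


if __name__ == "__main__":
    print(f"chunk duration: {CHUNK_MS:.0f} ms")
    print("p99 below one chunk at concurrency:", realtime_concurrencies(ROWS))
```

With these numbers, every tested concurrency level (20 to 100) keeps p99 latency below the 600 ms chunk; tightening the budget to 300 ms leaves only concurrency 20, 40 and 60.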