
Merge branch 'main' of github.com:alibaba-damo-academy/FunASR
add

游雁, 2 years ago
parent
commit 000ec7e630

+ 6 - 0
README.md

@@ -87,6 +87,12 @@ The use of pretraining model is subject to [model licencs](./MODEL_LICENSE)
   year={2023},
   booktitle={INTERSPEECH},
 }
+@inproceedings{wang2023told,
+  author={Jiaming Wang and Zhihao Du and Shiliang Zhang},
+  title={{TOLD:} {A} Novel Two-Stage Overlap-Aware Framework for Speaker Diarization},
+  year={2023},
+  booktitle={ICASSP},
+}
 @inproceedings{gao22b_interspeech,
   author={Zhifu Gao and ShiLiang Zhang and Ian McLoughlin and Zhijie Yan},
   title={{Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition}},

+ 74 - 1
funasr/runtime/docs/SDK_advanced_guide_online.md

@@ -1,7 +1,80 @@
  # Advanced Development Guide (File transcription service)
  
 FunASR provides a Chinese online transcription service that can be deployed locally or on a cloud server with just one click. The core of the service is the FunASR runtime SDK, which has been open-sourced. FunASR-runtime combines various capabilities such as speech endpoint detection (VAD), offline large-scale speech recognition (ASR) using Paraformer-large, online large-scale speech recognition (ASR) using Paraformer-large, and punctuation detection (PUNC), which have all been open-sourced by the speech laboratory of DAMO Academy on the Modelscope community. 
-This document serves as a development guide for the FunASR online transcription service. If you wish to quickly experience the online transcription service, please refer to the one-click deployment example for the FunASR online transcription service ([docs](./SDK_tutorial_online.md)).
+This document serves as a development guide for the FunASR online transcription service. To try the online transcription service quickly, see the [Quick Start](#quick-start) section.
+
+### Starting the Docker Image
+
+Pull and start the Docker image of the FunASR software package with the following commands:
+
+```shell
+sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
+mkdir -p ./funasr-runtime-resources/models
+sudo docker run -p 10095:10095 -it --privileged=true -v "$PWD/funasr-runtime-resources/models:/workspace/models" registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
+```
+If Docker is not installed, see [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker_zh.html).
+
+### Starting the Server
+
+After the Docker container has started, launch the funasr-wss-server-2pass service:
+```shell
+cd FunASR/funasr/runtime
+./run_server_2pass.sh \
+  --download-model-dir /workspace/models \
+  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
+  --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx  \
+  --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx  \
+  --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx
+```
+For a detailed description of the server parameters, see [Server Parameters](#服务端参数介绍).
+
+### Client Testing and Usage
+
+Download the client testing tools (the `samples` directory):
+```shell
+wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz
+```
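+We use the Python client as an example. It supports audio input in .wav and .pcm format as well as multi-file list input via wav.scp. For clients in other languages, see the [documentation](#客户端用法详解); for customized service deployment, see [How to Customize Service Deployment](#如何定制服务部署).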
+```shell
+python3 wss_client_asr.py --host "127.0.0.1" --port 10095 --mode 2pass
+```
+
+
+## Quick Start
+
+### Server Startup
+
+Pull and run the Docker image:
+
+```shell
+sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
+mkdir -p ./funasr-runtime-resources/models
+sudo docker run -p 10095:10095 -it --privileged=true -v "$PWD/funasr-runtime-resources/models:/workspace/models" registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
+```
+
+Start funasr-wss-server-2pass:
+```shell
+cd FunASR/funasr/runtime
+./run_server_2pass.sh \
+  --download-model-dir /workspace/models \
+  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
+  --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx  \
+  --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx  \
+  --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx
+```
+
+
+
+### Client Testing and Usage
+
+The client testing tools (the `samples` directory) can be downloaded from [funasr_samples.tar.gz](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz).
+We use the Python client as an example. It supports multiple audio input formats (such as .wav, .pcm, and .mp3), video input (such as .mp4), and multi-file list input via wav.scp. For other clients, see the [documentation](#Detailed-Description-of-Client-Usage).
+
+```shell
+python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --audio_in "../audio/asr_example.wav"
+```
+
+
+
 
 ## Installation of Docker
 
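The client instructions in this doc mention multi-file input through a `wav.scp` list but do not show its layout. The sketch below assumes the common Kaldi-style convention of one `<utterance-id> <audio-path>` pair per line; the function name `make_wav_scp` is hypothetical, and the exact format each FunASR client parses should be confirmed against that client's source.

```shell
# Sketch: build a wav.scp list from every .wav file in one directory.
# Assumes the Kaldi-style layout "<utterance-id> <audio-path>" per line;
# check the client you use for the exact format it expects.
# Fields are whitespace-separated, so paths must not contain spaces.
make_wav_scp() {
  local dir="$1"
  local f
  # Non-recursive search, sorted so the list order is reproducible.
  find "$dir" -maxdepth 1 -type f -name '*.wav' | sort | while IFS= read -r f; do
    # Utterance id = file name without its directory and .wav suffix.
    printf '%s %s\n' "$(basename "$f" .wav)" "$f"
  done
}
```

For example, `make_wav_scp ./my_audio > wav.scp` writes the list (`./my_audio` is a placeholder directory). It could then be passed as `--audio_in wav.scp` to `funasr_wss_client.py`, assuming that client accepts a `.scp` file through the same option it uses for single audio files.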

+ 3 - 3
funasr/runtime/websocket/funasr-wss-server-2pass.cpp

@@ -154,7 +154,7 @@ int main(int argc, char* argv[]) {
       std::string python_cmd =
           "python -m funasr.utils.runtime_sdk_download_tool --type onnx --quantize True ";
 
-      if (vad_dir.isSet() && !s_vad_path.empty()) {
+      if (!s_vad_path.empty()) {
         std::string python_cmd_vad;
         std::string down_vad_path;
         std::string down_vad_model;
@@ -200,7 +200,7 @@ int main(int argc, char* argv[]) {
         LOG(INFO) << "VAD model is not set, use default.";
       }
 
-      if (offline_model_dir.isSet() && !s_offline_asr_path.empty()) {
+      if (!s_offline_asr_path.empty()) {
         std::string python_cmd_asr;
         std::string down_asr_path;
         std::string down_asr_model;
@@ -288,7 +288,7 @@ int main(int argc, char* argv[]) {
         LOG(INFO) << "ASR online model is not set, use default.";
       }
 
-      if (punc_dir.isSet() && !s_punc_path.empty()) {
+      if (!s_punc_path.empty()) {
         std::string python_cmd_punc;
         std::string down_punc_path;
         std::string down_punc_model;
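The `.cpp` hunks drop the `isSet()` check from the VAD, offline ASR, and punctuation branches, so the download command is now built whenever the resolved path string is non-empty. That also covers a path that comes from an option's default value rather than from the command line, assuming the default is non-empty. The shell sketch below restates both guards as predicates; `old_should_download` and `new_should_download` are hypothetical names, and their arguments stand in for `*_dir.isSet()` and the matching `s_*_path` string.

```shell
# Sketch of the model-download guard in funasr-wss-server-2pass.cpp,
# restated as shell predicates (names are hypothetical).
#   is_set: "1" if the option was given on the command line (stands in for *_dir.isSet())
#   path:   the resolved model path (stands in for s_vad_path, s_offline_asr_path, s_punc_path)

# Before this commit: download only if the option was set AND the path is non-empty.
old_should_download() {
  local is_set="$1" path="$2"
  [ "$is_set" = "1" ] && [ -n "$path" ]
}

# After this commit: download whenever the path is non-empty,
# including when the path comes from the option's default value.
new_should_download() {
  local path="$1"
  [ -n "$path" ]
}
```

For an option left off the command line whose default is a non-empty model ID, `old_should_download 0 "$path"` fails while `new_should_download "$path"` succeeds; that is the case this commit changes. An explicitly empty path still skips the download under both guards.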