游雁 2 years ago
parent
commit
c9905b9be0

+ 1 - 0
README.md

@@ -79,6 +79,7 @@ FunASR has open-sourced a large number of pre-trained models on industrial data.
 |                                   fsmn-vad <br> ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗](https://huggingface.co/funasr/fsmn-vad) )                                   |              voice activity detection              | 5000 hours, Mandarin and English |    0.4M    | 
 |                                     fa-zh <br> ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗](https://huggingface.co/funasr/fa-zh) )                                     |                timestamp prediction                |       5000 hours, Mandarin       |    38M     | 
 |                                       cam++ <br> ( [⭐](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗](https://huggingface.co/funasr/campplus) )                                        |        speaker verification/diarization            |            5000 hours            |    7.2M    | 
+|                                                 whisper-large-v2 <br> ([⭐](https://www.modelscope.cn/models/iic/speech_whisper-large_asr_multilingual/summary)  [🤗]() )                                                   | speech recognition, with timestamps, non-streaming |          multilingual            |     1G     |
 
 
 

+ 11 - 10
README_zh.md

@@ -71,16 +71,17 @@ FunASR开源了大量在工业数据上预训练模型,您可以在[模型许
 (注:⭐ 表示ModelScope模型仓库链接,🤗 表示Huggingface模型仓库链接)
 
 
-|                                         模型名字                                                                                                                 |        任务详情        |     训练数据     | 参数量  |
-|:------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------:|:------------:|:----:|
-| paraformer-zh <br> ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)  [🤗](https://huggingface.co/funasr/paraformer-tp) ) |  语音识别,带时间戳输出,非实时   |  60000小时,中文  | 220M |
-|   paraformer-zh-streaming <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗](https://huggingface.co/funasr/paraformer-zh-streaming) )   |      语音识别,实时       |  60000小时,中文  | 220M |
-|      paraformer-en <br> ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗](https://huggingface.co/funasr/paraformer-en) )      |      语音识别,非实时      |  50000小时,英文  | 220M |
-|                  conformer-en <br> ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗](https://huggingface.co/funasr/conformer-en) )                   |      语音识别,非实时      |  50000小时,英文  | 220M |
-|                  ct-punc <br> ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗](https://huggingface.co/funasr/ct-punc) )                   |        标点恢复        |  100M,中文与英文  | 1.1G | 
-|                       fsmn-vad <br> ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗](https://huggingface.co/funasr/fsmn-vad) )                       |     语音端点检测,实时      | 5000小时,中文与英文 | 0.4M | 
-|                       fa-zh <br> ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗](https://huggingface.co/funasr/fa-zh) )                        |      字级别时间戳预测      |  50000小时,中文  | 38M  |
-|                           cam++ <br> ( [⭐](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗](https://huggingface.co/funasr/campplus) )                            |      说话人确认/分割      |   5000小时     |    7.2M    | 
+|                                         模型名字                                                                                                                 |      任务详情       |     训练数据     | 参数量  |
+|:------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------:|:------------:|:----:|
+| paraformer-zh <br> ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)  [🤗](https://huggingface.co/funasr/paraformer-tp) ) | 语音识别,带时间戳输出,非实时 |  60000小时,中文  | 220M |
+|   paraformer-zh-streaming <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗](https://huggingface.co/funasr/paraformer-zh-streaming) )   |     语音识别,实时     |  60000小时,中文  | 220M |
+|      paraformer-en <br> ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗](https://huggingface.co/funasr/paraformer-en) )      |    语音识别,非实时     |  50000小时,英文  | 220M |
+|                  conformer-en <br> ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗](https://huggingface.co/funasr/conformer-en) )                   |    语音识别,非实时     |  50000小时,英文  | 220M |
+|                  ct-punc <br> ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗](https://huggingface.co/funasr/ct-punc) )                   |      标点恢复       |  100M,中文与英文  | 1.1G | 
+|                       fsmn-vad <br> ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗](https://huggingface.co/funasr/fsmn-vad) )                       |    语音端点检测,实时    | 5000小时,中文与英文 | 0.4M | 
+|                       fa-zh <br> ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗](https://huggingface.co/funasr/fa-zh) )                        |    字级别时间戳预测     |  50000小时,中文  | 38M  |
+|                           cam++ <br> ( [⭐](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗](https://huggingface.co/funasr/campplus) )                            |    说话人确认/分割     |    5000小时    | 7.2M | 
+| whisper-large-v2 <br> ([⭐](https://www.modelscope.cn/models/iic/speech_whisper-large_asr_multilingual/summary)  [🤗]() ) | 语音识别,带时间戳输出,非实时 |     多语言      |  1G  |
 
 
 <a name="快速开始"></a>

+ 1 - 1
examples/industrial_data_pretraining/whisper/demo.py

@@ -9,5 +9,5 @@ model = AutoModel(model="iic/speech_whisper-large_asr_multilingual",
                   model_revision="v2.0.4",
                   )
 
-res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav")
+res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav", language=None)
 print(res)

+ 1 - 0
funasr/download/name_maps_from_hub.py

@@ -8,6 +8,7 @@ name_maps_ms = {
     "ct-punc-c": "damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
     "fa-zh": "damo/speech_timestamp_prediction-v1-16k-offline",
     "cam++": "damo/speech_campplus_sv_zh-cn_16k-common",
+    "whisper-large-v2": "iic/speech_whisper-large_asr_multilingual",
 }
 
 name_maps_hf = {
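
The new `name_maps_ms` entry lets callers pass the short alias `whisper-large-v2` instead of the full ModelScope repository ID. A minimal sketch of that kind of lookup is below. The three map entries are copied from this diff, but `resolve_model_name` is a hypothetical helper written for illustration and is not FunASR's actual resolution code.

```python
# Entries copied from the diff above; the real map contains more aliases.
name_maps_ms = {
    "fa-zh": "damo/speech_timestamp_prediction-v1-16k-offline",
    "cam++": "damo/speech_campplus_sv_zh-cn_16k-common",
    "whisper-large-v2": "iic/speech_whisper-large_asr_multilingual",
}


def resolve_model_name(name: str, name_map: dict) -> str:
    """Hypothetical helper: map a short alias to its hub repository ID.

    Names that are not aliases (for example, a full repository ID) are
    returned unchanged, so both spellings can be passed by a caller.
    """
    return name_map.get(name, name)


# The alias and the full ID resolve to the same repository.
alias_id = resolve_model_name("whisper-large-v2", name_maps_ms)
full_id = resolve_model_name("iic/speech_whisper-large_asr_multilingual", name_maps_ms)
print(alias_id)
print(full_id)
```

Falling back to the input when no alias matches keeps existing call sites that already use the full ID, such as `examples/industrial_data_pretraining/whisper/demo.py`, working unchanged under this sketch.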