(简体中文|English)
Note: The ModelScope pipeline supports inference and fine-tuning for all models in the model zoo. Here we use the CT-Transformer punctuation model as an example to demonstrate the usage.
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
inference_pipeline = pipeline(
    task=Tasks.punctuation,
    model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
    model_revision=None)
rec_result = inference_pipeline(text_in='example/punc_example.txt')
print(rec_result)
Text as binary data, e.g. bytes that the user reads directly from a file:
rec_result = inference_pipeline(text_in='我们都是木头人不会讲话不会动')
Text file URL, e.g.: https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/punc_example.txt
rec_result = inference_pipeline(text_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/punc_example.txt')
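The examples above pass three kinds of value as text_in: a local file path, a text string, and a file URL. As a rough sketch only, the helper below guesses which form a given string takes. The name `describe_text_in` is hypothetical and is not part of modelscope or FunASR, and how the real pipeline tells these inputs apart is not documented here, so treat the logic as an assumption.

```python
import os
from urllib.parse import urlparse


def describe_text_in(text_in: str) -> str:
    """Guess which input form a text_in string takes.

    Hypothetical helper for illustration; the real pipeline may
    use different rules to distinguish its inputs.
    """
    # An http(s) scheme is treated as a remote text file.
    if urlparse(text_in).scheme in ("http", "https"):
        return "url"
    # os.path.isfile returns False (rather than raising) for strings
    # that are not valid paths, so plain sentences fall through.
    if os.path.isfile(text_in):
        return "file"
    return "text"


if __name__ == "__main__":
    for candidate in (
        "我们都是木头人不会讲话不会动",
        "example/punc_example.txt",
        "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/punc_example.txt",
    ):
        print(describe_text_in(candidate), candidate)
```

Note that a path to a file that does not exist is reported as "text" under this sketch, which is a convenient way to catch a mistyped path before calling the pipeline.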
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
inference_pipeline = pipeline(
    task=Tasks.punctuation,
    model='damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727',
    model_revision=None,
)
inputs = "跨境河流是养育沿岸|人民的生命之源长期以来为帮助下游地区防灾减灾中方技术人员|在上游地区极为恶劣的自然条件下克服巨大困难甚至冒着生命危险|向印方提供汛期水文资料处理紧急事件中方重视印方在跨境河流问题上的关切|愿意进一步完善双方联合工作机制|凡是|中方能做的我们|都会去做而且会做得更好我请印度朋友们放心中国在上游的|任何开发利用都会经过科学|规划和论证兼顾上下游的利益"
vads = inputs.split("|")
rec_result_all = "outputs:"
param_dict = {"cache": []}
for vad in vads:
    # Reuse the same param_dict so the cache carries over between segments
    rec_result = inference_pipeline(text_in=vad, param_dict=param_dict)
    rec_result_all += rec_result['text']
print(rec_result_all)
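The loop above depends on one detail: the same param_dict object is passed to every call, so the cache survives from one segment to the next. The sketch below isolates that pattern so it can be run without downloading a model. Both `punctuate_segments` and `fake_pipeline` are hypothetical names introduced for illustration. The stand-in only counts the segments it has seen, whereas the real cache contents are internal to the model.

```python
from typing import Callable, Dict, Iterable, List


def punctuate_segments(pipeline_fn: Callable[..., Dict[str, str]],
                       segments: Iterable[str]) -> str:
    """Feed segments to pipeline_fn in order, sharing one cache dict."""
    # Created once, outside the loop: every call sees the same object.
    param_dict = {"cache": []}
    pieces: List[str] = []
    for segment in segments:
        result = pipeline_fn(text_in=segment, param_dict=param_dict)
        pieces.append(result["text"])
    return "".join(pieces)


def fake_pipeline(text_in: str, param_dict: dict) -> Dict[str, str]:
    """Stand-in for the real pipeline (assumption, not the FunASR model).

    It tags each segment with how many segments came before it,
    which is only possible because the cache is shared across calls.
    """
    seen_before = len(param_dict["cache"])
    param_dict["cache"].append(text_in)
    return {"text": f"[{seen_before}]{text_in}"}


if __name__ == "__main__":
    print(punctuate_segments(fake_pipeline, "a|b|c".split("|")))
```

With the real model, `inference_pipeline` would take the place of `fake_pipeline`. Creating a fresh `{"cache": []}` inside the loop would discard the context on every segment.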
For the full code, please refer to the demo.
- task: Tasks.punctuation
- model: model name in the model zoo, or path to a model on local disk
- ngpu: 1 (Default), decode on GPU. If ngpu=0, decode on CPU
- output_dir: None (Default), the output path for the results, if set
- model_revision: None (Default), sets the model version
- text_in: the input to decode, which could be:
  - a text string, e.g.: "我们都是木头人不会讲话不会动"
  - a text file, e.g.: example/punc_example.txt. For text file input, output_dir must be set to save the output results
- param_dict: holds the cache, which is required in realtime mode

FunASR also offers the recipe egs_modelscope/punctuation/TEMPLATE/infer.sh to decode with multi-threaded CPUs or multiple GPUs. It is an offline recipe and only supports offline models.
Settings of infer.sh:
- model: model name in the model zoo, or path to a model on local disk
- data_dir: the dataset directory, which must include punc.txt
- output_dir: output directory for the recognition results
- gpu_inference: true (Default), whether to decode on GPU; set to false for CPU inference
- gpuid_list: 0,1 (Default), which GPU IDs are used for inference
- njob: only used for CPU inference (gpu_inference=false), 64 (Default), the number of jobs for CPU decoding
- checkpoint_dir: only used for inference with fine-tuned models, the directory of the fine-tuned models
- checkpoint_name: only used for inference with fine-tuned models, punc.pb (Default), which checkpoint is used for inference

Decode with multiple GPUs:
bash infer.sh \
--model "damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
--batch_size 1 \
--gpu_inference true \
--gpuid_list "0,1"

Decode on CPU:
bash infer.sh \
--model "damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
--gpu_inference false \
--njob 1
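
When the recipe is launched from a Python script rather than a shell, the two invocations above can be assembled from the same settings. The sketch below is only an illustration: `build_infer_command` is a hypothetical helper, not part of FunASR, and it uses only the flags shown in the commands above. It assumes that --gpuid_list applies to GPU decoding and --njob to CPU decoding, as the settings describe.

```python
import shlex
from typing import List


def build_infer_command(model: str, data_dir: str, output_dir: str,
                        gpu_inference: bool = True,
                        gpuid_list: str = "0,1",
                        njob: int = 64) -> List[str]:
    """Build the argument list for infer.sh (hypothetical helper)."""
    cmd = [
        "bash", "infer.sh",
        "--model", model,
        "--data_dir", data_dir,
        "--output_dir", output_dir,
        "--gpu_inference", "true" if gpu_inference else "false",
    ]
    if gpu_inference:
        # GPU decoding: choose the devices.
        cmd += ["--gpuid_list", gpuid_list]
    else:
        # CPU decoding: choose the number of parallel jobs.
        cmd += ["--njob", str(njob)]
    return cmd


if __name__ == "__main__":
    cpu_cmd = build_infer_command(
        model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
        data_dir="./data/test",
        output_dir="./results",
        gpu_inference=False,
        njob=1,
    )
    # Print the command for inspection instead of running it.
    print(shlex.join(cpu_cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the recipe directory. Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.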