Note: The ModelScope pipeline supports inference and finetuning for all the models in the model zoo. Here we take the xvector_sv model as an example to demonstrate the usage.
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# initialize pipeline
inference_diar_pipeline = pipeline(
    mode="sond_demo",
    num_workers=0,
    task=Tasks.speaker_diarization,
    diar_model_config="sond.yaml",
    model='damo/speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch',
    model_revision="v1.0.5",
    sv_model="damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch",
    sv_model_revision="v1.2.2",
)

# input: a list of audio files in which the first item is the speech recording
# to detect speakers in, and the following wav files are used to extract
# speaker embeddings.
audio_list = [
    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/speaker_diarization/record.wav",
    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/speaker_diarization/spk1.wav",
    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/speaker_diarization/spk2.wav",
    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/speaker_diarization/spk3.wav",
    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/speaker_diarization/spk4.wav",
]

results = inference_diar_pipeline(audio_in=audio_list)
print(results)
```
- task: Tasks.speaker_diarization
- model: model name in the model zoo, or model path on local disk
- ngpu: 1 (default), decode on GPU; if ngpu=0, decode on CPU
- output_dir: None (default), the output path of results if set
- batch_size: 1 (default), batch size for decoding
- smooth_size: 83 (default), the window size used for smoothing
- dur_threshold: 10 (default), segments shorter than 100 ms will be dropped
- out_format: "vad" (default), the output format, choices ["vad", "rttm"]
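As a rough illustration of the dur_threshold behavior, the sketch below drops segments shorter than the threshold. This is a hypothetical post-processing helper, not the pipeline's actual code; it assumes 10 ms frames, so the default dur_threshold=10 corresponds to 100 ms.

```python
# Hypothetical sketch: drop segments shorter than dur_threshold frames.
# Assumes one frame = 10 ms, so the default dur_threshold=10 means 100 ms.
def drop_short_segments(segments, dur_threshold=10):
    """segments: list of (start_frame, end_frame) tuples."""
    return [(s, e) for s, e in segments if e - s >= dur_threshold]

segments = [(0, 50), (60, 65), (70, 200)]  # (60, 65) lasts only 5 frames (~50 ms)
print(drop_short_segments(segments))  # → [(0, 50), (70, 200)]
```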
- audio_in: the input to process, which could be:
  - a list of URLs, e.g. waveform files hosted on a website
  - a local file path, e.g. path/to/a.wav
  - a tuple ("wav.scp,speech,sound", "profile.scp,profile,kaldi_ark"): a script file of waveform files and another script file of speaker profiles (extracted with the model)
An example wav.scp:

```
test1 path/to/enroll1.wav
test2 path/to/enroll2.wav
```

An example profile.scp:

```
test1 path/to/profile.ark:11
test2 path/to/profile.ark:234
```
The profile.ark file contains speaker embeddings stored in a Kaldi-like archive format. Please refer to README.md for more details.
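An scp file maps utterance IDs to resources, one whitespace-separated pair per line. A minimal parser could look like the following; this is an illustrative helper, not part of the ModelScope API:

```python
# Illustrative helper: parse Kaldi-style .scp lines into {utt_id: resource}.
def parse_scp(lines):
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt_id, resource = line.split(maxsplit=1)
        mapping[utt_id] = resource
    return mapping

lines = ["test1 path/to/enroll1.wav", "test2 path/to/enroll2.wav"]
print(parse_scp(lines))
# → {'test1': 'path/to/enroll1.wav', 'test2': 'path/to/enroll2.wav'}
```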
For a single input, we recommend the list-of-local-file-paths mode for inference. For multiple inputs, we recommend the last mode with pre-organized wav.scp and profile.scp files.
We recommend the last mode with wav.scp and profile.scp split into parts, then running inference on each part separately. Please refer to README.md for a similar process.
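Splitting an scp file into parts for separate inference runs can be sketched as follows; this is a hypothetical helper, not part of ModelScope:

```python
# Hypothetical helper: split scp lines into n roughly equal parts,
# so each part can be handed to a separate inference run.
def split_scp(lines, n_parts):
    return [lines[i::n_parts] for i in range(n_parts)]

lines = ["test1 a.wav", "test2 b.wav", "test3 c.wav", "test4 d.wav", "test5 e.wav"]
print(split_scp(lines, 2))
# → [['test1 a.wav', 'test3 c.wav', 'test5 e.wav'], ['test2 b.wav', 'test4 d.wav']]
```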
As on CPU, set ngpu=1 for inference on GPU. In addition, use CUDA_VISIBLE_DEVICES=0 to specify which GPU device to use. Please refer to README.md for a similar process.
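The visible GPU can be selected from the shell (e.g. `CUDA_VISIBLE_DEVICES=0 python infer.py`, where infer.py is a hypothetical script name) or in Python, as sketched below. The environment variable must be set before any CUDA-aware framework is imported.

```python
import os

# Restrict CUDA to GPU 0; must happen before importing torch or the pipeline.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Choose ngpu accordingly: 1 when a GPU is visible, 0 for CPU-only decoding.
ngpu = 1 if os.environ.get("CUDA_VISIBLE_DEVICES", "") != "" else 0
print(ngpu)  # → 1
```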