yhliang 3590ebffd3 Dev lyh (#860)		2 ani în urmă
..
conf	08ee9e6aac Add modular SA-ASR recipe for M2MeT2.0 (#831)	2 ani în urmă
exp	08ee9e6aac Add modular SA-ASR recipe for M2MeT2.0 (#831)	2 ani în urmă
figure	08ee9e6aac Add modular SA-ASR recipe for M2MeT2.0 (#831)	2 ani în urmă
local	08ee9e6aac Add modular SA-ASR recipe for M2MeT2.0 (#831)	2 ani în urmă
README.md	3590ebffd3 Dev lyh (#860)	2 ani în urmă
path.sh	08ee9e6aac Add modular SA-ASR recipe for M2MeT2.0 (#831)	2 ani în urmă
run_asr.sh	08ee9e6aac Add modular SA-ASR recipe for M2MeT2.0 (#831)	2 ani în urmă
run_diar.sh	08ee9e6aac Add modular SA-ASR recipe for M2MeT2.0 (#831)	2 ani în urmă
run_enh.sh	08ee9e6aac Add modular SA-ASR recipe for M2MeT2.0 (#831)	2 ani în urmă

Get Started

This is an official modular SA-ASR system used in M2MeT 2.0 challenge. We developed this system based on various pre-trained models after the challenge and reach the SOTA(until 2023.8.9) performance on the AliMeeting Test_2023 set. You can also transcribe your own dataset by preparing it into the specific format shown in

Dependency

To run this receipe, you should install Kaldi and set the KALDI_ROOT in path.sh.

export KALDI_ROOT=/your_kaldi_path

We use the VBx to provide initial diarization result to SOND and dscore to compute the DER. You should clone them before running this receipe.

$ mkdir VBx && cd VBx
$ git init 
$ git remote add origin https://github.com/BUTSpeechFIT/VBx.git
$ git config core.sparsecheckout true
$ echo "VBx/*" >> .git/info/sparse-checkout
$ git pull origin master
$ mv VBx/* .
$ cd ..
$ git clone https://github.com/nryant/dscore.git

We use the pb_chime5 to perform GSS. So you should install the dependencies of this repo using the following command.

$ git clone https://github.com/fgnt/pb_chime5.git
$ cd pb_chime5
$ git submodule init  
$ git submodule update
$ pip install -e pb_bss/
$ pip install -e .

Infer on the AliMeeting Test_2023 set

We follow the workflow shown below.

First you should set the DATA_SOURCE in path.sh to the data path. Your data path should be organized as follow:

Test_2023_Ali_far_release
|—— audio_dir/
|	|—— R1014_M1710.wav
|	|—— R1014_M1750.wav
|	|—— ...
|—— textgrid_dir/
|	|—— R1014_M1710.textgrid
|	|—— R1014_M1750.textgrid
|	|—— ...
|—— wav.scp
|—— segments

Then you can do speaker diarization with following command.

$ bash run_diar.sh

After diarization, you can check the result at the last line of data/Test_2023_Ali_far_sond/dia_outputs/dia_result. You should get a DER about 1.51%.

When you get the similar diarization result with us, then you can do the WPE and GSS using the following command.

$ bash run_enh.sh 8

The number 8 should be replaced with the channel number of your dataset. Here we use the AliMeeting corpus which has 8 channels.

Finally, you can decode the processed audio with the pre-trained ASR model directly using the flollowing commands.

$ bash run_asr.sh --stage 0 --stop-stage 1
$ bash run_asr.sh --stage 3 --stop-stage 3

The ASR result is saved at ./speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/decode_Test_2023_Ali_far_wpegss/text_cpcer.

Infer on the AliMeeting Test_2023 set after finetune

You can finetune the pre-trained ASR model with the AliMeeting train set to obtain a further reduction on the cpCER. To infer on the AliMeeting Test 2023 set after finetuning, you can run this commands after the train set is processed with WPE and GSS mentioned above.

$ bash run_asr.sh --stage 2 --stop-stage 3

Infer with your own dataset

We also support infer with your own dataset. Your dataset should be organized as above. The wav.scp and segments file should format as:

# wav.scp
sessionA wav_path/wav_name_A.wav 
sessionB wav_path/wav_name_B.wav 
sessionC wav_path/wav_name_C.wav 
...

# segments
sessionA-start_time-end_time sessionA start_time end_time
sessionB-start_time-end_time sessionA start_time end_time
sessionC-start_time-end_time sessionA start_time end_time
...

Then you should set the DATA_SOURCE and DATA_NAME in path.sh. The rest of the process is the same as Infer on the AliMeeting Test_2023 set.

Result

	VBx DER(%)	SOND DER(%)	cp-CER(%)
before finetune	16.87	1.51	10.18
after finetune	16.87	1.51	8.84