funasr
/

fsmn-vad-onnx

Voice Activity Detection

Model card Files Files and versions Community

langgz commited on May 9, 2023

Commit

db69bd7

•

1 Parent(s): eeb66f4

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -11,7 +11,7 @@ tags:
 Voice activity detection (VAD) plays a important role in speech recognition systems by detecting the beginning and end of effective speech. FunASR provides an efficient VAD model based on the [FSMN structure](https://arxiv.org/abs/1803.05030). To improve model discrimination, we use monophones as modeling units, given the relatively rich speech information. During inference, the VAD system requires post-processing for improved robustness, including operations such as threshold settings and sliding windows.
-This repository demonstrates how to leverage FSMN-VAD in conjunction with the funasr_onnx runtime. The underlying model is derived from [FunASR](https://github.com/alibaba-damo-academy/FunASR), which was trained on a massive 60,000-hour Mandarin dataset. Notably, Paraformer's performance secured the top spot on the [SpeechIO leaderboard](https://github.com/SpeechColab/Leaderboard), highlighting its exceptional capabilities in speech recognition.
 We have relesed numerous industrial-grade models, including speech recognition, voice activity detection, punctuation restoration, speaker verification, speaker diarization, and timestamp prediction (force alignment). To learn more about these models, kindly refer to the [documentation](https://alibaba-damo-academy.github.io/FunASR/en/index.html) available on FunASR. If you are interested in leveraging advanced AI technology for your speech-related projects, we invite you to explore the possibilities offered by [FunASR](https://github.com/alibaba-damo-academy/FunASR).

 Voice activity detection (VAD) plays a important role in speech recognition systems by detecting the beginning and end of effective speech. FunASR provides an efficient VAD model based on the [FSMN structure](https://arxiv.org/abs/1803.05030). To improve model discrimination, we use monophones as modeling units, given the relatively rich speech information. During inference, the VAD system requires post-processing for improved robustness, including operations such as threshold settings and sliding windows.
+This repository demonstrates how to leverage FSMN-VAD in conjunction with the funasr_onnx runtime. The underlying model is derived from [FunASR](https://github.com/alibaba-damo-academy/FunASR), which was trained on a massive 5,000-hour dataset.
 We have relesed numerous industrial-grade models, including speech recognition, voice activity detection, punctuation restoration, speaker verification, speaker diarization, and timestamp prediction (force alignment). To learn more about these models, kindly refer to the [documentation](https://alibaba-damo-academy.github.io/FunASR/en/index.html) available on FunASR. If you are interested in leveraging advanced AI technology for your speech-related projects, we invite you to explore the possibilities offered by [FunASR](https://github.com/alibaba-damo-academy/FunASR).