speechbrain/asr-whisper-large-v2-commonvoice-mn

@@ -1,132 +0,0 @@
----
-language:
-- mn
-thumbnail: null
-pipeline_tag: automatic-speech-recognition
-tags:
-- whisper
-- pytorch
-- speechbrain
-- Transformer
-- hf-asr-leaderboard
-license: apache-2.0
-datasets:
-- commonvoice
-metrics:
-- wer
-- cer
-model-index:
-- name: asr-whisper-large-v2-commonvoice-mn
-  results:
-  - task:
-      name: Automatic Speech Recognition
-      type: automatic-speech-recognition
-    dataset:
-      name: CommonVoice 10.0 (Mongolian)
-      type: mozilla-foundation/common_voice_10_0
-      config: mn
-      split: test
-      args:
-        language: mn
-    metrics:
-    - name: Test WER
-      type: wer
-      value: '64.92'
----
-<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
-<br/><br/>
-# Whisper large-v2 fine-tuned on CommonVoice Mongolian
-This repository provides all the necessary tools to perform automatic speech
-recognition from an end-to-end Whisper model fine-tuned on CommonVoice (Mongolian) within
-SpeechBrain. For a better experience, we encourage you to learn more about
-[SpeechBrain](https://speechbrain.github.io).
-The model achieves the following performance:
-| Release | Test CER | Test WER | GPUs |
-|:-------------:|:--------------:|:--------------:| :--------:|
-| 01-02-23 | 25.73 | 64.92 | 1xV100 16GB |
-## Pipeline description
-This ASR system is composed of Whisper encoder-decoder blocks:
-- The pretrained Whisper-large-v2 encoder is frozen.
-- The pretrained Whisper tokenizer is used.
-- A pretrained Whisper-large-v2 decoder ([openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)) is fine-tuned on CommonVoice MN.
-The final acoustic representation is passed to a greedy decoder.
-The system is trained with recordings sampled at 16 kHz (single channel).
-When you call *transcribe_file*, the code automatically normalizes your audio (i.e., resampling and mono channel selection) if needed.
-## Install SpeechBrain
-First of all, please install transformers and SpeechBrain with the following command:
-```
-pip install speechbrain transformers
-```
-We encourage you to read our tutorials and learn more about
-[SpeechBrain](https://speechbrain.github.io).
-### Transcribing your own audio files (in Mongolian)
-```python
-from speechbrain.pretrained.interfaces import foreign_class
-# Fetch the model and its custom interface class (WhisperASR) from the Hugging Face Hub.
-asr_model = foreign_class(source="speechbrain/asr-whisper-large-v2-commonvoice-mn", pymodule_file="custom_interface.py", classname="WhisperASR", hparams_file='hparams.yaml', savedir="pretrained_models/asr-whisper-large-v2-commonvoice-mn")
-# Transcribe the example recording hosted in the model repository.
-asr_model.transcribe_file('speechbrain/asr-whisper-large-v2-commonvoice-mn/example-mn.mp3')
-```
-### Inference on GPU
-To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling `foreign_class`.
-### Training
-The model was trained with SpeechBrain.
-To train it from scratch, follow these steps:
-1. Clone SpeechBrain:
-```bash
-git clone https://github.com/speechbrain/speechbrain/
-```
-2. Install it:
-```bash
-cd speechbrain
-pip install -r requirements.txt
-pip install -e .
-```
-3. Run training:
-```bash
-cd recipes/CommonVoice/ASR/transformer/
-python train_with_whisper.py hparams/train_mn_hf_whisper.yaml --data_folder=your_data_folder
-```
-You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/10E2xclgNx_6BFxNmv9i1HorBNnsMveP_?usp=share_link).
-### Limitations
-The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
-#### Referencing SpeechBrain
-```
-@misc{SB2021,
-    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Chou, Ju-Chieh and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
-    title = {SpeechBrain},
-    year = {2021},
-    publisher = {GitHub},
-    journal = {GitHub repository},
-    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
-  }
-```
-#### About SpeechBrain
-SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains.
-Website: https://speechbrain.github.io/
-GitHub: https://github.com/speechbrain/speechbrain