poonehmousavi committed on
Commit
76e01f0
1 Parent(s): ea72a53

Delete README.md

Files changed (1)
  1. README.md +0 -132
README.md DELETED
@@ -1,132 +0,0 @@
- ---
- language:
- - mn
- thumbnail: null
- pipeline_tag: automatic-speech-recognition
- tags:
- - whisper
- - pytorch
- - speechbrain
- - Transformer
- - hf-asr-leaderboard
- license: apache-2.0
- datasets:
- - commonvoice
- metrics:
- - wer
- - cer
- model-index:
- - name: asr-whisper-large-v2-commonvoice-mn
-   results:
-   - task:
-       name: Automatic Speech Recognition
-       type: automatic-speech-recognition
-     dataset:
-       name: CommonVoice 10.0 (Mongolian)
-       type: mozilla-foundation/common_voice_10_0
-       config: mn
-       split: test
-       args:
-         language: mn
-     metrics:
-     - name: Test WER
-       type: wer
-       value: '64.92'
- ---
-
- <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
- <br/><br/>
-
- # Whisper large-v2 fine-tuned on CommonVoice Mongolian
-
- This repository provides all the necessary tools to perform automatic speech
- recognition with an end-to-end Whisper model fine-tuned on CommonVoice (Mongolian) within
- SpeechBrain. For a better experience, we encourage you to learn more about
- [SpeechBrain](https://speechbrain.github.io).
-
- The model achieves the following performance on the CommonVoice test set:
-
- | Release | Test CER (%) | Test WER (%) | GPUs |
- |:-------------:|:--------------:|:--------------:|:--------:|
- | 01-02-23 | 25.73 | 64.92 | 1xV100 16GB |
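
For readers unfamiliar with the two metrics in the table, the sketch below shows how word and character error rates are commonly derived from a Levenshtein edit distance. It is an illustration only, not the scoring code used by the SpeechBrain recipe; the function names `edit_distance` and `error_rate` and the English example sentences are assumptions made for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalised by the reference length, as a percentage."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"

wer = error_rate(reference.split(), hypothesis.split())  # word level
cer = error_rate(list(reference), list(hypothesis))      # character level
print(f"WER: {wer:.2f}%  CER: {cer:.2f}%")
```

Because both rates are normalised by the reference length, a hypothesis with many insertions can score above 100%.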
-
- ## Pipeline description
-
- This ASR system is composed of Whisper encoder-decoder blocks:
- - The pretrained Whisper-large-v2 encoder is frozen.
- - The pretrained Whisper tokenizer is used.
- - The pretrained Whisper-large-v2 decoder ([openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)) is fine-tuned on CommonVoice MN.
- The resulting final acoustic representation is passed to a greedy decoder.
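
As a rough illustration of greedy decoding, the sketch below repeatedly picks the highest-scoring next token until an end-of-sequence token appears. It is a toy under stated assumptions, not the SpeechBrain implementation: `next_token_scores` is a hypothetical callable standing in for the fine-tuned decoder, and the token ids are invented.

```python
def greedy_decode(next_token_scores, bos_id, eos_id, max_len=50):
    """Pick the best-scoring token at each step until EOS or max_len.

    `next_token_scores(prefix)` is a hypothetical stand-in for the decoder:
    it maps the tokens emitted so far to one score per vocabulary entry.
    """
    tokens = [bos_id]
    for _ in range(max_len):
        scores = next_token_scores(tokens)
        best = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if best == eos_id:
            break
        tokens.append(best)
    return tokens[1:]  # drop the BOS token


def toy_scores(prefix):
    """Toy model: always prefers (last token + 1); id 4 acts as EOS."""
    vocab_size = 5  # 0 = BOS, 1..3 = symbols, 4 = EOS
    target = min(prefix[-1] + 1, 4)
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


decoded = greedy_decode(toy_scores, bos_id=0, eos_id=4)
print(decoded)
```

Greedy decoding keeps only one hypothesis per step, which makes it cheap but means an early mistake cannot be revised later, unlike beam search.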
-
- The system is trained with recordings sampled at 16 kHz (single channel).
- The code automatically normalizes your audio (i.e., resampling and mono channel selection) when you call *transcribe_file*, if needed.
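
To make the normalization step concrete, here is a minimal pure-Python sketch of the two operations. It assumes channels are averaged to obtain mono and uses plain linear interpolation for resampling; the actual SpeechBrain normalizer may select channels differently and uses proper resampling filters. The helper names `to_mono` and `resample_linear` are hypothetical.

```python
def to_mono(frames):
    """Average the channels of each frame: [[left, right], ...] -> [m, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (no anti-aliasing; illustration only)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for k in range(n_out):
        pos = k * src_rate / dst_rate        # position in the source signal
        i = min(int(pos), len(samples) - 2)  # left neighbour index
        frac = min(pos - i, 1.0)             # weight of the right neighbour
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out


# Eight stereo frames recorded at 32 kHz, converted to 16 kHz mono.
stereo = [[float(i), float(i) + 2.0] for i in range(8)]
mono_16k = resample_linear(to_mono(stereo), src_rate=32000, dst_rate=16000)
print(mono_16k)
```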
-
- ## Install SpeechBrain
-
- First, install transformers and SpeechBrain with the following command:
-
- ```
- pip install speechbrain transformers
- ```
-
- We encourage you to read our tutorials and learn more about
- [SpeechBrain](https://speechbrain.github.io).
-
- ### Transcribing your own audio files (in Mongolian)
-
- ```python
-
- from speechbrain.pretrained.interfaces import foreign_class
-
- asr_model = foreign_class(source="speechbrain/asr-whisper-large-v2-commonvoice-mn", pymodule_file="custom_interface.py", classname="WhisperASR", hparams_file='hparams.yaml', savedir="pretrained_models/asr-whisper-large-v2-commonvoice-mn")
- asr_model.transcribe_file('speechbrain/asr-whisper-large-v2-commonvoice-mn/example-mn.mp3')
-
-
- ```
- ### Inference on GPU
- To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling `foreign_class`.
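
A small sketch of how the device option might be chosen at run time, falling back to the CPU when no GPU (or no PyTorch installation) is available. The helper `pick_device` is hypothetical and not part of SpeechBrain; only `torch.cuda.is_available()` is a real PyTorch call.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


run_opts = {"device": pick_device()}
print(run_opts)

# The dictionary can then be passed when loading the model, for example:
# asr_model = foreign_class(..., run_opts=run_opts)
```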
-
- ### Training
- The model was trained with SpeechBrain.
- To train it from scratch, follow these steps:
- 1. Clone SpeechBrain:
- ```bash
- git clone https://github.com/speechbrain/speechbrain/
- ```
- 2. Install it:
- ```bash
- cd speechbrain
- pip install -r requirements.txt
- pip install -e .
- ```
-
- 3. Run training:
- ```bash
- cd recipes/CommonVoice/ASR/transformer/
- python train_with_whisper.py hparams/train_mn_hf_whisper.yaml --data_folder=your_data_folder
- ```
-
- You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/10E2xclgNx_6BFxNmv9i1HorBNnsMveP_?usp=share_link).
-
- ### Limitations
- The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
-
- #### Referencing SpeechBrain
-
- ```
- @misc{SB2021,
- author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
- title = {SpeechBrain},
- year = {2021},
- publisher = {GitHub},
- journal = {GitHub repository},
- howpublished = {\url{https://github.com/speechbrain/speechbrain}},
- }
- ```
-
- #### About SpeechBrain
- SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. It obtains competitive or state-of-the-art performance in various domains.
-
- Website: https://speechbrain.github.io/
-
- GitHub: https://github.com/speechbrain/speechbrain