yonas committed on
Commit
0a005e5
1 Parent(s): c59e55e

Upload config

Files changed (1)
  1. README.md +107 -0
README.md ADDED
@@ -0,0 +1,107 @@
+ ---
+ language:
+ - rw
+ license: cc-by-4.0
+ library_name: nemo
+ datasets:
+ - mozilla-foundation/common_voice_11_0
+ thumbnail: null
+ tags:
+ - automatic-speech-recognition
+ - speech
+ - ASR
+ - Kinyarwanda
+ - Swahili
+ - Luganda
+ - Multilingual
+ - audio
+ - CTC
+ - Conformer
+ - Transformer
+ - NeMo
+ - pytorch
+ model-index:
+ - name: stt_rw_sw_lg_conformer_ctc_large
+   results: []
+
+ ---
+
+
+ ## Model Overview
+
+ <DESCRIBE IN ONE LINE THE MODEL AND ITS USE>
+
+ ## NVIDIA NeMo: Training
+
+ To train, fine-tune, or experiment with the model, you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend installing it after you have installed the latest PyTorch version.
+ ```shell
+ pip install "nemo_toolkit[all]"
+ ```
+
+ ## How to Use this Model
+
+ The model is available for use in the NeMo toolkit [1], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
+
+ ### Automatically instantiate the model
+
+ ```python
+ import nemo.collections.asr as nemo_asr
+ asr_model = nemo_asr.models.ASRModel.from_pretrained("yonas/stt_rw_sw_lg_conformer_ctc_large")
+ ```
+
+ ### Transcribing using Python
+ First, download a sample audio file:
+ ```shell
+ wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav
+ ```
+ Then transcribe it:
+ ```python
+ asr_model.transcribe(['2086-149220-0033.wav'])
+ ```
+
+ ### Transcribing many audio files
+
+ ```shell
+ python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py pretrained_name="yonas/stt_rw_sw_lg_conformer_ctc_large" audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
+ ```
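
As an alternative to the script, a directory of recordings could also be transcribed from Python by gathering the file paths and passing the list to `asr_model.transcribe`. The following is a minimal sketch of the path-gathering step only; `collect_wav_files` is a hypothetical helper introduced here for illustration and is not part of NeMo.

```python
from pathlib import Path


def collect_wav_files(audio_dir):
    """Return a sorted list of .wav paths (as strings) found directly in audio_dir.

    Hypothetical helper: it does not recurse into subdirectories and matches
    the extension case-insensitively.
    """
    root = Path(audio_dir)
    return sorted(
        str(path)
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() == ".wav"
    )
```

With the `asr_model` instantiated earlier, the resulting list can be passed in the same way as the single-file example, for instance `asr_model.transcribe(collect_wav_files("<DIRECTORY CONTAINING AUDIO FILES>"))`.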
+
+ ### Input
+
+ This model accepts 16 kHz (16,000 Hz) mono-channel audio (WAV files) as input.
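
Before transcribing, it can help to confirm that a file matches this input format. The sketch below uses only Python's standard `wave` module, which reads uncompressed PCM WAV files; the function name `is_supported_wav` and the two constants are assumptions made for illustration, not part of NeMo.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # Hz, i.e. 16 kHz
EXPECTED_CHANNELS = 1         # mono


def is_supported_wav(path):
    """Return True if the PCM WAV file at `path` is 16 kHz, single-channel audio."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getframerate() == EXPECTED_SAMPLE_RATE
            and wav_file.getnchannels() == EXPECTED_CHANNELS
        )
```

Files for which this check returns `False` would need to be resampled or downmixed to 16 kHz mono before being passed to the model.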
+
+ ### Output
+
+ This model provides transcribed speech as a string for a given audio sample.
+
+ ## Model Architecture
+
+ <ADD SOME INFORMATION ABOUT THE ARCHITECTURE>
+
+ ## Training
+
+ <ADD INFORMATION ABOUT HOW THE MODEL WAS TRAINED - HOW MANY EPOCHS, AMOUNT OF COMPUTE ETC>
+
+ ### Datasets
+
+ <LIST THE NAME AND SPLITS OF DATASETS USED TO TRAIN THIS MODEL (ALONG WITH LANGUAGE AND ANY ADDITIONAL INFORMATION)>
+
+ ## Performance
+
+ <LIST THE SCORES OF THE MODEL -
+ OR
+ USE THE Hugging Face Evaluate LIBRARY TO UPLOAD METRICS>
+
+ ## Limitations
+
+ <DECLARE ANY POTENTIAL LIMITATIONS OF THE MODEL>
+
+ E.g.:
+ Since this model was trained on publicly available speech datasets, its performance might degrade for speech that includes technical terms or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
+
+
+ ## References
+
+ <ADD ANY REFERENCES HERE AS NEEDED>
+
+ [1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+