Commit e2e3aa3 by carlosdanielhernandezmena (1 parent: 407197d)

Adding info to the README file.

Files changed (1): README.md (+197 -0)
@@ -1,3 +1,200 @@
  ---
+ language: is
+ datasets:
+ - language-and-voice-lab/samromur_asr
+ - language-and-voice-lab/samromur_children
+ - language-and-voice-lab/malromur_asr
+ - language-and-voice-lab/althingi_asr
+ tags:
+ - audio
+ - automatic-speech-recognition
+ - icelandic
+ - whisper
+ - whisper-large
+ - iceland
+ - reykjavik
+ - samromur
  license: cc-by-4.0
+ widget:
+ model-index:
+ - name: whisper-large-icelandic-10k-steps-1000h
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Samrómur (Test)
+       type: language-and-voice-lab/samromur_asr
+       split: test
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 11.879
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Samrómur (Dev)
+       type: language-and-voice-lab/samromur_asr
+       split: validation
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 10.849
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Samrómur Children (Test)
+       type: language-and-voice-lab/samromur_children
+       split: test
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 12.325
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Samrómur Children (Dev)
+       type: language-and-voice-lab/samromur_children
+       split: validation
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 8.078
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Malrómur (Test)
+       type: language-and-voice-lab/malromur_asr
+       split: test
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 10.132
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Malrómur (Dev)
+       type: language-and-voice-lab/malromur_asr
+       split: validation
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 10.157
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Althingi (Test)
+       type: language-and-voice-lab/althingi_asr
+       split: test
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 11.750
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Althingi (Dev)
+       type: language-and-voice-lab/althingi_asr
+       split: validation
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 11.141
  ---
+ # whisper-large-icelandic-10k-steps-1000h
+
+ "whisper-large-icelandic-10k-steps-1000h" is an acoustic model suitable for Automatic Speech Recognition in Icelandic. It is the result of fine-tuning the model "openai/whisper-large" with around 1000 hours of Icelandic data developed by the [Language and Voice Laboratory](https://huggingface.co/language-and-voice-lab). Most of the data is available at public repositories such as [LDC](https://www.ldc.upenn.edu/), [OpenSLR](https://openslr.org/) or [Clarin.is](https://clarin.is/).
+
+ The specific list of corpora used to fine-tune the model is:
+
+ - [Samrómur 21.05 (114h34m)](http://www.openslr.org/112/)
+ - [Samrómur Children (127h25m)](https://catalog.ldc.upenn.edu/LDC2022S11)
+ - [Malrómur (119h03m)](https://clarin.is/en/resources/malromur/)
+ - [Althingi Parliamentary Speech (514h29m)](https://catalog.ldc.upenn.edu/LDC2021S01)
+ - L2-Speakers Data (125h55m) **Unpublished material**
+
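As a quick sanity check on the "around 1000 hours" figure, the per-corpus durations in the list above can be summed. The snippet below is only a minimal sketch: the `CORPORA` dictionary and the `parse_duration` helper are illustrative names that do not exist in the model repository, and the Malrómur duration is read as 119h03m.

```python
import re

# Durations copied from the corpora list (Malrómur read as 119h03m).
CORPORA = {
    "Samrómur 21.05": "114h34m",
    "Samrómur Children": "127h25m",
    "Malrómur": "119h03m",
    "Althingi Parliamentary Speech": "514h29m",
    "L2-Speakers Data": "125h55m",
}


def parse_duration(text: str) -> int:
    """Convert a string such as '114h34m' into a number of minutes."""
    match = re.fullmatch(r"(\d+)h(\d+)m", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


total_minutes = sum(parse_duration(d) for d in CORPORA.values())
hours, minutes = divmod(total_minutes, 60)
print(f"{hours}h{minutes:02d}m")  # prints "1001h26m"
```

The total of roughly 1001 hours is consistent with the "1000h" in the model name.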
+ The fine-tuning process was performed in March 2023 on the servers of the Language and Voice Laboratory (https://lvl.ru.is/) at Reykjavík University (Iceland) by Carlos Daniel Hernández Mena.
+
+ # Evaluation
+ ```python
+ import torch
+ from transformers import WhisperForConditionalGeneration, WhisperProcessor
+
+ # Load the processor and model.
+ MODEL_NAME = "carlosdanielhernandezmena/whisper-large-icelandic-10k-steps-1000h"
+ processor = WhisperProcessor.from_pretrained(MODEL_NAME)
+ model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")
+
+ # Load the dataset.
+ from datasets import load_dataset, Audio
+ ds = load_dataset("language-and-voice-lab/samromur_children", split="test")
+
+ # Resample to 16 kHz.
+ ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
+
+ # Transcribe one example and normalize both the reference and the prediction.
+ def map_to_pred(batch):
+     audio = batch["audio"]
+     input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
+     batch["reference"] = processor.tokenizer._normalize(batch["normalized_text"])
+
+     with torch.no_grad():
+         predicted_ids = model.generate(input_features.to("cuda"))[0]
+
+     transcription = processor.decode(predicted_ids)
+     batch["prediction"] = processor.tokenizer._normalize(transcription)
+
+     return batch
+
+ # Run the evaluation.
+ result = ds.map(map_to_pred)
+
+ # Compute the overall WER.
+ from evaluate import load
+
+ wer = load("wer")
+ WER = 100 * wer.compute(references=result["reference"], predictions=result["prediction"])
+ print(WER)
+ ```
+ **Test Result**: 12.325364793542379
+
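The `wer` metric reported above is the word-level edit distance between prediction and reference, divided by the number of reference words and expressed here as a percentage. The following pure-Python sketch illustrates that definition only; `edit_distance` and `word_error_rate` are hypothetical helpers, not the code that the `evaluate` library runs, and the two example sentences are made up for the illustration.

```python
def edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word sequences."""
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(references, predictions):
    """Corpus-level WER in percent: pooled word errors over pooled reference words."""
    errors = 0
    total = 0
    for ref, hyp in zip(references, predictions):
        ref_words = ref.split()
        errors += edit_distance(ref_words, hyp.split())
        total += len(ref_words)
    return 100 * errors / total


refs = ["góðan daginn", "takk fyrir hjálpina"]
hyps = ["góðan dag", "takk fyrir hjálpina"]
print(word_error_rate(refs, hyps))  # 1 error over 5 reference words -> 20.0
```

Pooling errors over the whole test set, rather than averaging per-utterance rates, keeps short utterances from dominating the score.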
+ # BibTeX entry and citation info
+ *When publishing results based on these models, please refer to:*
+ ```bibtex
+ @misc{mena2023whisperlarge10kicelandic,
+       title={Acoustic Model in Icelandic: whisper-large-icelandic-10k-steps-1000h.},
+       author={Hernandez Mena, Carlos Daniel},
+       year={2023},
+       url={https://huggingface.co/carlosdanielhernandezmena/whisper-large-icelandic-10k-steps-1000h},
+ }
+ ```
+
+ # Acknowledgements
+
+ Thanks to Jón Guðnason, head of the Language and Voice Lab, for providing computational power to make this model possible. We also want to thank the "Language Technology Programme for Icelandic 2019-2023", which is managed and coordinated by Almannarómur and funded by the Icelandic Ministry of Education, Science and Culture.
+
+ Special thanks to Björn Ingi Stefánsson for setting up the configuration of the server where this model was trained.
+