Update README.md
README.md
CHANGED
@@ -130,6 +130,15 @@ asr(
|
|
130 |
# 'text': ' Først så kan vi ta og henge dem kjemme, og så får vi gjøre vårt valget når vi kommer dit.'}]}
|
131 |
```
|
132 |
|
133 |
## Environmental Impact
|
134 |
|
135 |
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
|
@@ -145,7 +154,7 @@ Carbon emissions estimated using the [Machine Learning Impact calculator](https:
|
|
145 |
|
146 |
#### Software
|
147 |
|
148 |
-
The model is trained using Jax/Flax. The final model is converted to PyTorch, whisper.cpp and ONNX. Please tell us if you would like future models to be converted to other formats.
|
149 |
|
150 |
## Citation & Authors
|
151 |
This model was developed within the scope of the _NoSTram_ project, led by _Per Egil Kummervold_. The Jax code and training scripts were crafted by _Javier de la Rosa_, _Freddy Wetjen_, _Rolv-Arild Braaten_, and _Per Egil Kummervold_. Dataset curation was carried out by _Freddy Wetjen_, _Rolv-Arild Braaten_, and _Per Egil Kummervold_. Documentation was composed by _Javier de la Rosa_ and _Per Egil Kummervold_. The AiLab is under the direction of _Svein Arne Brygfjeld_. Each author contributed to the development and deliberations on the optimal way to train a Norwegian ASR model using Whisper. The work on this model was conducted as part of our professional roles at the National Library of Norway.
|
|
|
130 |
# 'text': ' Først så kan vi ta og henge dem kjemme, og så får vi gjøre vårt valget når vi kommer dit.'}]}
|
131 |
```
|
132 |
|
133 |
+
## Training Data
|
134 |
+
The training data comes from Språkbanken and the digital collection at the National Library of Norway. It includes:
|
135 |
+
|
136 |
+
- NST Norwegian ASR Database (16 kHz), and its corresponding dataset
|
137 |
+
- Transcribed speeches from the Norwegian Parliament produced by Språkbanken
|
138 |
+
- TV broadcast (NRK) subtitles (NLN digital collection)
|
139 |
+
- Audiobooks (NLN digital collection)
|
140 |
+
|
141 |
+
|
142 |
## Environmental Impact
|
143 |
|
144 |
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
|
|
|
154 |
|
155 |
#### Software
|
156 |
|
157 |
+
The model is trained using Jax/Flax. The final model is converted to PyTorch, TensorFlow, whisper.cpp and ONNX. Please tell us if you would like future models to be converted to other formats.
|
158 |
|
159 |
## Citation & Authors
|
160 |
This model was developed within the scope of the _NoSTram_ project, led by _Per Egil Kummervold_. The Jax code and training scripts were crafted by _Javier de la Rosa_, _Freddy Wetjen_, _Rolv-Arild Braaten_, and _Per Egil Kummervold_. Dataset curation was carried out by _Freddy Wetjen_, _Rolv-Arild Braaten_, and _Per Egil Kummervold_. Documentation was composed by _Javier de la Rosa_ and _Per Egil Kummervold_. The AiLab is under the direction of _Svein Arne Brygfjeld_. Each author contributed to the development and deliberations on the optimal way to train a Norwegian ASR model using Whisper. The work on this model was conducted as part of our professional roles at the National Library of Norway.
|