---
license: apache-2.0
datasets:
- KBLab/rixvox
language:
- sv
---
|
# Whisper Tiny RixVox Swedish
|
|
|
This is a [Whisper tiny](https://huggingface.co/openai/whisper-tiny) model fine-tuned for Swedish using the [RixVox](https://huggingface.co/datasets/KBLab/rixvox) dataset.
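
As a usage illustration, the model can be loaded with the `transformers` ASR pipeline as sketched below. The repository id is a placeholder for wherever this checkpoint is hosted, and the `generate_kwargs` are assumptions based on standard Whisper usage, not part of this model card.

```python
# Minimal usage sketch (placeholder repo id, not an official reference).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="<this-repo-id>",  # replace with the actual repository id of this model
)

# Whisper expects 16 kHz mono audio; the pipeline resamples common file formats.
result = asr(
    "speech.wav",
    generate_kwargs={"language": "swedish", "task": "transcribe"},
)
print(result["text"])
```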
|
|
|
Please note that this model, like every other encoder-decoder speech-to-text model, is prone to hallucinating on unexpected inputs, since it treats the task as translation rather than transcription. That is, your mileage may vary depending on the filtering and type of data.
|
|
|
In this release the entire encoder was frozen. Subsequent releases will only unfreeze the encoder **if** generalization to other types of data (i.e. not parliamentary speeches) is preserved when doing so.
|
|
|
## Evaluation
|
|
|
* FLEURS WER: 51.68
* FLEURS WER (normalized*): 48.09
|
|
|
*) Normalization is done by applying the following function to both reference and generated texts:
|
|
|
```python
from re import sub

def normalize(s):
    # Lowercase, replace everything except digits and letters (incl. åäö)
    # with spaces, then collapse runs of whitespace into single spaces.
    return ' '.join(sub('[^0-9a-zåäö ]', ' ', s.lower()).split())
```
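
For context, the figures above could be reproduced roughly along the following lines, assuming the `datasets` and `evaluate` libraries and the Swedish split of `google/fleurs`. The repository id is again a placeholder, and this is a sketch rather than the exact evaluation script.

```python
# Hedged sketch of a FLEURS evaluation loop; not the exact script used for the
# numbers above. Reuses the normalize() function defined earlier.
import evaluate
from datasets import Audio, load_dataset
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="<this-repo-id>")
wer = evaluate.load("wer")

fleurs = load_dataset("google/fleurs", "sv_se", split="test")
fleurs = fleurs.cast_column("audio", Audio(sampling_rate=16_000))

refs, hyps = [], []
for sample in fleurs:
    out = asr(
        {"raw": sample["audio"]["array"], "sampling_rate": 16_000},
        generate_kwargs={"language": "swedish", "task": "transcribe"},
    )
    hyps.append(out["text"])
    refs.append(sample["transcription"])

print("WER:", 100 * wer.compute(predictions=hyps, references=refs))
print("WER (normalized):", 100 * wer.compute(
    predictions=[normalize(h) for h in hyps],
    references=[normalize(r) for r in refs],
))
```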
|
|
|
## Training
|
|
|
Training was done using Hugging Face Transformers and DeepSpeed with ZeRO stage 2.
|
|
|
* learning rate: 1e-5
* optimizer: CPUAdamW (DeepSpeed)
* lr scheduler: linear
* warmup steps: 500
* per device batch size: 32
* GPUs: 8 x NVIDIA A100 40GB
* total batch size: 160
* steps: 10000
* lowercase: no
* fp16
* entire encoder was frozen (see the sketch below)
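
For illustration, the setup above could roughly be expressed with `Seq2SeqTrainingArguments` as sketched below. The output directory, the DeepSpeed config file name, and the dataset/trainer wiring are assumptions and omissions, not the actual training script.

```python
# Rough sketch of the hyperparameters above; dataset preparation, the data
# collator and the Seq2SeqTrainer call are omitted. The DeepSpeed ZeRO stage 2
# config file name and output directory are placeholders.
from transformers import Seq2SeqTrainingArguments, WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# Freeze the entire encoder, as in this release.
for param in model.model.encoder.parameters():
    param.requires_grad = False

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-tiny-rixvox-sv",   # placeholder
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=500,
    per_device_train_batch_size=32,
    max_steps=10_000,
    fp16=True,
    deepspeed="ds_config_zero2.json",      # placeholder ZeRO stage 2 config
)
```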