metadata

language:
  - da
license: apache-2.0
tags:
  - generated_from_trainer
datasets:
  - mozilla-foundation/common_voice_13_0
metrics:
  - wer
model-index:
  - name: Whisper Small Da - WasuratS
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 13
          type: mozilla-foundation/common_voice_13_0
          config: da
          split: test
          args: da
        metrics:
          - name: Wer
            type: wer
            value: 23.39882224190943

Whisper Small Da - WasuratS

This model is a fine-tuned version of openai/whisper-small on the Danish (da) configuration of the Common Voice 13 dataset. It achieves the following results on the evaluation set (the test split), measured at the final checkpoint, step 4000:

Loss: 0.6393
Wer Ortho: 29.0926
Wer: 23.3988
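
For readers unfamiliar with the two metrics: word error rate (WER) is the word-level edit distance between the model output and the reference transcript, divided by the number of reference words. The sketch below assumes, as is common in Whisper fine-tuning recipes, that "Wer Ortho" is computed on the raw text and "Wer" on normalized text. This card does not state which normalizer was used, so `simple_normalize` is a hypothetical stand-in (lowercasing and stripping punctuation), and `word_error_rate` is an illustrative helper, not the code that produced the numbers above.

```python
import re


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Two-row dynamic programming over the Levenshtein table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def simple_normalize(text: str) -> str:
    """Hypothetical normalizer (assumption): lowercase, drop punctuation."""
    return re.sub(r"[^\w\s]", "", text.lower()).strip()


reference = "Hej, hvordan går det?"
hypothesis = "hej hvordan går det"

# Orthographic: casing and punctuation count as errors (2 of 4 words differ).
print(word_error_rate(reference, hypothesis))  # 50.0
# Normalized: both strings reduce to the same words.
print(word_error_rate(simple_normalize(reference), simple_normalize(hypothesis)))  # 0.0
```

Under this assumption, the gap between the two reported figures would reflect casing and punctuation mismatches rather than misrecognized words.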

Model description

The base model is openai/whisper-small, fine-tuned here for Danish automatic speech recognition.

Training and evaluation data

mozilla-foundation/common_voice_13_0, Danish (da) configuration. The reported metrics are computed on the test split.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 50
training_steps: 4000
mixed_precision_training: Native AMP

from transformers import Seq2SeqTrainingArguments

# Training configuration matching the hyperparameters listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-da",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=50,
    max_steps=4000,
    gradient_checkpointing=True,  # trade extra compute for lower memory use
    fp16=True,  # mixed-precision training (Native AMP)
    fp16_full_eval=True,
    evaluation_strategy="steps",  # evaluate every eval_steps
    per_device_eval_batch_size=16,
    predict_with_generate=True,  # decode with generate() so WER can be computed
    generation_max_length=225,
    save_steps=500,
    eval_steps=500,
    logging_steps=25,
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,  # lower WER is better
    push_to_hub=True,
)
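
To make the schedule concrete: with lr_scheduler_type linear, 50 warmup steps, and 4000 training steps, the learning rate ramps linearly from 0 to 1e-05 and then decays linearly back to 0. The following is a minimal standard-library sketch of that shape, assuming the usual linear-warmup, linear-decay behavior. The function name `linear_lr` is made up for illustration and is not part of Transformers.

```python
def linear_lr(step: int, peak_lr: float = 1e-5,
              warmup_steps: int = 50, total_steps: int = 4000) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: scale linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Print the rate at the start, mid-warmup, peak, mid-decay, and end.
for step in (0, 25, 50, 2025, 4000):
    print(step, linear_lr(step))
```

With these defaults the rate reaches its peak at step 50, is back to about half the peak at step 2025, and reaches 0 at step 4000.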

Training results

Training Loss	Epoch	Step	Validation Loss	Wer Ortho	Wer
0.218	1.61	500	0.4724	30.2496	24.7069
0.0628	3.22	1000	0.4825	28.8946	23.3154
0.0289	4.82	1500	0.5311	29.3376	23.4666
0.0078	6.43	2000	0.5740	29.4627	23.6542
0.0032	8.04	2500	0.6070	29.0613	23.2790
0.0025	9.65	3000	0.6274	29.1187	23.4770
0.0012	11.25	3500	0.6335	29.0978	23.3623
0.0011	12.86	4000	0.6393	29.0926	23.3988
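
The table can also be read programmatically. The sketch below hard-codes the rows above, reports the checkpoint with the lowest WER and the one with the lowest validation loss, and estimates the number of steps per epoch from the final row. All variable names are illustrative only.

```python
# (training_loss, epoch, step, validation_loss, wer_ortho, wer),
# copied from the training results table.
ROWS = [
    (0.218, 1.61, 500, 0.4724, 30.2496, 24.7069),
    (0.0628, 3.22, 1000, 0.4825, 28.8946, 23.3154),
    (0.0289, 4.82, 1500, 0.5311, 29.3376, 23.4666),
    (0.0078, 6.43, 2000, 0.5740, 29.4627, 23.6542),
    (0.0032, 8.04, 2500, 0.6070, 29.0613, 23.2790),
    (0.0025, 9.65, 3000, 0.6274, 29.1187, 23.4770),
    (0.0012, 11.25, 3500, 0.6335, 29.0978, 23.3623),
    (0.0011, 12.86, 4000, 0.6393, 29.0926, 23.3988),
]

best_wer = min(ROWS, key=lambda row: row[5])   # lowest WER
best_loss = min(ROWS, key=lambda row: row[3])  # lowest validation loss
steps_per_epoch = ROWS[-1][2] / ROWS[-1][1]    # final step / final epoch

print(f"lowest WER: {best_wer[5]} at step {best_wer[2]}")
print(f"lowest validation loss: {best_loss[3]} at step {best_loss[2]}")
print(f"approx. steps per epoch: {steps_per_epoch:.0f}")
```

On these numbers, validation loss is lowest at step 500 and rises afterwards while training loss keeps falling, whereas WER is lowest at step 2500 (23.2790) and stays within a narrow band from step 1000 onward.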

Framework versions

Transformers 4.29.2
Pytorch 1.13.1+cu117
Datasets 2.12.0
Tokenizers 0.13.3