whisper-lg-el-intlv-xs-2
This model is a fine-tuned version of farsipal/whisper-lg-el-intlv-xs, trained on the Greek subsets of mozilla-foundation/common_voice_11_0 (config el) and google/fleurs (config el_gr). It achieves the following results on the evaluation set:
- Loss: 0.2872
- Wer: 9.5004
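
For readers unfamiliar with the metric, the following is a minimal sketch of word error rate (WER) for a single reference/hypothesis pair, expressed in percent. It assumes the figure above follows the common convention of reporting WER multiplied by 100; it is not the evaluation code used for this model, which (given `--do_normalize_eval True`) also normalized the text before scoring and scored the whole test set rather than one sentence.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length * 100."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("a b c d", "a x c d"))
```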
Model description
The model was trained for Greek speech transcription on two interleaved datasets: Common Voice 11.0 (el) and FLEURS (el_gr).
Intended uses & limitations
Transcription in the Greek language
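
A minimal usage sketch, assuming a recent version of the transformers library in which the speech recognition pipeline accepts `generate_kwargs` with `language` and `task` for Whisper models (the 4.26.0.dev0 version used for training may not). The helper name `transcribe` and the file name in the usage note are hypothetical.

```python
MODEL_ID = "farsipal/whisper-lg-el-intlv-xs-2"


def transcribe(audio_path: str) -> str:
    """Transcribe a Greek audio file and return the recognized text."""
    # Imported lazily so that defining this helper does not load transformers.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # matches the 30-second limit used in training
    )
    result = asr(
        audio_path,
        generate_kwargs={"language": "greek", "task": "transcribe"},
    )
    return result["text"]
```

Calling `transcribe("sample_el.wav")` on a local recording would download the model weights on first use and return the transcription as a string.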
Training and evaluation data
Training used the train and validation splits of Common Voice 11.0 (el) and FLEURS (el_gr), interleaved. Evaluation used only the Common Voice 11.0 (el) test split.
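
To illustrate what interleaving two datasets means, here is a small pure-Python sketch with toy rows. It assumes simple round-robin alternation and a shared `text` key; the actual mixing strategy and column handling of the training script are not documented here. The only details taken from the training arguments are the two text column names, `sentence` and `transcription`.

```python
from typing import Dict, Iterable, Iterator


def with_text_column(rows: Iterable[Dict[str, str]], column: str) -> Iterator[Dict[str, str]]:
    """Map a dataset-specific text column onto a shared 'text' key."""
    for row in rows:
        yield {"text": row[column]}


def interleave(*sources: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Alternate between sources, stopping as soon as a source has no item left on its turn."""
    iterators = [iter(source) for source in sources]
    while True:
        for iterator in iterators:
            try:
                yield next(iterator)
            except StopIteration:
                return


# Toy stand-ins for the two corpora (hypothetical content).
common_voice = [{"sentence": "cv-0"}, {"sentence": "cv-1"}, {"sentence": "cv-2"}]
fleurs = [{"transcription": "fl-0"}, {"transcription": "fl-1"}]

mixed = list(interleave(
    with_text_column(common_voice, "sentence"),
    with_text_column(fleurs, "transcription"),
))
print([row["text"] for row in mixed])
```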
Training procedure
The fine-tuning run used the following command-line arguments:
--model_name_or_path 'farsipal/whisper-lg-el-intlv-xs' \
--model_revision main \
--do_train True \
--do_eval True \
--use_auth_token False \
--freeze_feature_encoder False \
--freeze_encoder False \
--model_index_name 'whisper-lg-el-intlv-xs-2' \
--dataset_name 'mozilla-foundation/common_voice_11_0,google/fleurs' \
--dataset_config_name 'el,el_gr' \
--train_split_name 'train+validation,train+validation' \
--eval_split_name 'test,-' \
--text_column_name 'sentence,transcription' \
--audio_column_name 'audio,audio' \
--streaming False \
--max_duration_in_seconds 30 \
--do_lower_case False \
--do_remove_punctuation False \
--do_normalize_eval True \
--language greek \
--task transcribe \
--shuffle_buffer_size 500 \
--output_dir './data/finetuningRuns/whisper-lg-el-intlv-xs-2' \
--overwrite_output_dir True \
--per_device_train_batch_size 8 \
--gradient_accumulation_steps 4 \
--learning_rate 3.5e-6 \
--dropout 0.15 \
--attention_dropout 0.05 \
--warmup_steps 500 \
--max_steps 5000 \
--eval_steps 1000 \
--gradient_checkpointing True \
--cache_dir '~/.cache' \
--fp16 True \
--evaluation_strategy steps \
--per_device_eval_batch_size 8 \
--predict_with_generate True \
--generation_max_length 225 \
--save_steps 1000 \
--logging_steps 25 \
--report_to tensorboard \
--load_best_model_at_end True \
--metric_for_best_model wer \
--greater_is_better False \
--push_to_hub False \
--dataloader_num_workers 6
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3.5e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 5000
- mixed_precision_training: Native AMP
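
The arithmetic behind these values can be sketched as follows. The device count of 1 is an assumption inferred from 8 × 4 × n = 32, and the schedule function is a plain re-statement of linear warmup followed by linear decay to zero, not code taken from the training run.

```python
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1  # assumption: inferred from total_train_batch_size = 32
PEAK_LR = 3.5e-6
WARMUP_STEPS = 500
TOTAL_STEPS = 5000


def effective_batch_size() -> int:
    """Number of samples contributing to each optimizer step."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, remaining)


print(effective_batch_size())
for step in (0, 250, 500, 2750, 5000):
    print(step, linear_lr(step))
```

Under this schedule the learning rate peaks at step 500 and has fallen to half of the peak by step 2750.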
Training results
Training Loss | Epoch | Step | Validation Loss | Wer |
---|---|---|---|---|
0.0813 | 2.49 | 1000 | 0.2147 | 10.8284 |
0.0379 | 4.98 | 2000 | 0.2439 | 10.0111 |
0.0195 | 7.46 | 3000 | 0.2767 | 9.8811 |
0.0126 | 9.95 | 4000 | 0.2872 | 9.5004 |
0.0103 | 12.44 | 5000 | 0.3021 | 9.6954 |
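
The training arguments set `--load_best_model_at_end True` with `--metric_for_best_model wer` and `--greater_is_better False`, so the retained checkpoint should be the one with the lowest WER. The sketch below reproduces that selection from the table rows; it illustrates the rule and is not the trainer's own code.

```python
# (training loss, epoch, step, validation loss, WER) copied from the table above
RESULTS = [
    (0.0813, 2.49, 1000, 0.2147, 10.8284),
    (0.0379, 4.98, 2000, 0.2439, 10.0111),
    (0.0195, 7.46, 3000, 0.2767, 9.8811),
    (0.0126, 9.95, 4000, 0.2872, 9.5004),
    (0.0103, 12.44, 5000, 0.3021, 9.6954),
]


def best_checkpoint(rows):
    """Return the row with the lowest WER (lower is better)."""
    return min(rows, key=lambda row: row[4])


best = best_checkpoint(RESULTS)
print(f"best step: {best[2]}, validation loss: {best[3]}, WER: {best[4]}")
# prints "best step: 4000, validation loss: 0.2872, WER: 9.5004"
```

This matches the headline Loss and Wer figures. Note that validation loss is lowest at step 1000 and rises afterwards while WER keeps improving until step 4000, so selecting by loss would have picked a different checkpoint.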
Framework versions
- Transformers 4.26.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.8.1.dev0
- Tokenizers 0.13.2
Evaluation results
- WER on the mozilla-foundation/common_voice_11_0 el test set (self-reported): 9.500