---
language:
- el
license: apache-2.0
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
- wer
model-index:
- name: whisper-sm-el-xs
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_11_0 el
      type: mozilla-foundation/common_voice_11_0
      config: el
      split: test
      args: el
    metrics:
    - name: Wer
      type: wer
      value: 20.63521545319465
---

# Whisper-Small (el) for Transcription

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Greek (`el`) subset of the mozilla-foundation/common_voice_11_0 dataset.
It achieves the following results on the evaluation set (the Common Voice `test` split), using the best checkpoint (step 2000):
- Loss: 0.4805
- WER: 20.6352%
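
WER counts the word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words; the scores on this card are expressed as percentages. As a minimal sketch of that definition (not the library metric the training script uses), the hypothetical helper `word_error_rate` below computes it for a single utterance with dynamic programming:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length * 100."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (previous row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("καλημέρα σας από αθήνα", "καλημέρα σας από θήνα"))  # 25.0
```

For a full test set, error counts and reference word counts are summed over all utterances before dividing, rather than averaging per-utterance scores. Because the run set `--do_normalize_eval True`, transcripts were also text-normalized before scoring, which this sketch omits.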

## Model description

This model was fine-tuned for Greek speech transcription on the Greek subset of mozilla-foundation/common_voice_11_0, using the `train` and `validation` splits interleaved as training data.

## Intended uses & limitations

This model was produced as part of the Whisper Fine-Tuning Event (December 2022).

## Training and evaluation data

Training used the `train` and `validation` splits, interleaved.
Evaluation was done on the `test` split.
Data was streamed from the Hugging Face Hub.
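
Interleaving turns the two streamed splits into a single training stream. Our understanding is that the training script does this with `datasets.interleave_datasets`, whose default (as we understand it) alternates between sources and stops once one is exhausted; check the Datasets documentation for the exact semantics of the installed version. The sketch below illustrates that round-robin idea on plain Python iterators. It is a simplified stand-in, not the library code, and `interleave_round_robin` is a hypothetical helper:

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def interleave_round_robin(*sources: Iterable[T]) -> Iterator[T]:
    """Yield one item from each source in turn; stop when any source runs out."""
    iterators = [iter(source) for source in sources]
    if not iterators:
        return
    while True:
        for it in iterators:
            try:
                item = next(it)
            except StopIteration:
                # Stop the combined stream as soon as one split is exhausted.
                return
            yield item


# Toy stand-ins for the Common Voice "train" and "validation" splits.
train = ["train_0", "train_1", "train_2"]
validation = ["val_0", "val_1"]
print(list(interleave_round_robin(train, validation)))
# ['train_0', 'val_0', 'train_1', 'val_1', 'train_2']
```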

## Training procedure

The training script, `run_speech_recognition_seq2seq_streaming.py`, has been uploaded to this repository's files.
It was run with the following command:
```bash
python ./run_speech_recognition_seq2seq_streaming.py \
--model_name_or_path "openai/whisper-small" \
--model_revision "main" \
--do_train True \
--do_eval True \
--use_auth_token False \
--freeze_encoder False \
--model_index_name "whisper-sm-el-xs" \
--dataset_name "mozilla-foundation/common_voice_11_0" \
--dataset_config_name "el" \
--audio_column_name "audio" \
--text_column_name "sentence" \
--max_duration_in_seconds 30 \
--train_split_name "train+validation" \
--eval_split_name "test" \
--do_lower_case False \
--do_remove_punctuation False \
--do_normalize_eval True \
--language "greek" \
--task "transcribe" \
--shuffle_buffer_size 500 \
--output_dir "./data/finetuningRuns/whisper-sm-el-xs" \
--per_device_train_batch_size 16 \
--gradient_accumulation_steps 4 \
--learning_rate 1e-5 \
--warmup_steps 500 \
--max_steps 5000 \
--gradient_checkpointing True \
--fp16 True \
--evaluation_strategy "steps" \
--per_device_eval_batch_size 8 \
--predict_with_generate True \
--generation_max_length 225 \
--save_steps 1000 \
--eval_steps 1000 \
--logging_steps 25 \
--report_to "tensorboard" \
--load_best_model_at_end True \
--metric_for_best_model "wer" \
--greater_is_better False \
--push_to_hub False \
--overwrite_output_dir True
```
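
Because the data is streamed rather than downloaded, it cannot be shuffled globally; `--shuffle_buffer_size 500` instead bounds shuffling to a fixed-size buffer. The sketch below shows the general buffer-shuffle technique on a plain iterator: fill the buffer, then repeatedly emit a random buffered item and put the next incoming item in its slot. It illustrates the idea only and is not the `datasets` implementation; `buffered_shuffle` and its `seed` parameter are hypothetical:

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Approximately shuffle a stream while holding at most `buffer_size` items."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill phase: collect items until the buffer is full.
            buffer.append(item)
            continue
        # Emit a random buffered item and put the incoming item in its slot.
        index = rng.randrange(buffer_size)
        yield buffer[index]
        buffer[index] = item
    # Drain whatever is left once the stream ends.
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(buffered_shuffle(range(2000), buffer_size=500))
print(len(shuffled), sorted(shuffled) == list(range(2000)))  # 2000 True
```

The trade-off is locality: an item can only be emitted after it enters the buffer, so a small buffer keeps examples close to their original stream order while using little memory.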

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64 (16 × 4 gradient accumulation steps)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 5000
- mixed_precision_training: Native AMP
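
With `lr_scheduler_type: linear` and 500 warmup steps, the learning rate ramps linearly from 0 to 1e-05 over the first 500 steps, then decays linearly to 0 at step 5000. The function below is a minimal sketch of that schedule for checking values by hand, written to mirror our understanding of `transformers.get_linear_schedule_with_warmup`; the name `linear_warmup_lr` is hypothetical:

```python
def linear_warmup_lr(
    step: int,
    base_lr: float = 1e-5,
    warmup_steps: int = 500,
    total_steps: int = 5000,
) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        # Warmup: scale up from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale down from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 250, 500, 2750, 5000):
    print(step, linear_warmup_lr(step))
```

The peak learning rate is reached at step 500, and step 2750 (halfway through the decay phase) sits at half the peak.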

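### Training results

Evaluation ran every 1000 steps on the `test` split, so the "Eval Loss" column reports loss on that split.

| Training Loss | Epoch | Step | Eval Loss (test) | WER (%) |
|:-------------:|:-----:|:----:|:----------------:|:-------:|
| 0.0024 | 18.01 | 1000 | 0.4246 | 21.0438 |
| 0.0003 | 37.01 | 2000 | 0.4805 | 20.6352 |
| 0.0001 | 56.01 | 3000 | 0.5102 | 20.8395 |
| 0.0001 | 75.0 | 4000 | 0.5296 | 21.0717 |
| 0.0001 | 94.0 | 5000 | 0.5375 | 21.0253 |

Training loss falls to near zero while the evaluation loss rises after step 1000, which suggests overfitting. The lowest WER is reached at step 2000, and because the run used `--load_best_model_at_end` with `wer` as the selection metric, this is the checkpoint whose results are reported at the top of this card.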
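
The checkpoint selection rule can be illustrated on the rows of the table above: with `--greater_is_better False`, the best checkpoint is the one with the lowest WER. The snippet below only mimics that comparison in plain Python; it does not call the `Trainer`:

```python
# (step, eval_loss, wer) rows copied from the training-results table.
results = [
    (1000, 0.4246, 21.0438),
    (2000, 0.4805, 20.6352),
    (3000, 0.5102, 20.8395),
    (4000, 0.5296, 21.0717),
    (5000, 0.5375, 21.0253),
]

# greater_is_better=False: the best checkpoint minimises WER.
best_step, best_loss, best_wer = min(results, key=lambda row: row[2])
print(best_step, best_loss, best_wer)  # 2000 0.4805 20.6352

# Selecting on eval loss instead would have kept a different checkpoint.
print(min(results, key=lambda row: row[1])[0])  # 1000
```

The two criteria disagree here, which is why the choice of `--metric_for_best_model` matters for the released weights.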

Summary from the run log:

```text
*** train metrics ***
epoch = 94.0
train_loss = 0.0222
train_runtime = 23:06:13.19
train_samples_per_second = 3.847
train_steps_per_second = 0.06
12/08/2022 11:20:17 - INFO - __main__ - * Evaluate *

*** eval metrics ***
epoch = 94.0
eval_loss = 0.4805
eval_runtime = 0:23:03.68
eval_samples_per_second = 1.226
eval_steps_per_second = 0.153
eval_wer = 20.6352
Thu 08 Dec 2022 11:43:22 AM EST
```
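
The logged throughput is consistent with the hyperparameters: 5000 steps at an effective batch size of 64 is 320,000 training samples processed in about 23 hours. The arithmetic below reproduces the logged `train_samples_per_second` and `train_steps_per_second` values; `hms_to_seconds` is a hypothetical helper that parses the `H:MM:SS.ss` runtime strings printed in the log:

```python
def hms_to_seconds(hms: str) -> float:
    """Convert an 'H:MM:SS.ss' runtime string, as printed in the log, to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


train_steps = 5000
total_train_batch_size = 16 * 4  # per-device batch size * gradient accumulation steps
train_runtime = hms_to_seconds("23:06:13.19")

print(round(train_steps * total_train_batch_size / train_runtime, 3))  # 3.847 samples/s
print(round(train_steps / train_runtime, 2))  # 0.06 steps/s
```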

### Framework versions

- Transformers 4.26.0.dev0
- PyTorch 1.13.0
- Datasets 2.7.1.dev0
- Tokenizers 0.12.1