	---
	language:
	- id
	license: apache-2.0
	tags:
	- whisper-event
	- generated_from_trainer
	datasets:
	- mozilla-foundation/common_voice_11_0
	- magic_data
	- TITML
	metrics:
	- wer
	base_model: openai/whisper-medium
	model-index:
	- name: Whisper Medium Indonesian
	  results:
	  - task:
	      type: automatic-speech-recognition
	      name: Automatic Speech Recognition
	    dataset:
	      name: mozilla-foundation/common_voice_11_0 id
	      type: mozilla-foundation/common_voice_11_0
	      config: id
	      split: test
	    metrics:
	    - type: wer
	      value: 3.8273540533062804
	      name: Wer
	  - task:
	      type: automatic-speech-recognition
	      name: Automatic Speech Recognition
	    dataset:
	      name: google/fleurs id_id
	      type: google/fleurs
	      config: id_id
	      split: test
	    metrics:
	    - type: wer
	      value: 9.74
	      name: Wer
	---

	# Whisper Medium Indonesian

	This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the
	Indonesian mozilla-foundation/common_voice_11_0, magic_data, titml and google/fleurs datasets. It achieves the following
	results:
	### CV11 test split:
	- Loss: 0.0698
	- Wer: 3.8274
	### Google/fleurs test split:
	- Wer: 9.74

	## Usage

	```python
	from transformers import pipeline

	# Load the fine-tuned checkpoint as a speech-recognition pipeline.
	transcriber = pipeline(
	    "automatic-speech-recognition",
	    model="cahya/whisper-medium-id",
	)
	# Force Indonesian transcription instead of language detection or translation.
	transcriber.model.config.forced_decoder_ids = (
	    transcriber.tokenizer.get_decoder_prompt_ids(
	        language="id",
	        task="transcribe",
	    )
	)
	transcription = transcriber("my_audio_file.mp3")
	```

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-06
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 500
	- training_steps: 10000
	- mixed_precision_training: Native AMP
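
To make the learning-rate settings above concrete, here is a minimal sketch of the rate at a given step. It assumes the usual behaviour of a `linear` scheduler with warmup (rise linearly from 0 to the peak over the warmup steps, then decay linearly to 0 at the final step); the function name `linear_schedule_lr` is hypothetical and not part of the training script.

```python
def linear_schedule_lr(step, base_lr=1e-6, warmup_steps=500, total_steps=10_000):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero.

    Defaults mirror the hyperparameters listed above; the schedule shape itself
    is an assumption about how a `linear` scheduler with warmup behaves.
    """
    if step < warmup_steps:
        # Warmup phase: scale from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 250, 500, 5000, 10_000):
        print(step, linear_schedule_lr(step))
```

Under these assumptions the peak rate of 1e-06 is reached at step 500, and the rate is roughly half the peak around the middle of training.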

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss | Wer    |
	|:-------------:|:-----:|:-----:|:---------------:|:------:|
	| 0.0427        | 0.33  | 1000  | 0.0664          | 4.3807 |
	| 0.042         | 0.66  | 2000  | 0.0658          | 3.9426 |
	| 0.0265        | 0.99  | 3000  | 0.0657          | 3.8274 |
	| 0.0211        | 1.32  | 4000  | 0.0679          | 3.8366 |
	| 0.0212        | 1.66  | 5000  | 0.0682          | 3.8412 |
	| 0.0206        | 1.99  | 6000  | 0.0683          | 3.8689 |
	| 0.0166        | 2.32  | 7000  | 0.0711          | 3.9657 |
	| 0.0095        | 2.65  | 8000  | 0.0717          | 3.9980 |
	| 0.0122        | 2.98  | 9000  | 0.0714          | 3.9795 |
	| 0.0049        | 3.31  | 10000 | 0.0720          | 3.9887 |

	## Evaluation

	We evaluated the model on the test splits of two datasets: [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0)
	and [Google Fleurs](https://huggingface.co/datasets/google/fleurs).
	Because Whisper can transcribe casing and punctuation, we evaluate its performance on both raw and normalized text
	(lowercased, with punctuation removed). The results are as follows:

	### Common Voice 11

	| Model                                                                     | WER   |
	|---------------------------------------------------------------------------|-------|
	| [cahya/whisper-medium-id](https://huggingface.co/cahya/whisper-medium-id) | 3.83  |
	| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)     | 12.62 |

	### Google/Fleurs

	| Model                                                                                            | WER  |
	|--------------------------------------------------------------------------------------------------|------|
	| [cahya/whisper-medium-id](https://huggingface.co/cahya/whisper-medium-id)                        | 9.74 |
	| [cahya/whisper-medium-id](https://huggingface.co/cahya/whisper-medium-id) + text normalization   | tbc  |
	| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)                            | 10.2 |
	| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) + text normalization       | tbc  |

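
To illustrate why raw and normalized scores can differ, the following is a minimal sketch of word error rate with and without the normalization described in this section (lowercasing and punctuation removal). It is not the evaluation script used for the numbers above; the helper names `normalize` and `wer` and the example sentences are hypothetical, and the actual normalizer may handle more cases than ASCII punctuation.

```python
import string


def normalize(text):
    """Lowercase, strip ASCII punctuation, and collapse whitespace (assumed normalization)."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def wer(reference, hypothesis):
    """Word error rate in percent: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Illustrative sentences only, not taken from either test set.
reference = "Selamat pagi, apa kabar?"
hypothesis = "selamat pagi apa kabar"

raw = wer(reference, hypothesis)
norm = wer(normalize(reference), normalize(hypothesis))
print(f"raw WER: {raw:.2f}, normalized WER: {norm:.2f}")
```

In this toy case every difference is casing or punctuation, so the raw score penalizes three of the four words while the normalized score is zero.
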
	### Framework versions

	- Transformers 4.26.0.dev0
	- PyTorch 1.13.0+cu117
	- Datasets 2.7.1.dev0
	- Tokenizers 0.13.2