	---
	language:
	- pt
	license: apache-2.0
	tags:
	- whisper-event
	- generated_from_trainer
	datasets:
	- mozilla-foundation/common_voice_11_0
	metrics:
	- wer
	- cer
	base_model: openai/whisper-large-v2
	model-index:
	- name: Whisper Large Portuguese
	  results:
	  - task:
	      type: automatic-speech-recognition
	      name: Automatic Speech Recognition
	    dataset:
	      name: mozilla-foundation/common_voice_11_0 pt
	      type: mozilla-foundation/common_voice_11_0
	      config: pt
	      split: test
	      args: pt
	    metrics:
	    - type: wer
	      value: 4.816664144852979
	      name: WER
	    - type: cer
	      value: 1.6052355927195898
	      name: CER
	  - task:
	      type: automatic-speech-recognition
	      name: Automatic Speech Recognition
	    dataset:
	      name: google/fleurs pt_br
	      type: google/fleurs
	      config: pt_br
	      split: test
	      args: pt_br
	    metrics:
	    - type: wer
	      value: 8.56762285333714
	      name: WER
	    - type: cer
	      value: 5.462965196208485
	      name: CER
	---

	# Whisper Large Portuguese

	This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on Portuguese, trained on the train and validation splits of [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0). Not all of the validation split was used for training: I held out 1k samples from it to use for evaluation during fine-tuning.


	## Usage

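	```python
	from transformers import pipeline

	# Load the fine-tuned checkpoint as a speech recognition pipeline
	transcriber = pipeline(
	    "automatic-speech-recognition",
	    model="jonatasgrosman/whisper-large-pt-cv11"
	)

	# Force the decoder to transcribe in Portuguese instead of auto-detecting
	# the language or translating
	transcriber.model.config.forced_decoder_ids = (
	    transcriber.tokenizer.get_decoder_prompt_ids(
	        language="pt",
	        task="transcribe"
	    )
	)

	# The pipeline returns a dict; the transcribed text is under the "text" key
	transcription = transcriber("path/to/my_audio.wav")
	```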

	## Evaluation

	I evaluated the model on the test splits of two datasets: [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) (the dataset used for fine-tuning) and [Fleurs](https://huggingface.co/datasets/google/fleurs) (not seen during fine-tuning). Because Whisper can transcribe casing and punctuation, I evaluated the model in two scenarios: one using the raw text and one using normalized text (lowercased, with punctuation removed). For Fleurs, I also evaluated on only the samples whose transcriptions contain no numerical values. Fleurs writes numerical values differently from Common Voice, the dataset used for fine-tuning, so this mismatch is expected to hurt the model's performance on that kind of transcription in Fleurs.
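
The scenarios above can be sketched in plain Python. This is only an illustration, not the script used to produce the numbers below: the helper names (`normalize_text`, `is_non_numeric`, `error_rate`), the exact punctuation rule, and the "no digit characters in the reference" criterion for non-numeric samples are all assumptions, and the two example sentences are made up.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Lowercase and remove punctuation (assumed rule: drop anything that is
    not a letter, digit, underscore or whitespace)."""
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def is_non_numeric(text: str) -> bool:
    """Assumed criterion: the reference contains no digit characters."""
    return not any(ch.isdigit() for ch in text)


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word") -> float:
    """Corpus-level WER (unit="word") or CER (unit="char"), in percent."""
    errors = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        errors += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    return 100.0 * errors / total if total else 0.0


# Made-up reference/prediction pairs, for illustration only
references = ["Olá, mundo!", "Ele tem 3 gatos."]
hypotheses = ["olá mundo", "ele tem três gatos"]

# Scenario 1: raw text
raw_wer = error_rate(references, hypotheses, unit="word")

# Scenario 2: normalized text
norm_refs = [normalize_text(r) for r in references]
norm_hyps = [normalize_text(h) for h in hypotheses]
norm_wer = error_rate(norm_refs, norm_hyps, unit="word")
norm_cer = error_rate(norm_refs, norm_hyps, unit="char")

# Scenario 3: normalized text, keeping only non-numeric samples
kept = [(r, h) for r, h in zip(norm_refs, norm_hyps) if is_non_numeric(r)]
non_numeric_wer = error_rate([r for r, _ in kept], [h for _, h in kept])

print(f"raw WER: {raw_wer:.2f}")
print(f"normalized WER: {norm_wer:.2f}, normalized CER: {norm_cer:.2f}")
print(f"normalized, non-numeric WER: {non_numeric_wer:.2f}")
```

In the second pair the reference writes the number as a digit while the prediction spells it out, which is the kind of mismatch the non-numeric scenario is meant to filter out.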

	### Common Voice 11

	| Model | CER | WER |
	| --- | --- | --- |
	| [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) | 2.52 | 9.56 |
	| [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) + text normalization | 1.60 | 4.82 |
	| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 4.32 | 13.92 |
	| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization | 2.84 | 7.02 |

	### Fleurs

	| Model | CER | WER |
	| --- | --- | --- |
	| [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) | 4.88 | 12.08 |
	| [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) + text normalization | 5.46 | 8.57 |
	| [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) + keep only non-numeric samples | 2.35 | 9.00 |
	| [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) + text normalization + keep only non-numeric samples | 3.36 | 6.05 |
	| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 3.52 | 10.55 |
	| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization | 4.19 | 7.04 |
	| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + keep only non-numeric samples | 2.61 | 9.29 |
	| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization + keep only non-numeric samples | 3.56 | 6.15 |
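
To make the two models easier to compare, the WER values reported in the tables can be turned into relative changes against the base model. The snippet below is a small sketch of that arithmetic; the function name `relative_wer_reduction` and the choice of rows are mine, and the numbers are copied from the tables.

```python
def relative_wer_reduction(baseline: float, fine_tuned: float) -> float:
    """Relative WER reduction in percent; negative means the fine-tuned
    model has a higher WER than the baseline."""
    return 100.0 * (baseline - fine_tuned) / baseline


# (openai/whisper-large-v2 WER, jonatasgrosman/whisper-large-pt-cv11 WER)
reported_wer = {
    "Common Voice 11, raw": (13.92, 9.56),
    "Common Voice 11, normalized": (7.02, 4.82),
    "Fleurs, raw": (10.55, 12.08),
    "Fleurs, normalized": (7.04, 8.57),
    "Fleurs, raw, non-numeric only": (9.29, 9.00),
    "Fleurs, normalized, non-numeric only": (6.15, 6.05),
}

for scenario, (baseline, fine_tuned) in reported_wer.items():
    change = relative_wer_reduction(baseline, fine_tuned)
    print(f"{scenario}: {change:+.1f}%")
```

On Fleurs, the change is negative on the full test split and turns slightly positive once samples with numerical values are excluded.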