Oblivion208/whisper-tiny-cantonese


	---
	license: apache-2.0
	datasets:
	- mozilla-foundation/common_voice_11_0
	language:
	- yue
	metrics:
	- cer
	library_name: transformers
	pipeline_tag: automatic-speech-recognition
	---

	<p align="left">
	🤗 <a href="https://huggingface.co/Oblivion208" target="_blank">HF Repo</a> • 🐱 <a href="https://github.com/fengredrum/finetune-whisper-lora" target="_blank">GitHub Repo</a>
	</p>

	## Usage
	```python
	import torch
	import librosa
	from transformers import WhisperProcessor, WhisperTokenizer, WhisperForConditionalGeneration

	# Setup
	model_name_or_path = "Oblivion208/whisper-tiny-cantonese"
	task = "transcribe"
	device = "cuda:0" if torch.cuda.is_available() else "cpu"

	model = WhisperForConditionalGeneration.from_pretrained(model_name_or_path).to(device)
	tokenizer = WhisperTokenizer.from_pretrained(model_name_or_path, task=task)
	processor = WhisperProcessor.from_pretrained(model_name_or_path, task=task)
	feature_extractor = processor.feature_extractor
	# Clear forced decoder prompts and token suppression inherited from the base config
	model.config.forced_decoder_ids = None
	model.config.suppress_tokens = []

	# Load the audio as 16 kHz mono, the sampling rate Whisper expects
	filepath = 'test.wav'
	audio, sr = librosa.load(filepath, sr=16000, mono=True)
	# The processor's keyword is `sampling_rate`, not `sample_rate`
	inputs = processor(audio, sampling_rate=sr, return_tensors="pt").to(device)

	# Generate token IDs without tracking gradients
	with torch.inference_mode():
	    generated_tokens = model.generate(
	        input_features=inputs.input_features,
	        return_dict_in_generate=True,
	        max_new_tokens=255,
	    )

	# Decode token IDs to text, dropping special tokens
	transcription = tokenizer.batch_decode(
	    generated_tokens.sequences, skip_special_tokens=True)
	print(transcription)
	```

	## Approximate Performance Evaluation

	All of the following models were trained and evaluated on a single RTX 3090 GPU.

	### Cantonese Test Results Comparison

	#### MDCC

	| Model name | Parameters | Finetune Steps | Time Spent | Training Loss | Validation Loss | CER % | Finetuned Model |
	| ------------------------------- | ---------- | -------------- | ---------- | ------------- | --------------- | ----- | --------------- |
	| whisper-tiny-cantonese | 39 M | 3200 | 4h 34m | 0.0485 | 0.771 | 11.10 | [Link](https://huggingface.co/Oblivion208/whisper-tiny-cantonese "Oblivion208/whisper-tiny-cantonese") |
	| whisper-base-cantonese | 74 M | 7200 | 13h 32m | 0.0186 | 0.477 | 7.66 | [Link](https://huggingface.co/Oblivion208/whisper-base-cantonese "Oblivion208/whisper-base-cantonese") |
	| whisper-small-cantonese | 244 M | 3600 | 6h 38m | 0.0266 | 0.137 | 6.16 | [Link](https://huggingface.co/Oblivion208/whisper-small-cantonese "Oblivion208/whisper-small-cantonese") |
	| whisper-small-lora-cantonese | 3.5 M | 8000 | 21h 27m | 0.0687 | 0.382 | 7.40 | [Link](https://huggingface.co/Oblivion208/whisper-small-lora-cantonese "Oblivion208/whisper-small-lora-cantonese") |
	| whisper-large-v2-lora-cantonese | 15 M | 10000 | 33h 40m | 0.0046 | 0.277 | 3.77 | [Link](https://huggingface.co/Oblivion208/whisper-large-v2-lora-cantonese "Oblivion208/whisper-large-v2-lora-cantonese") |

	#### Common Voice Corpus 11.0

	| Model name | Original CER % | w/o Finetune CER % | Jointly Finetune CER % |
	| ------------------------------- | -------------- | ------------------ | ---------------------- |
	| whisper-tiny-cantonese | 124.03 | 66.85 | 35.87 |
	| whisper-base-cantonese | 78.24 | 61.42 | 16.73 |
	| whisper-small-cantonese | 52.83 | 31.23 | / |
	| whisper-small-lora-cantonese | 37.53 | 19.38 | 14.73 |
	| whisper-large-v2-lora-cantonese | 37.53 | 19.38 | 9.63 |
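
The CER figures in the tables above can exceed 100% (for example 124.03), which may look odd at first. The model card does not say how CER was computed, so the following is only a minimal sketch that assumes the standard definition: character-level edit distance (substitutions, deletions, and insertions) divided by the number of reference characters. The function names `edit_distance` and `cer_percent` are hypothetical, and the author's evaluation may apply text normalization that this sketch does not.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def cer_percent(references, hypotheses) -> float:
    """Corpus-level CER in percent: total edits / total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * total_edits / total_chars


# One dropped character out of three reference characters
print(round(cer_percent(["你好嗎"], ["你好"]), 2))

# A hypothesis that repeats itself adds insertions, pushing CER past 100%
print(cer_percent(["你好"], ["你好你好你好"]))
```

Because insertions count as errors but the denominator is the reference length only, a model that produces repeated or hallucinated output can score above 100%. That is consistent with the highest values in the Common Voice table, though the exact cause of those values is not stated in the card.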