license: apache-2.0
datasets:
  - mozilla-foundation/common_voice_11_0
language:
  - yue
metrics:
  - cer
library_name: transformers
pipeline_tag: automatic-speech-recognition

🤗 HF Repo • 🐱 GitHub Repo

Usage

import torch
import librosa
from transformers import WhisperProcessor, WhisperTokenizer, WhisperForConditionalGeneration

# Setup
model_name_or_path = "Oblivion208/whisper-tiny-cantonese"
task = "transcribe"
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = WhisperForConditionalGeneration.from_pretrained(model_name_or_path).to(device)
tokenizer = WhisperTokenizer.from_pretrained(model_name_or_path, task=task)
processor = WhisperProcessor.from_pretrained(model_name_or_path, task=task)
feature_extractor = processor.feature_extractor
model.config.forced_decoder_ids = None
model.config.suppress_tokens = []

filepath = 'test.wav'
audio, sr = librosa.load(filepath, sr=16000, mono=True)
# The processor expects `sampling_rate`; an unrecognized `sample_rate` keyword would be ignored.
inputs = processor(audio, sampling_rate=sr, return_tensors="pt").to(device)

with torch.inference_mode():
    generated_tokens = model.generate(
        input_features=inputs.input_features,
        return_dict_in_generate=True,
        max_new_tokens=255,
    )
    transcription = tokenizer.batch_decode(
        generated_tokens.sequences, skip_special_tokens=True)
    print(transcription)
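
The Whisper feature extractor pads or truncates each input to a 30-second window, so the snippet above only transcribes the first 30 seconds of a longer file. One possible workaround is to split the waveform into consecutive windows before calling the processor. The following is a minimal sketch under that assumption; `chunk_audio` is a hypothetical helper, not part of this repository or of `transformers`.

```python
def chunk_audio(samples, sampling_rate=16000, chunk_seconds=30.0):
    """Split a 1-D sequence of samples into consecutive chunks.

    Hypothetical helper for illustration: each chunk holds at most
    `chunk_seconds` of audio, and the last chunk may be shorter.
    Works on anything that supports len() and slicing (list, NumPy array).
    """
    chunk_len = int(sampling_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("chunk length must be positive")
    return [samples[start:start + chunk_len]
            for start in range(0, len(samples), chunk_len)]


# Tiny demonstration with a toy "signal": 10 samples at 2 Hz, 2-second chunks.
toy_chunks = chunk_audio(list(range(10)), sampling_rate=2, chunk_seconds=2.0)
print(toy_chunks)
```

Each chunk could then be passed through `processor` and `model.generate` as in the usage code, and the decoded pieces joined. Hard cuts like these can split a syllable at a boundary, so overlapping windows, or the `chunk_length_s` argument of the `transformers` automatic-speech-recognition pipeline, may give better results on long recordings.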

Approximate Performance Evaluation

The following models are all trained and evaluated on a single RTX 3090 GPU.

Cantonese Test Results Comparison

MDCC

| Model name | Parameters | Finetune Steps | Time Spent | Training Loss | Validation Loss | CER % | Finetuned Model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| whisper-tiny-cantonese | 39 M | 3200 | 4h 34m | 0.0485 | 0.771 | 11.10 | Link |
| whisper-base-cantonese | 74 M | 7200 | 13h 32m | 0.0186 | 0.477 | 7.66 | Link |
| whisper-small-cantonese | 244 M | 3600 | 6h 38m | 0.0266 | 0.137 | 6.16 | Link |
| whisper-small-lora-cantonese | 3.5 M | 8000 | 21h 27m | 0.0687 | 0.382 | 7.40 | Link |
| whisper-large-v2-lora-cantonese | 15 M | 10000 | 33h 40m | 0.0046 | 0.277 | 3.77 | Link |

Common Voice Corpus 11.0

| Model name | Original CER % | w/o Finetune CER % | Jointly Finetune CER % |
| --- | --- | --- | --- |
| whisper-tiny-cantonese | 124.03 | 66.85 | 35.87 |
| whisper-base-cantonese | 78.24 | 61.42 | 16.73 |
| whisper-small-cantonese | 52.83 | 31.23 | / |
| whisper-small-lora-cantonese | 37.53 | 19.38 | 14.73 |
| whisper-large-v2-lora-cantonese | 37.53 | 19.38 | 9.63 |
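
For readers unfamiliar with the metric, CER (character error rate) is the character-level edit distance between the hypothesis and the reference, divided by the reference length. Because inserted characters count as errors, the value can exceed 100 %, as in the 124.03 figure above. The following is a minimal sketch of the standard definition, not the evaluation script used for these tables; stripping spaces before comparison is an assumption made here for illustration, and the actual text normalization behind the reported numbers is not stated.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion (reference char missing)
                            curr[j - 1] + 1,      # insertion (extra hypothesis char)
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate in percent, relative to the reference length.

    Assumption for this sketch: spaces are removed before comparison.
    """
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


print(cer("你好嗎", "你好嗎"))   # identical strings
print(cer("你好嗎", "你好"))     # one character missing out of three
print(cer("好", "你好嗎呀"))     # three inserted characters against a one-character reference
```

In practice a maintained implementation such as the `cer` metric from the `evaluate` library is usually preferable to a hand-rolled one.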