---
license: apache-2.0
datasets:
  - mozilla-foundation/common_voice_11_0
language:
  - yue
metrics:
  - cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

🤗 HF Repo • 🐱 GitHub Repo

Approximate Performance Evaluation

The following models are all trained and evaluated on a single RTX 3090 GPU.

Cantonese Test Results Comparison

MDCC

| Model name | Parameters | Finetune Steps | Time Spent | Training Loss | Validation Loss | CER % | Finetuned Model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| whisper-tiny-cantonese | 39 M | 3200 | 4h 34m | 0.0485 | 0.771 | 11.10 | Link |
| whisper-base-cantonese | 74 M | 7200 | 13h 32m | 0.0186 | 0.477 | 7.66 | Link |
| whisper-small-cantonese | 244 M | 3600 | 6h 38m | 0.0266 | 0.137 | 6.16 | Link |
| whisper-small-lora-cantonese | 3.5 M | 8000 | 21h 27m | 0.0687 | 0.382 | 7.40 | Link |
| whisper-large-v2-lora-cantonese | 15 M | 10000 | 33h 40m | 0.0046 | 0.277 | 3.77 | Link |

Common Voice Corpus 11.0

| Model name | Original CER % | w/o Finetune CER % | Jointly Finetune CER % |
| --- | --- | --- | --- |
| whisper-tiny-cantonese | 124.03 | 66.85 | 35.87 |
| whisper-base-cantonese | 78.24 | 61.42 | 16.73 |
| whisper-small-cantonese | 52.83 | 31.23 | / |
| whisper-small-lora-cantonese | 37.53 | 19.38 | 14.73 |
| whisper-large-v2-lora-cantonese | 37.53 | 19.38 | 9.63 |
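
The CER figures above can be read with the help of a small sketch of how a character error rate is typically computed: the total character-level edit distance divided by the total number of reference characters. This is only an illustration under assumptions. The helper names `edit_distance` and `cer_percent` are hypothetical, and the exact evaluation script, text normalization, and aggregation used for the reported numbers are not stated here. It also shows why a value such as 124.03 is possible: insertions count as errors while the denominator is the reference length, so CER can exceed 100%.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference character
                    curr[j - 1] + 1,     # insertion of a hypothesis character
                    prev[j - 1] + cost,  # substitution, or match when cost is 0
                )
            )
        prev = curr
    return prev[-1]


def cer_percent(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER in percent: total edits / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return 100.0 * total_edits / total_chars


if __name__ == "__main__":
    # A hypothesis that repeats itself triggers many insertions.
    print(cer_percent(["你好"], ["你好你好你好"]))
```

Repetitive or hallucinated output on out-of-domain audio is one way an untuned model can end up above 100% CER, because every extra character is charged as an insertion.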