alvanlii/distil-whisper-small-cantonese

Automatic Speech Recognition

alvanlii committed on Apr 4

Commit 4ec307d • 1 Parent(s): a22264f

Update README.md

Files changed (1)

README.md +13 -9

@@ -23,14 +23,17 @@ model-index:
     metrics:
     - name: Normalized CER
       type: cer
-      value: 9.77
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 # Distil-Whisper Small zh-HK - Alvin
-This model is a distilled version of [alvanlii/whisper-small-cantonese](https://huggingface.co/alvanlii/whisper-small-cantonese) on the Cantonese language. It achieves a 9.77 CER (without punctuations), 11.7 CER (with punctuations) on Common Voice 16.0. It has 6 decoder layers instead of 12.
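
The two CER figures differ only in whether punctuation is removed before scoring. As a rough illustration, the sketch below computes character error rate as Levenshtein distance divided by reference length, with an optional punctuation-stripping step. The helper names (`strip_punctuation`, `edit_distance`, `cer`) and the normalization rule (dropping every Unicode punctuation character) are assumptions made for this example; the model card does not state which normalizer or metric library was used.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Drop characters whose Unicode category starts with 'P' (punctuation).

    Assumption: this stands in for the card's "without punctuation" setting.
    """
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance using a rolling DP row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str, normalize: bool = False) -> float:
    """Character error rate: edit distance divided by reference length."""
    if normalize:
        ref, hyp = strip_punctuation(ref), strip_punctuation(hyp)
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For a reference of `"你好,世界。"` and a hypothesis of `"你好世界"`, the raw CER counts the two missing punctuation marks as errors, while the normalized CER is zero. Multiply by 100 to compare against percentage-style figures such as 9.77.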
 ## Training and evaluation data
 For training,
@@ -40,14 +43,15 @@ For training,
 For evaluation, Common Voice 16.0 yue Test set is used.
-## Results
-- CER (lower is better): 0.117 (compared to 0.107 for `alvanlii/whisper-small-cantonese`)
-- GPU Inference with Fast Attention (sdpa): 0.039s/sample (down from 0.055s)
-  - Note all GPU evaluations are done on RTX 3090 GPU
-- GPU Inference: 0.041s/sample (down from 0.308s)
-- CPU Inference: 1.7s/sample (down from 2.57s)
-- GPU VRAM: ~2 GB
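
The s/sample figures above are per-sample latencies. Assuming they are averages over an evaluation set, a measurement of that kind could be sketched as follows. The function name `average_seconds_per_sample` and the `transcribe` callable are hypothetical placeholders for whatever inference call is being timed; this is not the benchmarking script behind the reported numbers.

```python
import time
from statistics import mean


def average_seconds_per_sample(transcribe, samples):
    """Return the mean wall-clock time of transcribe(sample) over samples.

    `transcribe` is a hypothetical callable wrapping the model's inference call.
    When timing GPU inference, the callable should block until the result is
    ready (for example by synchronizing the device), otherwise asynchronous
    execution can make the measured time too small.
    """
    durations = []
    for sample in samples:
        start = time.perf_counter()
        transcribe(sample)
        durations.append(time.perf_counter() - start)
    if not durations:
        raise ValueError("samples must be non-empty")
    return mean(durations)
```

Running the same function over the same samples for both models, on the same hardware, gives directly comparable values. A few warm-up calls before timing are usually worthwhile so that one-time setup cost does not skew the average.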
 ## Using the Model
 ```

     metrics:
     - name: Normalized CER
       type: cer
+      value: 9.7
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 # Distil-Whisper Small zh-HK - Alvin
+- This model is a distilled version of [alvanlii/whisper-small-cantonese](https://huggingface.co/alvanlii/whisper-small-cantonese) on the Cantonese language.
+- Achieves a 9.7 CER (without punctuations), 11.59 CER (with punctuations) on Common Voice 16.0.
+- Has 3 decoder layers instead of regular 12 of the Whisper small model.
+- Uses ~2GB of GPU VRAM
 ## Training and evaluation data
 For training,
 For evaluation, Common Voice 16.0 yue Test set is used.
+## Comparisons to Whisper Small
+||`alvanlii/distil-whisper-small-cantonese`|`alvanlii/whisper-small-cantonese`|
+|--|--|--|
+|CER (lower is better)|0.116|0.107|
+|GPU Inference time (sdpa) [s/sample]|0.039|0.055|
+|GPU Inference (regular) [s/sample]|0.041|0.308|
+|CPU Inference [s/sample]|1.7|2.57|
+- inference time is calculated by taking the average inference time for the CV16 yue test set
 ## Using the Model
 ```