alvanlii committed
Commit
2fa210d
1 Parent(s): 2b55ae8

Update README.md

Files changed (1): README.md +5 -19
README.md CHANGED
@@ -30,7 +30,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # Distil-Whisper Small zh-HK - Alvin
 
-This model is a distilled and fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Cantonese language. It achieves a 9.77 CER (without punctuations), 11.7 CER (with punctuations) on Common Voice 16.0
+This model is a distilled version of [alvanlii/whisper-small-cantonese](https://huggingface.co/alvanlii/whisper-small-cantonese) on the Cantonese language. It achieves a 9.77 CER (without punctuations), 11.7 CER (with punctuations) on Common Voice 16.0. It has 6 decoder layers instead of 12.
 
 ## Training and evaluation data
 For training,
@@ -41,11 +41,11 @@ For training,
 For evaluation, Common Voice 16.0 yue Test set is used.
 
 ## Results
-- CER (lower is better): 0.117
+- CER (lower is better): 0.117 (compared to 0.107 for `alvanlii/whisper-small-cantonese`)
-- GPU Inference with Fast Attention (example below): 0.039s/sample
+- GPU Inference with Fast Attention (sdpa): 0.039s/sample (down from 0.055s)
 - Note all GPU evaluations are done on RTX 3090 GPU
-- GPU Inference: <TODO>s/sample
+- GPU Inference: 0.041s/sample (down from 0.308s)
-- CPU Inference: 2.57s/sample
+- CPU Inference: 1.7s/sample (down from 2.57s)
 - GPU VRAM: ~2 GB
 
 
@@ -89,17 +89,3 @@ pipe = pipeline(
 pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language=lang, task="transcribe")
 text = pipe(file)["text"]
 ```
-
-## Model Speedup
-Just add attn_implementation="sdpa" for Flash Attention.
-```
-model = AutoModelForSpeechSeq2Seq.from_pretrained(
-    "alvanlii/distil-whisper-small-cantonese",
-    torch_dtype=torch_dtype,
-    low_cpu_mem_usage=True,
-    use_safetensors=True,
-    attn_implementation="sdpa",
-)
-```
-Using Flash Attention reduced the amount of time taken per sample from <TODO>s to 0.039s.
-
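For context on the CER figures this commit edits (0.117, 0.107): character error rate is the Levenshtein edit distance between hypothesis and reference, divided by reference length. A minimal sketch of that metric in plain Python (the `cer` helper is illustrative and not part of the model card; evaluation toolkits may normalize punctuation differently, which is why the card reports CER both with and without punctuation):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edit distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / m if m else float(n > 0)

# One substituted character over a five-character reference -> CER 0.2
print(cer("香港天氣好", "香港天氣差"))  # 0.2
```

A CER of 0.117 therefore means roughly one character-level error per nine reference characters, which is how the 0.117-vs-0.107 gap between the distilled model and its teacher should be read.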