Tags: Automatic Speech Recognition · Transformers · Safetensors · French · whisper · asr · Eval Results · Inference Endpoints
trip-fontaine committed
Commit 3da752d
1 Parent(s): 5265a3b

readme update

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -79,7 +79,7 @@ Distil-Whisper for English Automatic Speech Recognition (ASR) was proposed in th
 
 This is the knowledge distilled version of OpenAI's [Whisper large-v3](https://huggingface.co/openai/whisper-large-v3) for French ASR.
 
-The result is a distilled model that performs within **2% WER of whisper-large-v3** on out-of-distribution evaluation sets for both short-form and long-form transcription. Moreover, it is **5.9x** faster than whisper-large-v3 and **1.3** times faster than the tiniest version of Whisper, while being incomparably more accurate.
+The result is a distilled model that performs within **2% WER of [Whisper large-v3](https://huggingface.co/openai/whisper-large-v3)** on out-of-distribution evaluation sets for both short-form and long-form transcription. Moreover, it is **5.9x** faster than [Whisper large-v3](https://huggingface.co/openai/whisper-large-v3) and **1.3** times faster than the tiniest version of Whisper, while being incomparably more accurate.
 
 | Model | Params (M) | Rel. Latency | Short-Form WER | Long-Form WER |
 | :--------------------- | :--------: | :----------: | :------------: | :-----------: |
@@ -563,7 +563,7 @@ for further information.
 
 Distil-Whisper inherits the encoder-decoder architecture from Whisper. The encoder maps a sequence of speech vector inputs to a sequence of hidden-state vectors. The decoder auto-regressively predicts text tokens, conditional on all previous tokens and the encoder hidden-states. Consequently, the encoder is only run forward once, whereas the decoder is run as many times as the number of tokens generated. In practice, this means the decoder accounts for over 90% of total inference time. Thus, to optimise for latency, the focus is on minimising the inference time of the decoder.
 
-To distill the Whisper model, we reduce the number of decoder layers while keeping the encoder fixed. The encoder (shown in green) is entirely copied from the teacher to the student and frozen during training. The student's decoder structure is copied from whisper-large-v3, with the only difference being a reduction from 32 to 2 decoder layers. These layers are initialized from distil-large-v3 to leverage language transfer from English to French (more details [here](https://github.com/huggingface/distil-whisper/tree/main/training#22-language-transfer)).
+To distill the Whisper model, we reduce the number of decoder layers while keeping the encoder fixed. The encoder (shown in green) is entirely copied from the teacher to the student and frozen during training. The student's decoder structure is copied from [Whisper large-v3](https://huggingface.co/openai/whisper-large-v3), with the only difference being a reduction from 32 to 2 decoder layers. These layers are initialized from distil-large-v3 to leverage language transfer from English to French (more details [here](https://github.com/huggingface/distil-whisper/tree/main/training#22-language-transfer)).
 
 ### Training
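
To make the distillation paragraph in the second hunk more concrete, the sketch below shows how such a student could be assembled with the `transformers` Whisper classes: the encoder is copied from `openai/whisper-large-v3` and frozen, while the 2-layer decoder is initialised from `distil-whisper/distil-large-v3`. This is an illustrative outline only, not the actual distil-whisper training script (that lives in the repository linked in the paragraph); the module paths (`model.encoder`, `model.decoder`) follow the current `transformers` Whisper implementation.

```python
# Sketch of the student initialisation described in the README diff above.
# Illustrative only -- the real training code is in the distil-whisper repo.
from transformers import WhisperConfig, WhisperForConditionalGeneration

# Teacher: 32 encoder layers, 32 decoder layers.
teacher = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")
# English distilled model whose 2-layer decoder seeds the French student.
distil_en = WhisperForConditionalGeneration.from_pretrained("distil-whisper/distil-large-v3")

# Student config: identical to the teacher except the decoder is cut to 2 layers.
student_config = WhisperConfig.from_pretrained("openai/whisper-large-v3", decoder_layers=2)
student = WhisperForConditionalGeneration(student_config)

# Encoder: copied verbatim from the teacher and frozen during training.
student.model.encoder.load_state_dict(teacher.model.encoder.state_dict())
for param in student.model.encoder.parameters():
    param.requires_grad = False

# Decoder: the 2 layers (and embeddings) are initialised from distil-large-v3
# to leverage English -> French language transfer.
student.model.decoder.load_state_dict(distil_en.model.decoder.state_dict())

print(student.config.encoder_layers, student.config.decoder_layers)  # 32 2
```

Because the frozen encoder runs only once per input while the 2-layer decoder handles every generated token, this decoder reduction is where the **5.9x** latency gain quoted in the first hunk comes from.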