trip-fontaine committed on
Commit
85e88e4
1 Parent(s): 561b47e

update readme

Files changed (1)
  1. README.md +22 -20
README.md CHANGED
@@ -83,11 +83,11 @@ The result is a distilled model that performs within **2% WER of [Whisper large-

  | Model | Params (M) | Rel. Latency | Short-Form WER | Long-Form WER |
  | :--------------------- | :--------: | :----------: | :------------: | :-----------: |
- | whisper-tiny | 37.8 | 4.7 | 43.24 | 28.28 |
- | whisper-base | 72.6 | 3.7 | 30.48 | 19.23 |
- | whisper-small | 242 | 2.3 | 16.36 | 12.47 |
- | whisper-medium | 764 | 1.3 | 11.53 | 10.77 |
- | whisper-large-v3 | 1540 | 1.0 | 7.84 | 9.07 |
  | **distil-large-v3-fr** | **756** | **5.9** | **9.34** | **11.13** |

  *latencies benchmarked to generate 128 tokens on A100 40GB with a batch size of 1. More details about inference performances in [inference speed](#inference-speed) section.
@@ -618,14 +618,15 @@ The model has been tested for both in-distribution (Common Voice 17 and Multilin

  ### Short-Form

- | Model Name | RTFx | Common Voice 17 | Multilingual Librispeech | Voxpopuli | Fleurs |
- | :----------------: | :-----: | :-------------: | :----------------------: | :-------: | :----: |
- | whisper-tiny | 280.576 | 56.757 | 37.512 | 32.505 | 46.173 |
- | whisper-base | 261.235 | 42.447 | 25.2 | 26.434 | 27.851 |
- | whisper-small | 249.676 | 22.469 | 14.097 | 14.61 | 14.283 |
- | whisper-medium | 170.9 | 15.432 | 9.602 | 11.92 | 9.155 |
- | whisper-large-v3 | 150.719 | 11.024 | 4.783 | 9.948 | 5.624 |
- | distil-large-v3-fr | 310.127 | 12.681 | 5.865 | 10.851 | 7.984 |

  *the above datasets correspond to test splits

@@ -633,14 +634,15 @@ The model has been tested for both in-distribution (Common Voice 17 and Multilin
  ### Long-Form

- | Model Name | RTFx | [long-form test set](https://huggingface.co/datasets/eustlb/french-long-form-test) |
  | :----------------: | :-----: | :--------------------------------------------------------------------------------: |
- | whisper-tiny | 125.367 | 28.277 |
- | whisper-base | 110.139 | 19.228 |
- | whisper-small | 83.417 | 12.467 |
- | whisper-medium | 56.677 | 10.772 |
- | whisper-large-v3 | 41.805 | 9.073 |
- | distil-large-v3-fr | 169.692 | 11.385 |

 

  | Model | Params (M) | Rel. Latency | Short-Form WER | Long-Form WER |
  | :--------------------- | :--------: | :----------: | :------------: | :-----------: |
+ | whisper-tiny | 37.8 | 4.7 | 43.73 | 28.158 |
+ | whisper-base | 72.6 | 3.7 | 30.57 | 18.665 |
+ | whisper-small | 242 | 2.3 | 16.20 | 12.557 |
+ | whisper-medium | 764 | 1.3 | 11.720 | 11.023 |
+ | whisper-large-v3 | 1540 | 1.0 | 7.81 | 9.008 |
  | **distil-large-v3-fr** | **756** | **5.9** | **9.34** | **11.13** |

  *latencies benchmarked to generate 128 tokens on A100 40GB with a batch size of 1. More details about inference performances in [inference speed](#inference-speed) section.
 

  ### Short-Form

+ | Model | Common Voice 17 | Multilingual Librispeech | voxpopuli | fleurs | RTFx |
+ | :--------------------- | :-------------: | :----------------------: | :--------: | :-------: | :---------: |
+ | whisper-tiny | 57.141 | 38.049 | 32.346 | 47.4 | 265.226 |
+ | whisper-base | 42.58 | 25.235 | 26.701 | 27.773 | 237.195 |
+ | whisper-small | 22.56 | 13.576 | 14.486 | 14.165 | 196.932 |
+ | whisper-medium | 15.51 | 9.541 | 11.836 | 9.992 | 93.428 |
+ | whisper-large-v3 | 11.038 | 4.762 | 9.83 | 5.624 | 62.845 |
+ | **distil-large-v3-fr** | **12.675** | **5.865** | **10.832** | **7.989** | **106.291** |
+

  *the above datasets correspond to test splits

 
  ### Long-Form

+ | Model Name | RTFx | [long-form test set](https://huggingface.co/datasets/eustlb/french-long-form-test) |
  | :----------------: | :-----: | :--------------------------------------------------------------------------------: |
+ | whisper-tiny | 121.389 | 28.158 |
+ | whisper-base | 109.366 | 18.665 |
+ | whisper-small | 83.049 | 12.557 |
+ | whisper-medium | 47.807 | 11.023 |
+ | whisper-large-v3 | 38.294 | 9.008 |
+ | distil-large-v3-fr | 101.326 | 11.13 |
+

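A note for reviewers: the updated Short-Form WER column in the summary table appears to be the unweighted mean of the four per-dataset scores in the updated Short-Form table. The README does not state this; it is an inference from the numbers. The sketch below is one way to check it. The dictionary and function names are hypothetical, and the values are copied from the added (`+`) rows of this diff.

```python
from statistics import mean

# Per-dataset short-form WER (%) from the updated Short-Form table, in the order
# Common Voice 17, Multilingual Librispeech, voxpopuli, fleurs.
SHORT_FORM_WER = {
    "whisper-tiny": [57.141, 38.049, 32.346, 47.4],
    "whisper-base": [42.58, 25.235, 26.701, 27.773],
    "whisper-small": [22.56, 13.576, 14.486, 14.165],
    "whisper-medium": [15.51, 9.541, 11.836, 9.992],
    "whisper-large-v3": [11.038, 4.762, 9.83, 5.624],
    "distil-large-v3-fr": [12.675, 5.865, 10.832, 7.989],
}

# Short-Form WER column from the updated summary table.
SUMMARY_SHORT_FORM_WER = {
    "whisper-tiny": 43.73,
    "whisper-base": 30.57,
    "whisper-small": 16.20,
    "whisper-medium": 11.720,
    "whisper-large-v3": 7.81,
    "distil-large-v3-fr": 9.34,
}


def macro_average(scores):
    """Unweighted mean over datasets (assumed aggregation, not confirmed by the README)."""
    return mean(scores)


def check_summary(per_dataset, summary, tolerance=0.01):
    """Return the models whose summary value differs from the macro average by more than `tolerance`."""
    return [
        model
        for model, scores in per_dataset.items()
        if abs(macro_average(scores) - summary[model]) > tolerance
    ]


if __name__ == "__main__":
    for model, scores in SHORT_FORM_WER.items():
        print(f"{model}: mean={macro_average(scores):.3f} reported={SUMMARY_SHORT_FORM_WER[model]}")
    print("mismatches:", check_summary(SHORT_FORM_WER, SUMMARY_SHORT_FORM_WER))
```

If the assumption holds, the mismatch list is empty. A non-empty list would point to a summary row that was not refreshed along with the per-dataset table.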