Whisper Distillation


https://github.com/huggingface/distil-whisper


AI & ML interests

Robust knowledge distillation of the Whisper model via large-scale pseudo-labelling.
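One way to make pseudo-labelling robust is to filter the teacher's transcriptions: each pseudo-label is compared with the dataset's ground-truth text, and samples whose word error rate (WER) exceeds a threshold are discarded. The sketch below assumes that heuristic, a 10% threshold, and a simple lowercase-and-split normalizer. The names `word_error_rate` and `filter_pseudo_labels` are hypothetical and are not taken from the distil-whisper training code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level WER: (substitutions + deletions + insertions) / number of reference words.

    Assumption: normalization is only lowercasing and whitespace splitting.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] is the edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


def filter_pseudo_labels(samples, threshold=0.10):
    """Keep (ground_truth, pseudo_label) pairs whose WER is at most `threshold`.

    Hypothetical helper; the 10% default threshold is an assumption.
    """
    return [pair for pair in samples if word_error_rate(pair[0], pair[1]) <= threshold]


samples = [
    ("the cat sat on the mat", "the cat sat on the mat"),  # accurate pseudo-label
    ("the cat sat on the mat", "a dog stood near a rug"),  # hallucinated pseudo-label
]
kept = filter_pseudo_labels(samples)
print(len(kept))
```

Production pipelines usually normalize text more aggressively (punctuation, numbers, spelling variants) before computing WER; the whitespace split here is a deliberate simplification.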



Distil-Whisper


Distil-Whisper is a distilled version of Whisper that is 6 times faster, 51% smaller, and performs within 1% word error rate (WER) of the original model on out-of-distribution evaluation sets:

Model	Params / M	Rel. Latency ↑	Short-Form WER ↓	Long-Form WER ↓
large-v3	1550	1.0	8.4	11.0
distil-large-v3	756	6.3	9.7	10.8
distil-large-v2	756	5.8	10.1	11.6
distil-medium.en	394	6.8	11.1	12.4
distil-small.en	166	5.6	12.1	12.8
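The headline figures can be checked directly against the table. The worked example below copies the table values into a dictionary and reports each checkpoint's parameter reduction, relative speed, and WER difference with respect to large-v3. The names `TABLE` and `compare_to_teacher` are illustrative only. Note that distil-large-v2 was distilled from Whisper large-v2, which is not in the table, so its deltas here are measured against large-v3.

```python
# Values copied from the table above:
# (params in millions, relative latency speedup vs large-v3, short-form WER %, long-form WER %)
TABLE = {
    "large-v3": (1550, 1.0, 8.4, 11.0),
    "distil-large-v3": (756, 6.3, 9.7, 10.8),
    "distil-large-v2": (756, 5.8, 10.1, 11.6),
    "distil-medium.en": (394, 6.8, 11.1, 12.4),
    "distil-small.en": (166, 5.6, 12.1, 12.8),
}


def compare_to_teacher(name: str, teacher: str = "large-v3") -> dict:
    """Return parameter reduction (%), speedup, and WER deltas (percentage points) vs `teacher`."""
    params, speedup, short_wer, long_wer = TABLE[name]
    t_params, _, t_short, t_long = TABLE[teacher]
    return {
        # 1 - 756/1550 is about 0.512, i.e. roughly half the parameters are removed
        "param_reduction_pct": round(100 * (1 - params / t_params), 1),
        "speedup": speedup,
        # Positive delta means the distilled model has a higher (worse) WER
        "short_form_wer_delta": round(short_wer - t_short, 1),
        "long_form_wer_delta": round(long_wer - t_long, 1),
    }


for model in TABLE:
    if model != "large-v3":
        print(model, compare_to_teacher(model))
```

For distil-large-v3 this gives about 51.2% fewer parameters, a 6.3x speedup, a short-form WER 1.3 points higher, and a long-form WER 0.2 points lower than large-v3.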

For most applications, we recommend the latest distil-large-v3 checkpoint, since it is the most performant distilled checkpoint and is compatible with all Whisper libraries. The only exception is resource-constrained applications with very little memory, such as on-device or mobile applications, where distil-small.en is a great choice: it has only 166M parameters and performs within 4% WER of Whisper large-v3.
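To make the recommendation concrete, here is a hypothetical helper that picks the lowest-WER checkpoint whose weights fit a given memory budget. It assumes fp16 weights at 2 bytes per parameter and ignores activations, audio preprocessing, and runtime overhead, so real memory use will be higher. `pick_checkpoint` and `weight_memory_mb` are illustrative names, not part of any library; the checkpoint ids are the model repositories listed on this page.

```python
# (checkpoint id, parameters in millions, short-form WER %) taken from the table above
CHECKPOINTS = [
    ("distil-whisper/distil-large-v3", 756, 9.7),
    ("distil-whisper/distil-medium.en", 394, 11.1),
    ("distil-whisper/distil-small.en", 166, 12.1),
]

BYTES_PER_PARAM_FP16 = 2  # assumption: weights stored in half precision


def weight_memory_mb(params_m: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Approximate weight storage in MB (10^6 bytes); excludes activations and overhead."""
    return params_m * bytes_per_param


def pick_checkpoint(memory_budget_mb: float):
    """Return the lowest-WER checkpoint whose fp16 weights fit the budget, or None if none fit."""
    fitting = [c for c in CHECKPOINTS if weight_memory_mb(c[1]) <= memory_budget_mb]
    if not fitting:
        return None
    return min(fitting, key=lambda c: c[2])[0]


print(pick_checkpoint(4000))  # a typical GPU budget
print(pick_checkpoint(500))   # a tight on-device budget
```

With a generous budget the helper falls back to distil-large-v3 (about 1.5 GB of fp16 weights), while a few hundred megabytes only accommodates distil-small.en (about 332 MB), mirroring the guidance above.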

Note: Distil-Whisper is currently only available for English speech recognition. We are working with the community to distill Whisper in other languages. If you are interested in distilling Whisper in your language, check out the provided training code. We will update the repository with multilingual checkpoints when they are ready!

Spaces (2)

Whisper vs Distil-Whisper
Whisper Analysis

Models (7, all Automatic Speech Recognition)

distil-whisper/distil-large-v3
distil-whisper/distil-large-v3-openai
distil-whisper/distil-small.en
distil-whisper/distil-medium.en
distil-whisper/distil-large-v3-ct2
distil-whisper/distil-large-v2
distil-whisper/distil-large-v3-ggml

Datasets (37 total, 10 shown)

distil-whisper/librispeech_asr_dummy-concatenated
distil-whisper/librispeech_asr_dummy
distil-whisper/librispeech_long
distil-whisper/figures
distil-whisper/meanwhile
distil-whisper/rev16
distil-whisper/earnings22
distil-whisper/earnings21
distil-whisper/whisper_transcriptions_token_ids
distil-whisper/gigaspeech-l-token-ids