Commit 75040bd
Parent(s): 1ca7336
Update README.md

README.md CHANGED
@@ -41,7 +41,7 @@ to distill Whisper on other languages. If you are interested in distilling Whisp
 provided [training code](https://github.com/huggingface/distil-whisper/tree/main/training). We will update the
 [Distil-Whisper repository](https://github.com/huggingface/distil-whisper/) with multilingual checkpoints when ready!
 
-### Why is
+### Why is distil-small.en slower than distil-large-v2?
 
 While [distil-medium.en](https://huggingface.co/distil-whisper/distil-medium.en) and [distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2)
 use two decoder layers each, distil-small.en uses four. Using more decoder layers improves the WER performance of the
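The heading fixed in this hunk is explained by decoder depth: distil-small.en keeps four decoder layers where distil-large-v2 keeps two, trading some speed for WER. If you want to confirm the layer counts yourself, here is a quick check (a sketch using the standard `transformers` config API; it is not part of the diff):

```python
from transformers import AutoConfig

# Compare decoder depths of the two distilled checkpoints:
# more decoder layers -> better WER, slower inference.
for model_id in ("distil-whisper/distil-small.en", "distil-whisper/distil-large-v2"):
    config = AutoConfig.from_pretrained(model_id)
    print(f"{model_id}: {config.decoder_layers} decoder layers")
```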
@@ -170,7 +170,7 @@ In the following code-snippet, we load the assistant Distil-Whisper model standa
 specify it as the "assistant model" for generation:
 
 ```python
-from transformers import pipeline,
+from transformers import pipeline, AutoModelForSpeechSeq2Seq, AutoProcessor
 import torch
 from datasets import load_dataset
 
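The import completed by this hunk opens the speculative-decoding example, in which Distil-Whisper drafts tokens that a main Whisper model then verifies. A minimal sketch of how such a snippet continues, assuming the main checkpoint is the teacher model `openai/whisper-small.en` (that pairing is an assumption here, not taken from the diff):

```python
from transformers import pipeline, AutoModelForSpeechSeq2Seq, AutoProcessor
import torch
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Assistant (draft) model: the distilled checkpoint, loaded standalone.
assistant_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "distil-whisper/distil-small.en", torch_dtype=torch_dtype
).to(device)

# Main model: assumed here to be the teacher checkpoint that verifies the draft tokens.
model_id = "openai/whisper-small.en"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=torch_dtype).to(device)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    generate_kwargs={"assistant_model": assistant_model},  # enables speculative decoding
    torch_dtype=torch_dtype,
    device=device,
)

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
print(pipe(dataset[0]["audio"])["text"])
```

Because the draft tokens are verified by the main model, greedy speculative decoding returns exactly the output the main model would produce on its own, only faster.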
@@ -249,10 +249,6 @@ model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=torch_dt
 
 ### Running Distil-Whisper in `openai-whisper`
 
-Coming soon!
-
-<!---
-
 To use the model in the original Whisper format, first ensure you have the [`openai-whisper`](https://pypi.org/project/openai-whisper/) package installed:
 
 ```bash
@@ -268,8 +264,8 @@ from datasets import load_dataset
 from huggingface_hub import hf_hub_download
 from whisper import load_model, transcribe
 
-
-model = load_model(
+distil_small_en = hf_hub_download(repo_id="distil-whisper/distil-small.en", filename="original-model.bin")
+model = load_model(distil_small_en)
 
 dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
 sample = dataset[0]["audio"]["array"]
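For readability, the `openai-whisper` example that these hunks un-comment assembles into the following snippet (pieced together from the hunk bodies and their context lines; only the comments are added here):

```python
from datasets import load_dataset
from huggingface_hub import hf_hub_download
from whisper import load_model, transcribe

# Download the original Whisper-format weights from the Hub; the file is
# cached locally, so later runs load it without re-downloading.
distil_small_en = hf_hub_download(repo_id="distil-whisper/distil-small.en", filename="original-model.bin")
model = load_model(distil_small_en)

# Transcribe a sample from the LibriSpeech dummy dataset.
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]["array"]

pred_out = transcribe(model, audio=sample)
print(pred_out["text"])
```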
@@ -279,22 +275,21 @@ pred_out = transcribe(model, audio=sample)
 print(pred_out["text"])
 ```
 
+Note that the model weights will be downloaded and saved to your cache the first time you run the example. Subsequently,
+you can re-use the same example, and the weights will be loaded directly from your cache without having to download them
+again.
+
 To transcribe a local audio file, simply pass the path to the audio file as the `audio` argument to transcribe:
 
 ```python
 pred_out = transcribe(model, audio="audio.mp3")
 ```
--->
 
 ### Whisper.cpp
 
-Coming soon!
-
-<!---
-
 Distil-Whisper can be run from the [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) repository with the original
 sequential long-form transcription algorithm. In a [provisional benchmark](https://github.com/ggerganov/whisper.cpp/pull/1424#issuecomment-1793513399)
-on Mac M1, `distil-
+on Mac M1, `distil-small.en` is over 4x faster than `large-v2`, while performing to within 1.4% WER over long-form audio.
 
 Steps for getting started:
 1. Clone the Whisper.cpp repository:
@@ -305,23 +300,21 @@ cd whisper.cpp
 2. Download the ggml weights for `distil-small.en` from the Hugging Face Hub:
 
 ```bash
-python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='distil-whisper/distil-small.en', filename='ggml-
+python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='distil-whisper/distil-small.en', filename='ggml-distil-small.en.bin', local_dir='./models')"
 ```
 
 Note that if you do not have the `huggingface_hub` package installed, you can also download the weights with `wget`:
 
 ```bash
-wget https://huggingface.co/distil-whisper/distil-small.en/resolve/main/ggml-
+wget https://huggingface.co/distil-whisper/distil-small.en/resolve/main/ggml-distil-small.en.bin -P ./models
 ```
 
 3. Run inference using the provided sample audio:
 
 ```bash
-make -j && ./main -m models/ggml-
+make -j && ./main -m models/ggml-distil-small.en.bin -f samples/jfk.wav
 ```
 
---->
-
 ### Transformers.js
 
 ```js