Commit 72e513f • 1 Parent(s): 2ab8d12
Update app.py
app.py CHANGED
@@ -13,13 +13,13 @@ torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
 attn_implementation = "flash_attention_2" if is_flash_attn_2_available() else "sdpa"
 
 model = AutoModelForSpeechSeq2Seq.from_pretrained(
-    "openai/whisper-large-
+    "openai/whisper-large-v3", torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True, attn_implementation=attn_implementation
 )
 distilled_model = AutoModelForSpeechSeq2Seq.from_pretrained(
-    "distil-whisper/distil-large-
+    "distil-whisper/distil-large-v3", torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True, attn_implementation=attn_implementation
 )
 
-processor = AutoProcessor.from_pretrained("openai/whisper-large-
+processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")
 
 model.to(device)
 distilled_model.to(device)
@@ -44,7 +44,7 @@ distil_pipe = pipeline(
     tokenizer=processor.tokenizer,
     feature_extractor=processor.feature_extractor,
     max_new_tokens=128,
-    chunk_length_s=
+    chunk_length_s=25,
     torch_dtype=torch_dtype,
     device=device,
     generate_kwargs={"language": "en", "task": "transcribe"},
@@ -115,13 +115,13 @@ if __name__ == "__main__":
         )
         gr.HTML(
             f"""
-            <p><a href="https://huggingface.co/distil-whisper/distil-large-
-            of the <a href="https://huggingface.co/openai/whisper-large-
+            <p><a href="https://huggingface.co/distil-whisper/distil-large-v3"> Distil-Whisper</a> is a distilled variant
+            of the <a href="https://huggingface.co/openai/whisper-large-v3"> Whisper</a> model by OpenAI. Compared to Whisper,
             Distil-Whisper runs 6x faster with 50% fewer parameters, while performing to within 1% word error rate (WER) on
             out-of-distribution evaluation data.</p>
 
             <p>In this demo, we perform a speed comparison between Whisper and Distil-Whisper in order to test this claim.
-            Both models use the <a href="https://huggingface.co/distil-whisper/distil-large-
+            Both models use the <a href="https://huggingface.co/distil-whisper/distil-large-v3#chunked-long-form-transcription"> chunked long-form transcription algorithm</a>
             in 🤗 Transformers, as well as Flash Attention. To use Distil-Whisper yourself, check the code examples on the
             <a href="https://github.com/huggingface/distil-whisper#1-usage"> Distil-Whisper repository</a>. To ensure fair
             usage of the Space, we ask that audio file inputs are kept to < 30 mins.</p>
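For reference, below is a minimal standalone sketch of the setup this commit moves to: the large-v3 checkpoints, Flash Attention 2 with an SDPA fallback, and chunked long-form transcription through the 🤗 Transformers pipeline. The checkpoint names, dtype/attention selection, and pipeline arguments mirror the hunks above; the script layout and the "sample.wav" input are illustrative placeholders, not code from the Space.

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from transformers.utils import is_flash_attn_2_available

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
# Prefer Flash Attention 2 when the kernel is installed, otherwise fall back to PyTorch SDPA.
attn_implementation = "flash_attention_2" if is_flash_attn_2_available() else "sdpa"

# Distil-Whisper large-v3: the distilled checkpoint this commit switches to.
distilled_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "distil-whisper/distil-large-v3",
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
    attn_implementation=attn_implementation,
)
distilled_model.to(device)

# The processor (tokenizer + feature extractor) is shared with the teacher checkpoint.
processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")

distil_pipe = pipeline(
    "automatic-speech-recognition",
    model=distilled_model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=25,  # chunk length this commit sets for chunked long-form transcription
    torch_dtype=torch_dtype,
    device=device,
    generate_kwargs={"language": "en", "task": "transcribe"},
)

# "sample.wav" is a placeholder audio file path.
print(distil_pipe("sample.wav")["text"])
```

Loading the full openai/whisper-large-v3 model for the speed comparison follows the same pattern, swapping in the teacher checkpoint name.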