openai/whisper-large-v2


Large model is not transcribing the complete audio file

#36

by agvega - opened Apr 4, 2023

agvega

Apr 4, 2023

Hi
I ran the Large (V2) model on AWS SageMaker and noticed that it does not always transcribe the whole audio file. Sometimes parts are simply ignored, especially if the volume is low.
The funny part is that the medium and small models do transcribe the complete audio.
What could be the issue? Any suggestions? Maybe I need to fine-tune, but I can't find documentation on how to fine-tune this model.
Any help will be appreciated.

Thanks

Alejandro

Apr 18, 2023

Hey @agvega, one thing you can try is normalising the input audio file to zero mean and unit standard deviation. We've found that this helps massively for super loud audio files, and the same probably holds for super quiet ones (see https://github.com/huggingface/transformers/issues/19888). You can enable this in the feature extractor by passing do_normalize=True when you call it: https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperFeatureExtractor.__call__.do_normalize
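For reference, here is a rough sketch of what that normalisation does and how the flag might be passed. The helper names `zero_mean_unit_var` and `extract_features`, the `eps` value and the 16 kHz sampling rate are assumptions made for illustration, not something taken from this thread, and the second function needs `transformers` installed and downloads the extractor config on first use.

```python
import numpy as np


def zero_mean_unit_var(waveform, eps=1e-7):
    """Rescale a mono waveform to zero mean and (approximately) unit variance.

    Illustrative helper only; eps guards against division by zero on silence.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    return (waveform - waveform.mean()) / np.sqrt(waveform.var() + eps)


def extract_features(waveform, sampling_rate=16_000):
    """Let the Whisper feature extractor normalise the audio itself.

    Imported lazily so the helper above can be used without transformers.
    """
    from transformers import WhisperFeatureExtractor

    feature_extractor = WhisperFeatureExtractor.from_pretrained(
        "openai/whisper-large-v2"
    )
    # do_normalize=True applies zero-mean / unit-variance scaling
    # to the raw audio before the log-mel features are computed.
    return feature_extractor(
        waveform,
        sampling_rate=sampling_rate,
        do_normalize=True,
        return_tensors="np",
    )
```

With something like this, a very quiet recording ends up on the same scale as a loud one before the features are computed, which is the effect described above. Whether it fixes the skipped segments in your case would still need to be checked on your own audio.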

Otherwise for fine-tuning, you can follow this blog post: https://huggingface.co/blog/fine-tune-whisper
