Large model is not transcribing the complete audio file

#36
by agvega - opened

Hi
I ran the Large (V2) model on AWS SageMaker, and I noticed it does not always transcribe the whole audio file. Sometimes parts are simply ignored, especially if the volume is low.
The funny part is that the medium and small models do transcribe the complete audio.
What could the issue be? Any suggestions? Maybe I need to fine-tune, but I can't find documentation on how to fine-tune this model.
Any help will be appreciated.

Thanks

Alejandro

Hey @agvega, one thing you can try is normalising the input audio file to zero mean and unit standard deviation. We've found that this helps massively for super loud audio files, and the same probably holds for super quiet ones (see https://github.com/huggingface/transformers/issues/19888). You can enable this in the feature extractor by passing do_normalize=True: https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperFeatureExtractor.__call__.do_normalize
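
To make the normalisation step concrete, here is a minimal NumPy sketch of rescaling a waveform to zero mean and unit standard deviation. The helper name zero_mean_unit_var, the eps value, and the synthetic "quiet" tone are assumptions made for illustration; this is not the library's own code, just the same idea written out by hand.

```python
import numpy as np


def zero_mean_unit_var(waveform: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Rescale a mono waveform to zero mean and (approximately) unit std.

    Hypothetical helper for illustration; eps guards against division by
    zero on silent input.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    return (waveform - waveform.mean()) / np.sqrt(waveform.var() + eps)


# Synthetic stand-in for a quiet recording: 1 s of a 440 Hz tone at 16 kHz
# with a peak amplitude of only 0.01.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
quiet = (0.01 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

normalized = zero_mean_unit_var(quiet)

print("std before:", float(quiet.std()))
print("std after: ", float(normalized.std()))
```

In practice you would not need to do this by hand: passing do_normalize=True when calling the Whisper feature extractor, as described above, should apply the equivalent rescaling before the log-mel features are computed.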

Otherwise for fine-tuning, you can follow this blog post: https://huggingface.co/blog/fine-tune-whisper
