openai/whisper-large-v2 · How to use a regular audio file with the model?

eikosa

Dec 9, 2022

How can I run this model on an audio file I have, for example an MP3?

eikosa changed discussion status to closed Dec 9, 2022

eikosa

Dec 9, 2022

edited Dec 9, 2022

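Answer: load the file with torchaudio, convert it to mono and resample it to 16 kHz, which is the sampling rate Whisper expects:

import torchaudio

# Load the file as a (channels, samples) tensor plus its sampling rate.
# Replace with your own path; MP3 support depends on the installed torchaudio backend.
speech, sr = torchaudio.load("asd.ogg")

sampling_rate = 16_000

# Average the channels to get mono (squeeze() alone leaves stereo files as 2 x N)
speech = speech.mean(dim=0)
resampler = torchaudio.transforms.Resample(orig_freq=sr, new_freq=sampling_rate)
speech = resampler(speech)

# 1-D float32 array at 16 kHz, ready to pass to the processor or pipeline
input_speech = speech.numpy()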
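The preparation Whisper needs (a mono signal at 16 kHz) can also be sketched in plain NumPy, for example when torchaudio isn't installed. This is a minimal sketch, assuming the audio is already decoded into a (channels, samples) or (samples,) float array; prepare_for_whisper is a hypothetical helper name, and it swaps torchaudio's band-limited sinc resampling for simple linear interpolation (np.interp), which is cruder and can alias when downsampling.

```python
import numpy as np

TARGET_SR = 16_000  # Whisper's feature extractor expects 16 kHz input


def prepare_for_whisper(waveform: np.ndarray, sr: int) -> np.ndarray:
    """Hypothetical helper: downmix to mono and resample to 16 kHz.

    Uses linear interpolation as a rough stand-in for a proper
    band-limited resampler such as torchaudio.transforms.Resample.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    # Average the channels so stereo input becomes a single 1-D signal
    if waveform.ndim == 2:
        waveform = waveform.mean(axis=0)
    if sr == TARGET_SR:
        return waveform
    n_out = int(round(waveform.shape[0] * TARGET_SR / sr))
    # Times (in seconds) of the input samples and of the wanted output samples
    t_in = np.arange(waveform.shape[0]) / sr
    t_out = np.arange(n_out) / TARGET_SR
    return np.interp(t_out, t_in, waveform).astype(np.float32)


# One second of 44.1 kHz stereo noise becomes 16,000 mono samples
stereo = np.random.randn(2, 44_100).astype(np.float32)
mono_16k = prepare_for_whisper(stereo, 44_100)
print(mono_16k.shape)  # (16000,)
```

For real transcription work a dedicated resampler is preferable; this sketch only illustrates what the two steps do.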

sanchit-gandhi

Dec 12, 2022

You can use a pipeline, as in the demo at https://huggingface.co/spaces/sanchit-gandhi/whisper-large-v2

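import torch
from transformers import pipeline

# Use the first GPU if one is available, otherwise run on the CPU
device = 0 if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    task="automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,  # split long audio into 30 s chunks, Whisper's input window
    device=device,
)

out = pipe(audio)["text"]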

where audio is either the path to an audio file (decoded with ffmpeg, so ffmpeg must be installed) or a loaded audio array (see https://github.com/huggingface/transformers/blob/c1b9a11dd4be8af32b3274be7c9774d5a917c56d/src/transformers/pipelines/automatic_speech_recognition.py#L201)
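A plain array passed to pipe is assumed to already be at the model's sampling rate (16 kHz); no check is done. If your audio is at another rate, the ASR pipeline also accepts a dict with "raw" (the 1-D array) and "sampling_rate" keys and resamples it itself (as far as I know this relies on torchaudio being installed). The sketch below only builds that dict; to_pipeline_input is a hypothetical helper, and the commented-out call assumes the pipe object created above.

```python
import numpy as np


def to_pipeline_input(waveform: np.ndarray, sampling_rate: int) -> dict:
    """Hypothetical helper: package a mono waveform for the ASR pipeline.

    The dict form tells the pipeline the true sampling rate, so it can
    resample to 16 kHz instead of assuming the array is already 16 kHz.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform; downmix first")
    return {"raw": waveform, "sampling_rate": int(sampling_rate)}


# Example: a synthetic 2 s, 440 Hz tone sampled at 44.1 kHz
sr = 44_100
t = np.arange(2 * sr) / sr
tone = 0.1 * np.sin(2 * np.pi * 440 * t)
inputs = to_pipeline_input(tone, sr)
# out = pipe(inputs)["text"]  # assumes `pipe` from the snippet above
```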