How to use a normal file with the model?
#6, opened by eikosa
How can I run this model on an audio file of my own, for example an mp3?
eikosa changed discussion status to closed
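The answer is:
import torchaudio

# torchaudio.load returns a (channels, samples) tensor and the file's sampling rate
speech, sr = torchaudio.load("asd.ogg")
sampling_rate = 16_000  # Whisper expects 16 kHz audio
resampler = torchaudio.transforms.Resample(sr, sampling_rate)
speech = speech.mean(dim=0)  # downmix to mono; squeeze() only yields 1-D for single-channel files
speech = resampler(speech)
input_speech = speech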
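The snippet above stops at the resampled waveform; to get text, the waveform still has to go through the model. Here is a minimal sketch of that step, assuming the `WhisperProcessor` and `WhisperForConditionalGeneration` classes from transformers and the `openai/whisper-large-v2` checkpoint. The helper names `duration_seconds` and `transcribe_waveform` are hypothetical and only for illustration.

```python
WHISPER_WINDOW_S = 30  # Whisper's feature extractor pads or truncates audio to 30 s


def duration_seconds(num_samples, sampling_rate):
    """Length of a mono waveform in seconds."""
    return num_samples / sampling_rate


def transcribe_waveform(speech, sampling_rate=16_000, model_id="openai/whisper-large-v2"):
    """Transcribe a 1-D, 16 kHz torch tensor (hypothetical helper)."""
    # Imported lazily so the pure-Python helper above works without transformers.
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    if duration_seconds(speech.shape[-1], sampling_rate) > WHISPER_WINDOW_S:
        # Without chunking, only the first 30 s reach the model.
        print("warning: audio is longer than 30 s and will be truncated")

    processor = WhisperProcessor.from_pretrained(model_id)
    model = WhisperForConditionalGeneration.from_pretrained(model_id)

    # Convert the raw waveform to log-mel input features.
    inputs = processor(speech.numpy(), sampling_rate=sampling_rate, return_tensors="pt")
    predicted_ids = model.generate(inputs.input_features)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

With the variables from the answer, the call would be `text = transcribe_waveform(input_speech)`. For recordings longer than 30 seconds, the chunked pipeline is likely the easier route.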
You can use pipeline, as per the demo at https://huggingface.co/spaces/sanchit-gandhi/whisper-large-v2:
import torch
from transformers import pipeline

# use the first GPU if one is available, otherwise the CPU
device = 0 if torch.cuda.is_available() else "cpu"
pipe = pipeline(
    task="automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,  # split long audio into 30-second chunks
    device=device,
)
out = pipe(audio)["text"]
where audio is the path to an audio file or a loaded audio array (see https://github.com/huggingface/transformers/blob/c1b9a11dd4be8af32b3274be7c9774d5a917c56d/src/transformers/pipelines/automatic_speech_recognition.py#L201).
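If the audio is already loaded and is not at 16 kHz, the pipeline can, as far as I know, also take a dict with `"raw"` and `"sampling_rate"` keys and do the resampling itself (this path appears to need torchaudio installed). A small sketch under that assumption; `as_pipeline_input` is a hypothetical helper, not part of transformers.

```python
def as_pipeline_input(samples, sampling_rate):
    """Wrap loaded audio in the dict form the ASR pipeline is assumed to accept.

    `samples` is expected to be a 1-D float numpy array (mono audio);
    `sampling_rate` is the rate the samples were recorded at, in Hz.
    """
    if not isinstance(sampling_rate, int) or sampling_rate <= 0:
        raise ValueError(f"sampling_rate must be a positive int, got {sampling_rate!r}")
    return {"raw": samples, "sampling_rate": sampling_rate}
```

For example, with a mono tensor loaded via torchaudio: `out = pipe(as_pipeline_input(speech.numpy(), sr))["text"]`. Passing a file path instead leaves the decoding to the pipeline, which relies on ffmpeg being available.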