Long Audio Files
How can I process long audio recordings with this model? (1-60 min)
You can use pipeline as per the demo at https://huggingface.co/spaces/sanchit-gandhi/whisper-large-v2. This will enable you to transcribe files of arbitrary length:
from transformers import pipeline
import torch

# use the GPU if one is available, otherwise fall back to CPU
device = 0 if torch.cuda.is_available() else "cpu"
pipe = pipeline(
task="automatic-speech-recognition",
model="openai/whisper-large-v2",
chunk_length_s=30,
device=device,
)
out = pipe(audio)["text"]
where audio is the path to an audio file or a loaded audio array (see https://github.com/huggingface/transformers/blob/c1b9a11dd4be8af32b3274be7c9774d5a917c56d/src/transformers/pipelines/automatic_speech_recognition.py#L201).
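For concreteness, the pipeline accepts either a file path or a dict carrying the raw waveform and its sampling rate. A minimal sketch of both options, reusing the pipe from above and assuming librosa is installed and sample.wav is a hypothetical local file:

import librosa  # assumed installed; any loader that returns a float array works

# Option 1: pass the file path directly; the pipeline handles loading and resampling
out = pipe("sample.wav")["text"]

# Option 2: pass a pre-loaded array together with its sampling rate
waveform, sr = librosa.load("sample.wav", sr=16000)  # Whisper models expect 16 kHz audio
out = pipe({"raw": waveform, "sampling_rate": sr})["text"]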
How can I set the output language with this method?
Hey @eikosa! Just make sure you've installed Transformers from main:
pip install git+https://github.com/huggingface/transformers
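To confirm the dev install took effect, you can print the version string; installs from main carry a .dev suffix:

import transformers

print(transformers.__version__)  # an install from main prints something like "4.xx.0.dev0"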
Then you can change the language as follows:
from transformers import pipeline
import torch

device = 0 if torch.cuda.is_available() else "cpu"
pipe = pipeline(
task="automatic-speech-recognition",
model="openai/whisper-large-v2",
chunk_length_s=30,
device=device,
)
# change language as required
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="Spanish", task="transcribe")
out = pipe(audio)["text"]
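As a side note, more recent versions of transformers also let you pass the language at call time through generate_kwargs, which avoids mutating the model config; a minimal sketch, assuming a version where the pipeline forwards these arguments to model.generate:

# pass language/task per call instead of overriding forced_decoder_ids
out = pipe(audio, generate_kwargs={"language": "spanish", "task": "transcribe"})["text"]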
Hi @sanchit-gandhi, if we use the pipeline as mentioned in https://huggingface.co/openai/whisper-large-v2 instead of the one you specified above, transcription stops after the 30-second mark. Can this be solved?
Hey @kirankumaram, sorry for the late reply! You just need to specify chunk_length_s=30 when you instantiate the pipeline:
from transformers import pipeline
import torch

device = 0 if torch.cuda.is_available() else "cpu"
pipe = pipeline(
task="automatic-speech-recognition",
model="openai/whisper-large-v2",
chunk_length_s=30, # <- this arg lets us do long form transcriptions!
device=device,
)
# load your audio as required
audio = ...
# inference
out = pipe(audio)["text"]
print(out)
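If you also need timestamps for each chunk of a long recording, the pipeline supports a return_timestamps flag; a short sketch reusing the pipe defined above:

# request per-chunk timestamps alongside the transcription
out = pipe(audio, return_timestamps=True)
print(out["text"])    # full transcription
print(out["chunks"])  # list of {"timestamp": (start, end), "text": ...} entries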