The new Longform transcription method
#76, opened by deep-intel
Is there an option to generate word-level timestamps with the promising new long-form transcription method, akin to what we already have with the pipeline?
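For reference, the pipeline behaviour mentioned here can be sketched roughly as follows. The model id `openai/whisper-large-v3` and the helper names `transcribe_with_word_timestamps` and `format_words` are assumptions made for illustration (the thread does not name a checkpoint); `return_timestamps="word"` is the existing option of the Transformers ASR pipeline.

```python
from typing import List


def transcribe_with_word_timestamps(audio_path: str,
                                    model_id: str = "openai/whisper-large-v3") -> dict:
    """Run the chunked ASR pipeline and ask for word-level timestamps.

    model_id is an assumed checkpoint; swap in the one you actually use.
    """
    # Imported lazily so the formatting helper below works without transformers.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # chunked long-form mode of the pipeline
    )
    # Returns {"text": ..., "chunks": [{"text": ..., "timestamp": (start, end)}, ...]}
    return asr(audio_path, return_timestamps="word")


def format_words(result: dict) -> List[str]:
    """Turn the pipeline's word chunks into 'start-end word' strings."""
    lines = []
    for chunk in result["chunks"]:
        start, end = chunk["timestamp"]
        # The end time can be missing (None) for the last word of a segment.
        end_s = "?" if end is None else f"{end:.2f}"
        lines.append(f"{start:.2f}-{end_s} {chunk['text'].strip()}")
    return lines
```

Calling `format_words(transcribe_with_word_timestamps("audio.wav"))` would give one line per word; `"audio.wav"` is a placeholder path.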
Any updates on this?
@patrickvonplaten, could you please share your thoughts on this, since you did pivotal work on this PR?
I probably won't find time for this - can we open a feature request on Transformers?
cc @sanchit-gandhi as well
Quick question @patrickvonplaten @sanchit-gandhi - with the new method, do we load the full audio into (GPU) memory in one go? If yes, I guess that is different from how the "pipeline" would have handled it? The reason I ask is that I could process very long audio with the pipeline, but an audio file of only about 30 min ran out of memory with the new long-form transcription.
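To make the contrast in this question concrete, here is a minimal sketch of the two ways of feeding audio: holding every sample at once versus walking over fixed-length overlapping windows, as the chunked pipeline does. It says nothing about how the new long-form method is actually implemented. The names `iter_chunks` and `float32_mb` are hypothetical, the 5 s stride is an illustrative value, and only raw float32 samples are counted; real GPU usage also depends on the model, features and batch size, which are not estimated here.

```python
from typing import Iterator, Tuple

SAMPLE_RATE = 16_000  # Whisper feature extractors expect 16 kHz audio


def iter_chunks(num_samples: int,
                chunk_s: float = 30.0,
                stride_s: float = 5.0,
                sr: int = SAMPLE_RATE) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices of overlapping windows.

    Consecutive windows overlap by 2 * stride_s seconds (illustrative values).
    """
    chunk = int(chunk_s * sr)
    step = chunk - 2 * int(stride_s * sr)
    if step <= 0:
        raise ValueError("stride_s must be less than half of chunk_s")
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        yield start, end
        if end == num_samples:
            break
        start += step


def float32_mb(num_samples: int) -> float:
    """Size of a float32 waveform with num_samples samples, in megabytes."""
    return num_samples * 4 / 1e6


total = 30 * 60 * SAMPLE_RATE          # samples in a 30-minute recording
windows = list(iter_chunks(total))     # index ranges only, no audio copied

whole_file_mb = float32_mb(total)                          # everything at once
one_window_mb = float32_mb(windows[0][1] - windows[0][0])  # a single window
print(f"{len(windows)} windows, {whole_file_mb:.1f} MB vs {one_window_mb:.2f} MB per window")
```

With windowed processing, the amount of audio resident at any moment is bounded by the windows in the current batch rather than by the length of the recording, which is the difference the question is asking about.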