The new long-form transcription method

#76

by deep-intel - opened Feb 5

Feb 5

Is there an option to generate word-level timestamps with the new, promising long-form transcription method, similar to what we have with the pipeline?

https://github.com/huggingface/transformers/pull/27658

Mikeles

Feb 7

Any updates on this?

deep-intel

Feb 8

@patrickvonplaten
Could you share your thoughts on this, since you did pivotal work on this PR?

patrickvonplaten

Feb 12

I probably won't find time for this. Can we open a feature request on Transformers?

patrickvonplaten

Feb 12

cc @sanchit-gandhi as well

deep-intel

Feb 13

Quick question @patrickvonplaten @sanchit-gandhi: with the new method, do we load the full audio into (GPU) memory in one go? If so, I guess that differs from how the pipeline would have handled it? The reason I ask is that I could process very long audio with the pipeline, but audio of only about 30 minutes ran out of memory with the new long-form transcription.
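For a rough sense of scale, here is a back-of-envelope sketch of how large the input alone would be if a 30-minute clip were held in memory at once. It assumes Whisper-style feature extraction (16 kHz audio, a hop length of 160 samples, i.e. about 100 frames per second, and 80 mel bins, or 128 for large-v3) stored as float32. The helper name `input_sizes_bytes` is hypothetical, and the figures cover only the waveform and log-mel features, not model weights or activations during generation.

```python
def input_sizes_bytes(duration_s, sampling_rate=16_000, hop_length=160,
                      n_mels=80, bytes_per_value=4):
    """Hypothetical helper: approximate sizes of the raw waveform and of the
    log-mel feature tensor for one clip, assuming Whisper-style settings."""
    n_samples = int(duration_s * sampling_rate)
    # Roughly one feature frame per hop of audio
    n_frames = n_samples // hop_length
    waveform_bytes = n_samples * bytes_per_value
    features_bytes = n_mels * n_frames * bytes_per_value
    return waveform_bytes, features_bytes


# 30 minutes of audio, float32 values (bytes_per_value=2 would model fp16)
wave, feats = input_sizes_bytes(30 * 60)
print(f"waveform: {wave / 1e6:.1f} MB, log-mel features: {feats / 1e6:.1f} MB")
```

Under these assumptions this prints `waveform: 115.2 MB, log-mel features: 57.6 MB`. That is only an estimate of the input footprint. It does not say where the out-of-memory error actually comes from.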
