Model fine-tuner?
@sanchit-gandhi Hello! Are you actively working on this? I'm following along eagerly, as I desperately need to fine-tune the large model to always indicate when the speaker turns. (You can get this somewhat working with a modified prompt, but it readily fails.)
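For reference, the prompt workaround looks roughly like this - a sketch that assumes the `prompt_ids` API available in recent `transformers` releases; the checkpoint and prompt text are purely illustrative:

```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Checkpoint and prompt text are illustrative
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

# 30 s of silence stands in for real 16 kHz audio
audio = torch.zeros(16000 * 30).numpy()
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Condition decoding on a prompt written with explicit turn markers;
# this only nudges the output style and is not a reliable diarizer
prompt_ids = processor.get_prompt_ids(
    "- Hello, how are you? - Fine, thanks.", return_tensors="pt"
)
predicted_ids = model.generate(inputs.input_features, prompt_ids=prompt_ids)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
```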
Hey! I've opened an active PR on Transformers for ASR fine-tuning: https://github.com/huggingface/transformers/pull/19519
Expect a working script and blog post on the topic next week 🤗
@mezaros Do I understand correctly that you want Whisper to perform speaker diarization? And have you managed to make it work?
@sanchit-gandhi Brilliant, thanks a lot! Just out of interest, is it somehow possible (and is it even a good idea?) to modify the final layers of Whisper to perform classification, for example, instead of transcription? Essentially, the internal representation produced by Whisper would feed into final classification layers, and the whole thing would be trainable / fine-tunable?
Hey @daniel-v-e! Sorry for the late reply here. It's for sure possible to modify Whisper for audio classification tasks! You can add a sequence classification layer / head on top of the base model to generate a single class prediction. Refer to `MBartForSequenceClassification` to see how we achieve this for the MBART model; the same principle applies to the Whisper model. IMO this approach should work - it'll just require fine-tuning with correctly formatted data for audio classification.
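For illustration, here's a minimal sketch of what such a head could look like, mirroring the `MBartForSequenceClassification` pattern but mean-pooling over the encoder states (the class name and pooling strategy are assumptions, not an existing `transformers` API):

```python
import torch
import torch.nn as nn
from transformers import WhisperModel

class WhisperForSequenceClassification(nn.Module):
    """Classification head on the Whisper encoder, mirroring the
    MBartForSequenceClassification pattern (a sketch, not part of
    transformers)."""

    def __init__(self, checkpoint="openai/whisper-small", num_labels=2):
        super().__init__()
        self.encoder = WhisperModel.from_pretrained(checkpoint).encoder
        self.classifier = nn.Linear(self.encoder.config.d_model, num_labels)

    def forward(self, input_features):
        # input_features: log-Mel spectrogram of shape (batch, 80, 3000)
        hidden_states = self.encoder(input_features).last_hidden_state
        # Mean-pool over time (MBART pools on the EOS token instead),
        # then project to class logits
        return self.classifier(hidden_states.mean(dim=1))

# Dummy forward pass: two examples, four classes
model = WhisperForSequenceClassification(num_labels=4)
logits = model(torch.randn(2, 80, 3000))  # -> shape (2, 4)
```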
Sounds good, thank you @sanchit-gandhi! Is there perhaps a similar example somewhere for multi-class classification? Or is the extension to multiple classes straightforward?
Hey @daniel-v-e! You simply need to pass `num_labels=...` to `.from_pretrained`; the modelling code will take care of the rest (cf. https://huggingface.co/docs/transformers/model_doc/mbart#transformers.MBartForSequenceClassification.forward.example-2 and https://github.com/huggingface/transformers/blob/b210c83a78022226ce48402cd67d8c8da7afbd8d/src/transformers/models/mbart/modeling_mbart.py#L1499).
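For example, a five-class setup would just be (checkpoint name illustrative):

```python
from transformers import MBartForSequenceClassification

# num_labels sizes the classification head; the rest of the
# modelling code adapts automatically
model = MBartForSequenceClassification.from_pretrained(
    "facebook/mbart-large-cc25", num_labels=5
)
```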