Can this be used with WhisperX?


Hi,

WhisperX uses forced alignment with wav2vec 2.0 to produce more accurate timestamps for Whisper transcription outputs. Currently Japanese is not supported by WhisperX. Can this model be used with WhisperX to add the language?

https://github.com/m-bain/whisperX

Thanks

Hi @Dgoryeo, I think Japanese is already supported by whisperX, and it uses this exact model for that: https://github.com/m-bain/whisperX/blob/main/whisperx/transcribe.py#L31
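
For reference, here's a minimal sketch of what that looks like, assuming the load_model / load_align_model / align API from the whisperX README (exact names and arguments may differ between versions, and sample_ja.wav is just a placeholder file):

```python
import whisperx  # pip install whisperx

device = "cuda"               # or "cpu"
audio_file = "sample_ja.wav"  # placeholder input file

# Transcribe with Whisper first
model = whisperx.load_model("large-v2", device)
result = model.transcribe(audio_file, language="ja")

# Load the alignment model for Japanese -- whisperX's default model table
# should resolve this to jonatasgrosman/wav2vec2-large-xlsr-53-japanese
align_model, metadata = whisperx.load_align_model(language_code="ja", device=device)

# Run forced alignment to refine the segment/word timestamps
aligned = whisperx.align(result["segments"], align_model, metadata, audio_file, device)
print(aligned["word_segments"])
```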

Thanks for the quick reply. Much appreciated! I'll check it out right away.

Hi @jonatasgrosman, I noticed that you have fine-tuned the Whisper large model for a few languages. Do you have any plans to tune it for Japanese too? Thanks.

Hi @Dgoryeo, I plan to do that for other languages in the future too, but for now I'm out of resources, because these large Whisper models are costly to train. I only managed to train some large Whisper models thanks to @sanchit-gandhi, who gave me access to an A100 for a few days.

Could I give you access? I just signed up for free credits on GCP, but I think I've only been given a T4-level quota; I can double-check though.

Hey @Dgoryeo ! You can check out the leaderboard from the Whisper fine-tuning event to see the most performant fine-tuned models in Japanese: https://huggingface.co/spaces/whisper-event/leaderboard?dataset=mozilla-foundation%2Fcommon_voice_11_0&config=ja&split=test

There are a couple of strong large-v2 checkpoints there that might suit your needs!
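
If it helps, here's a minimal sketch of running one of those checkpoints with the transformers pipeline; the checkpoint name below is a placeholder, so substitute whichever model you pick from the leaderboard:

```python
from transformers import pipeline

# Placeholder -- replace with the checkpoint you choose from the leaderboard above
checkpoint = "<whisper-large-v2-japanese-checkpoint>"

asr = pipeline(
    "automatic-speech-recognition",
    model=checkpoint,
    chunk_length_s=30,  # chunk long audio into 30 s windows
    device=0,           # GPU index, or -1 for CPU
)

print(asr("sample_ja.wav")["text"])
```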

This model is downloaded when using whisperX, and I can see it at "C:\Users\username.cache\huggingface\hub\models--jonatasgrosman--wav2vec2-large-xlsr-53-japanese", but it isn't being used by whisperX. When I set the language to "en", it works and the segments contain words, but when I set the language to "ja", the words in the segment results aren't words; they're individual letters instead.
