Stopwolf/speech-to-speech-translation · Audio Course U7 Assessment

Oct 10, 2023

Hey @Stopwolf ! I saw that you left a comment about a previous space (now private) not being submitted to the Unit 7 assessment space - let me know if you encounter any difficulties and I'll do my best to help!

Stopwolf

Owner Oct 11, 2023

edited Oct 11, 2023

Hi @sanchit-gandhi! Yeah, I can't seem to get this space working. Since I wanted to create a demo for Portuguese, I wanted to use MMS, specifically the mms-tts-por model, so I followed the usage instructions in the model card, but nothing works. I tried using the regular transformers library, the one recommended from the PR branch (which doesn't work since it's almost 2k commits behind and doesn't have the VitsOutputModel class), and the latest one. ~~The latest one doesn't output errors~~ Actually, it still outputs an error, struct.error: ushort format requires 0 <= number <= (0x7fff * 2 + 1). The runtime isn't stopped, but it returns None as the audio output, ~~even though it isn't actually None~~. So I'm kind of stumped, but I'll continue researching the issue. Thanks for reaching out!

P.S. I'm not using SpeechT5 that I trained on Portuguese from common_voice_13_0, since it outputs pure noise (if you have any tips about that, I'll be grateful to hear them haha)

Stopwolf

Owner Oct 11, 2023

As for the first problem, I fixed it. The error turned out to be really stupid (as they almost always are, heh). The official model card of mms-tts-por says to get the model outputs with output = model(**inputs).waveform, but it has to be:
output = model(**inputs)
waveform = output.waveform[0]
That [0] was the missing part causing all of the problems...
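
One plausible reading of why the missing [0] produced the struct.error (not confirmed in this thread): the model returns the waveform with a leading batch dimension, so its shape is (1, num_samples). A WAV writer that follows the (num_samples, num_channels) convention, as scipy.io.wavfile.write documents, would read that as a single sample with num_samples channels, and the WAV header stores the channel count as an unsigned 16-bit integer. The sketch below reproduces just that packing step with the standard library; `pack_channel_count` and the clip length are hypothetical, chosen only for illustration.

```python
import struct


def pack_channel_count(shape):
    """Pack a channel count as a WAV header would (unsigned 16-bit, little-endian).

    Assumption: a 2-D array is interpreted as (num_samples, num_channels)
    and a 1-D array as mono audio.
    """
    num_channels = shape[1] if len(shape) == 2 else 1
    return struct.pack("<H", num_channels)


# Hypothetical clip: 5 seconds at 16 kHz.
num_samples = 5 * 16_000

# With waveform[0] the shape is (num_samples,), which is treated as mono.
mono_header = pack_channel_count((num_samples,))

# Without [0] the shape is (1, num_samples): one sample, 80,000 "channels".
# 80,000 does not fit in 16 bits, so packing raises struct.error.
try:
    pack_channel_count((1, num_samples))
    batched_error = None
except struct.error as exc:
    batched_error = exc

print(mono_header)
print(type(batched_error).__name__)
```

Under these assumptions, indexing with [0] (or otherwise dropping the batch dimension) before handing the audio to the output component avoids the overflow, which matches the fix described above.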