Disappointed with the results - Gibberish, Pauses, Inconsistent Voice, and Unstable Pitch
It's not stable: it produces an inconsistent voice and inserts pauses and gibberish in between. The pitch is also not consistent. Very disappointed. :( This is nowhere close to the demo shown on the blog.
Not the author, but FWIW this is the base model, not the fine-tuned variant they said the demo uses. Like pretrained base models for LLMs, it's unlikely to be very usable until you tune it for a more specific purpose.
Are there any instructions to fine-tune it?
No, and they don't plan to give us any either.
I don't understand the intention behind this release.
Research... it's a demo, not meant for end customers and not dev-centric. Watch their interview.
The demo on their website is way better, and of course I understand that it is fine-tuned. Not sure which interview you are referring to, but I didn't see them talk about CSM-1b in particular being a demo. I would say that, even for a demo, the consistency is pretty bad. This seems more like a proof of technology.