hexgrad/Kokoro-82M · ps argument removed?

1 day ago

Hi, I'm wondering whether the ps (phonetics) argument was removed by mistake or on purpose. It's very useful for my use case. I have many short sentences and it's useful to force the model into a specific pronunciation like take a bow or a bow.

https://huggingface.co/hexgrad/Kokoro-82M/blob/main/kokoro.py#L138

hexgrad

Owner 1 day ago

The ps argument was removed by mistake and has been added back in https://hf.co/hexgrad/Kokoro-82M/commit/c97b7bbc3e60f447383c79b2f94fee861ff156ac

> I have many short sentences and it's useful to force the model into a specific pronunciation like take a bow or a bow.

For very short sentences with no context like "Take a bow" where both pronunciations could be valid, or words with different pronunciations based on speaker preference (gif, route, data, either), manual overrides are necessary.
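One way to organize such manual overrides on the caller's side is a small lookup that prefers an explicit phoneme string, then a per-sentence table, and only then falls back to automatic G2P. This is a rough sketch, not code from kokoro.py: `resolve_phonemes`, `OVERRIDES`, and the `g2p` callable are hypothetical names, and the IPA strings are illustrative rather than checked against Kokoro's phoneme vocabulary. Only the parameter name `ps` comes from this thread.

```python
from typing import Callable, Dict, Optional

# Hypothetical override table: normalized text -> phoneme string.
# The IPA strings are illustrative only, not verified against any model vocab.
OVERRIDES: Dict[str, str] = {
    "take a bow": "teɪk ə baʊ",  # "bow" as in bending forward
    "tie a bow": "taɪ ə boʊ",    # "bow" as in a knot
}


def resolve_phonemes(
    text: str,
    g2p: Callable[[str], str],
    ps: Optional[str] = None,
) -> str:
    """Pick phonemes for `text`: explicit `ps`, then the table, then `g2p`."""
    if ps:
        # An explicit phoneme string always wins.
        return ps
    # Normalize so "Take a bow." and "take a bow" hit the same entry.
    key = text.strip().lower().rstrip(".!?")
    if key in OVERRIDES:
        return OVERRIDES[key]
    # No override known: defer to whatever automatic G2P is in use.
    return g2p(text)


if __name__ == "__main__":
    # Stand-in G2P for demonstration; a real one would return phonemes.
    fake_g2p = lambda t: f"<g2p:{t}>"
    print(resolve_phonemes("Take a bow.", fake_g2p))
    print(resolve_phonemes("Hello there", fake_g2p))
```

The resulting string would then be passed as the ps argument, assuming the generation function accepts it the way the restored commit intends.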

I have a prototype G2P system (non-espeak, WIP) that should be able to handle situations like these when given sufficient context. The next iteration of Kokoro will likely be trained on this G2P system. This implies that if/when the next base model is released under Apache 2.0, the G2P system will likely need an Apache 2.0 release as well.

As an aside, the tightest bottleneck for shipping the next model is currently not algorithms or compute, but data. See https://hf.co/hexgrad/Kokoro-82M/discussions/21 for more.

hexgrad changed discussion status to closed 1 day ago