hexgrad/Kokoro-TTS · when it will be available for open source community

arpitsh018

Nov 21, 2024

when it will be available for open source community

Lyrcaxis

Nov 22, 2024

Hey, Nicole is pretty great! Super looking forward to have her read books to me eventually for sleep :)

hexgrad

Owner Nov 22, 2024

Hi there, thanks for your interest and glad you like it! There isn't a release date currently scheduled, but if that were to happen, it would definitely be signposted in this HF Space.

Regarding Nicole, I had to temporarily take her down while upgrading due to an architecture change described in this blog post — https://huggingface.co/blog/hexgrad/kokoro-short-burst-upgrade — but I will add her back very shortly. The other voices will also be restored when I have the bandwidth to do so later; feel free to ping me if one of the voices you were using is not currently available. None of them have been removed from the model, it's just a bit of effort to identify/extract the numerical params for each voice as I upgrade the model.

hexgrad changed discussion status to closed Nov 23, 2024

arpitsh018

Nov 23, 2024

Hi Hexgrad, I’m very impressed with your voice modules. We have some H100 and H200 GPUs available. Let’s connect to discuss potential collaboration on building TTS models that support over 100 languages, with enhanced emotion control and voice cloning capabilities.

Lyrcaxis

Nov 23, 2024

It's a 80M param model that can read english books in ASMR or respond in a highly energetic voice if u ask it to... great as it is imo and wouldn't push it for 100 languages and built-in enhanced emotion controls xD
...just saying. Emotion control could always be part of the inference code instead of built-in on the weights if this evolves to something bigger. Very happy to see this is going well and gathering attention though!

I actually didn't use any of the voices yet because I would rather use an actual API instead of gradio's. Wouldn't mind slow inference either assuming I can leave it running and playback the audio files later on.
Keep up the great work! Might wanna be more expressive with your expectations/hopes about the project though -- opening up for potential collaborators or funding.

If you choose to open source an early (/earlier) version, though, it could be leading and not necessarily take away any funding you'd otherwise get.

hexgrad

Owner Nov 24, 2024

Reopening this for visibility so people have another pathway to see the response up here, since it appears to be the most FAQ. Also wanted to address the following:

I actually didn't use any of the voices yet because I would rather use an actual API instead of gradio's. Wouldn't mind slow inference either assuming I can leave it running and playback the audio files later on.

Working on something along those lines for batched/long-form inference, but still sticking with Gradio for the time being. I think I might be able to pull some tricks to get reasonable CPU inference speed, which avoids the GPU usage limits entirely. It's unclear how bad the latency will be though — if there is simply too much latency on Gradio's end, there are no tricks that can solve that.

I likely do not have the bandwidth to roll my own API solution for at least EOY. I was not impressed with Replicate's speed benchmarks or pricing, and I saw a RapidAPI horror story (it has since been taken down) which makes me hesitant to give them a shot.

Might wanna be more expressive with your expectations/hopes about the project though -- opening up for potential collaborators or funding.

This space has been getting regular updates, but I've been wanting to add an "Updates" tab to be more explicit about past and future updates. Will sit down and write that, somewhere between feature updates and testing the newest checkpoints fresh off the GPU.

hexgrad changed discussion status to open Nov 24, 2024

Boosh

Nov 25, 2024

What is the potential for this to run as realtime local tts.? Seems pretty quick.

soraygoular

Nov 27, 2024

please please please, open source this. I love this model it is amazing. quality and prosody are awesome. it's super fast and efficient. please make it open source

bendangelo

Nov 29, 2024

I also would like to use this model. The best feature is being able to put the ipa phonetics.

Pendrokar

Dec 2, 2024

I also would like to use this model. The best feature is being able to put the ipa phonetics.
@bendangelo

While offtopic, but only slightly as Kokoro is a fine-tuned StyleTTS model, I'd like to mention that I cloned the first StyleTTS space to support ZeroGPU, enabled API and also allow to put IPA symbols within [] brackets.
https://huggingface.co/spaces/Pendrokar/style-tts-2

What is the potential for this to run as realtime local tts.? Seems pretty quick.
@Boosh

Within the Open TTS Tracker I noticed that StyleTTS streaming capability is mentioned. I don't think I added that.
https://huggingface.co/datasets/Pendrokar/open_tts_tracker

Probably refers to this fork of StyleTTS:
https://github.com/NeuralVox/StyleTTS2?tab=readme-ov-file#streaming-api

So maybe is a chance of getting the same for Kokoro

hamidplus30

Dec 2, 2024

please please please, open source this. I love this model it is amazing. quality and prosody are awesome. it's super fast and efficient. please make it open source

amssss

about 1 month ago

its great, BUT will it be ever open source??

hexgrad

Owner 9 days ago

•

edited 9 days ago

Kokoro v0.19 has been open sourced at https://hf.co/hexgrad/Kokoro-82M

It is a limited release, decoder-only with two voicepacks, for leaderboard result reproducibility. There currently isn't a release date scheduled for the other voices.

The weights are Apache 2.0 licensed. Merry Christmas!

Edit: For transparency, a model SHA256 hash equality check has been added to assert that the open sourced v0.19 model is identical to the v0.19 model used in this Space.

soraygoular

9 days ago

Kokoro v0.19 has been open sourced at https://hf.co/hexgrad/Kokoro-82M

It is a limited release, decoder-only with two voicepacks, for leaderboard result reproducibility. There currently isn't a release date scheduled for the other voices.

The weights are Apache 2.0 licensed. Merry Christmas!

Edit: For transparency, a model SHA256 hash equality check has been added to assert that the open sourced v0.19 model is identical to the v0.19 model used in this Space.

Thanks. We all love Kokoro, and we are thrilled about the newer version and, of course, more open-source voices. In the future, I wish someone could sponsor you to train more. The quality is fantastic, and the prosody is amazing.
Writing a paper on it would be amazing, too. You could explain how you achieved this quality superior to the original StyleTTS 2, what techniques you used, how the training data could affect the quality and prosody, and more.