hexgrad/Kokoro-82M · [Artificial Analysis] Request to be added to TTS arena/leaderboard

The arena/leaderboard at https://artificialanalysis.ai/text-to-speech currently lists a StyleTTS 2 model.

Consider adding Kokoro, an 82M-parameter, Apache-licensed model that uses a StyleTTS 2 architecture. As described in the README, Kokoro held the top spot in the TTS Spaces Arena for a while (before it was review-bombed this morning).

Let me know if you would like me to open a Gradio API endpoint via HF Spaces.

Alternatively, you can run it locally, which is what the folks at https://hf.co/spaces/TTS-AGI/TTS-Arena have chosen to do (see Usage). At least 8 turnkey voicepacks have been released for Kokoro: two female and two male voices each for American and British English.

Should you choose to list the model, I will defer to your judgement on how to rank its pricing and latency. It has been a while since I last checked, but informally, I believe Kokoro should reach a real-time factor (RTF) under 0.1 on Colab's 1x T4 when warm (i.e. under 1 second to generate 10 seconds of audio once the weights are loaded in memory). That is with FP32 inference, and there is likely a lot of headroom for optimization, at the very least once FP16 inference is supported.
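For reference, RTF here means wall-clock generation time divided by the duration of the audio produced, so anything below 1.0 is faster than real time. A minimal Python sketch of how that number could be measured is below; `synthesize` is a hypothetical stand-in for whatever inference call is used (it is not Kokoro's actual API), it is assumed to return raw audio samples, and the 0.9 s figure is illustrative only, not a benchmark result.

```python
import time
from typing import Callable, Sequence


def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock generation time / duration of the generated audio.

    Values below 1.0 are faster than real time; 0.1 means 10 s of audio
    takes 1 s to generate.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def measure_rtf(synthesize: Callable[[str], Sequence[float]],
                text: str, sample_rate: int) -> float:
    """Time one call to a hypothetical `synthesize` function.

    Assumes the model is already warm (weights loaded), matching the
    "when warm" condition, and that `synthesize` returns raw samples
    at `sample_rate` samples per second.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


# Illustrative numbers only: 0.9 s of compute for 10 s of audio.
print(f"RTF = {real_time_factor(0.9, 10.0):.2f}")  # RTF = 0.09
```

In practice you would discard the first (cold) call and average over several warm calls, since the estimate above explicitly excludes weight loading.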

You can reach me on Discord at https://discord.gg/QuGxSWBfQy, or feel free to reply below. Thanks!

CC @georgewritescode @mhillsmith @will-aragoai at https://hf.co/ArtificialAnalysis