[Artificial Analysis] Request to be added to TTS arena/leaderboard
The arena/leaderboard at https://artificialanalysis.ai/text-to-speech currently lists a StyleTTS 2 model.
Please consider adding Kokoro, an 82M-parameter, Apache-licensed model that uses a StyleTTS 2 architecture. As described in the README, Kokoro occupied the top spot in the TTS Spaces Arena for a while (prior to getting review-bombed this morning).
Let me know if you would like me to open a Gradio API endpoint via HF Spaces.
Alternatively, you can run it locally, which is what the folks at https://hf.co/spaces/TTS-AGI/TTS-Arena have elected to do (see Usage). At least 8 turnkey voicepacks have been released for Kokoro: 2 female and 2 male each for American and British English.
Should you choose to list the model, I will defer to your judgement on how to rank its pricing & latency. It has been a while since I last checked, but informally, I believe Kokoro should clock under 0.1 RTF (real-time factor) on Colab's 1x T4 when warm, i.e. <1 second to generate 10 seconds of audio once weights have been loaded in memory. That's with FP32 inference, and there is likely a lot of headroom for optimization, at a minimum once FP16 inference is supported.
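For anyone who wants to check the RTF figure themselves, here is a minimal sketch of how it could be measured. RTF is just synthesis time divided by the duration of the audio produced. The `synthesize` callable and the `sample_rate` argument are placeholders, not Kokoro's actual API; swap in whatever inference call and output sample rate you are using, and run one warm-up call first so that weight loading is not counted.

```python
import time


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of generated audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize, text: str, sample_rate: int) -> float:
    """Time one call to a hypothetical `synthesize(text)` that returns a
    sequence of mono audio samples, and return the resulting RTF."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


# The informal claim above: 1 second of compute for 10 seconds of audio.
print(real_time_factor(1.0, 10.0))
```

An RTF below 1.0 means faster than real time, so the informal estimate above corresponds to roughly 10x real time or better.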
You can reach me on Discord at https://discord.gg/QuGxSWBfQy or feel free to reply below. Thanks!
CC @georgewritescode @mhillsmith @will-aragoai at https://hf.co/ArtificialAnalysis