Wanted: Peak Data. I'm collecting audio data to train another TTS model: + AVM data: ChatGPT Advanced Voice Mode audio & text from source + Professional audio: Permissive (CC0, Apache, MIT, CC-BY)
This audio should *impress* most native speakers, not just barely pass their audio Turing tests. Professional-caliber means S or A-tier, not your average bloke off the street. Traditional TTS may not make the cut. Absolutely no low-fi microphone recordings like Common Voice.
The bar is much higher than last time, so there are no timelines yet and I expect it may take longer to collect such mythical data. Raising the bar means evicting quite a bit of old data, and voice/language availability may decrease. The theme is *quality* over quantity. I would rather have 1 hour of A/S-tier than 100 hours of mid data.
I have nothing to offer but the north star of a future Apache 2.0 TTS model, so prefer data that you *already have* and costs you *nothing extra* to send. Additionally, *all* the new data may be used to construct public, Apache 2.0 voicepacks, and if that arrangement doesn't work for you, no need to send any audio.
Last time I asked for horses; now I'm asking for unicorns. As of writing this post, I've currently got a few English & Chinese unicorns, but there is plenty of room in the stable. Find me over on Discord at rzvzn: https://discord.gg/QuGxSWBfQy