Muhtasham Oblokulov PRO

muhtasham

AI & ML interests

None yet

Recent Activity

liked a dataset about 3 hours ago
muhtasham/gsm8k-tajik
updated a collection about 4 hours ago
Tajik Language Models

Organizations

Deprem Yapay Zeka, Spaces-explorers, Amazon SageMaker Community, 🤗 Course Team AI Law Assistant, Training Transformers Together, Keras, CVPR Demo Track, Technical University of Munich, HugGAN Community, Eddevs, Gradio-Blocks-Party, Webhooks Explorers (BETA), fastai X Hugging Face Group 2022, EuroPython 2022, ICML 2022, BigCode, Munich NLP, SIGGRAPH 2022, Sabanci University, Ludwig Maximilian University of Munich, Blog-explorers, ilm, MLX Vision, ZeroGPU Explorers, Unofficial Mistral Community, MLX Community, 7wonders-of-ai, Hugging Face Discord Community, Hugging Face Party @ PyTorch Conference

muhtasham's activity

reacted to hexgrad's post with 🔥 about 17 hours ago
Wanted: Peak Data. I'm collecting audio data to train another TTS model:
+ AVM data: ChatGPT Advanced Voice Mode audio & text from source
+ Professional audio: Permissive (CC0, Apache, MIT, CC-BY)

This audio should *impress* most native speakers, not just barely pass their audio Turing tests. Professional-caliber means S or A-tier, not your average bloke off the street. Traditional TTS may not make the cut. Absolutely no low-fi microphone recordings like Common Voice.

The bar is much higher than last time, so there are no timelines yet and I expect it may take longer to collect such mythical data. Raising the bar means evicting quite a bit of old data, and voice/language availability may decrease. The theme is *quality* over quantity. I would rather have 1 hour of A/S-tier than 100 hours of mid data.

I have nothing to offer but the north star of a future Apache 2.0 TTS model, so prefer data that you *already have* and that costs you *nothing extra* to send. Additionally, *all* the new data may be used to construct public, Apache 2.0 voicepacks, and if that arrangement doesn't work for you, no need to send any audio.

Last time I asked for horses; now I'm asking for unicorns. As of writing this post, I've currently got a few English & Chinese unicorns, but there is plenty of room in the stable. Find me over on Discord at rzvzn: https://discord.gg/QuGxSWBfQy
upvoted an article 1 day ago
The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about...

upvoted 2 articles 1 day ago
The N Implementation Details of RLHF with PPO
