Full Name PRO

Gatozu35

AI & ML interests

Text-to-Speech, Voice Conversion

Recent Activity

liked a model 3 days ago

pipecat-ai/smart-turn

liked a Space 4 days ago

Mobvoi/Offical-Spark-TTS

liked a model 6 days ago

ASLP-lab/DiffRhythm-vae

Gatozu35's activity

upvoted a collection 6 days ago

OWLS: Scaling Laws for Speech Recognition and Translation

🦉 A suite of Whisper-style models from 250M to 18B parameters. Trained on up to 360K hours of data. • 7 items • Updated 2 days ago • 4

upvoted a collection 12 days ago

Zonos-v0.1

3 items • Updated 26 days ago • 23

upvoted a paper 13 days ago

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

Paper • 2305.10005 • Published May 17, 2023 • 3

upvoted a paper 19 days ago

Eager Updates For Overlapped Communication and Computation in DiLoCo

Paper • 2502.12996 • Published 20 days ago • 7

upvoted a collection 20 days ago

Deepseek Papers

Deepseek papers collection • 18 items • Updated 20 days ago • 166

upvoted a paper 26 days ago

QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation

Paper • 2502.05178 • Published about 1 month ago • 10

upvoted a paper about 1 month ago

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks

Paper • 2408.13106 • Published Aug 23, 2024 • 1

upvoted 2 collections about 1 month ago

Text to Speech (TTS)

Text to Speech (TTS) models compatible with txtai's TextToSpeech pipeline. • 7 items • Updated Jan 26 • 6

RWKV7

RWKV7 models • 5 items • Updated 25 days ago • 4

upvoted a paper about 1 month ago

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation

Paper • 2501.15907 • Published Jan 27 • 16

upvoted a paper about 2 months ago

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Paper • 2501.09732 • Published Jan 16 • 70

upvoted 2 collections 2 months ago

Sound Datasets

Sound Datasets for ASR/ASV or some other tasks • 12 items • Updated Aug 28, 2024 • 1

Cosmos

The collection of Cosmos models • 31 items • Updated Jan 17 • 268

upvoted a paper 3 months ago

Tackling the Generative Learning Trilemma with Denoising Diffusion GANs

Paper • 2112.07804 • Published Dec 15, 2021 • 1

upvoted a collection 3 months ago

ModernBERT

Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 141

upvoted 2 papers 3 months ago

Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation

Paper • 2411.18447 • Published Nov 27, 2024 • 2

Scaling Transformers for Low-Bitrate High-Quality Speech Coding

Paper • 2411.19842 • Published Nov 29, 2024 • 11

upvoted 2 collections 4 months ago

Cosmos Tokenizer

A suite of image and video tokenizers • 13 items • Updated Jan 17 • 39

Molmo

Artifacts for open multimodal language models. • 5 items • Updated 27 days ago • 298

upvoted a paper 4 months ago

Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis

Paper • 2410.23320 • Published Oct 30, 2024 • 8