Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 27 days ago • 66
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published Sep 17, 2024 • 18
Seamless Communication Collection A significant step towards removing language barriers through expressive, fast and high-quality AI translation. • 16 items • Updated Jan 16, 2024 • 151
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20, 2024 • 58
BigVGAN Collection BigVGAN is a universal neural vocoder that generates audio waveform using mel spectrogram as input. • 11 items • Updated 1 day ago • 11
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing Paper • 2110.07205 • Published Oct 14, 2021 • 5