StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models Paper • 2306.07691 • Published Jun 13, 2023 • 8
VBench: Comprehensive Benchmark Suite for Video Generative Models Paper • 2311.17982 • Published Nov 29, 2023 • 8
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion Paper • 2503.01183 • Published 7 days ago • 26
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction Paper • 2502.11946 • Published 20 days ago • 2
Step-Audio Collection Step-Audio model family, including Audio-Tokenizer, Audio-Chat and TTS • 3 items • Updated 20 days ago • 30
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System Paper • 2502.05512 • Published 29 days ago • 2
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design Paper • 2307.16430 • Published Jul 31, 2023 • 4
Learning Flow Fields in Attention for Controllable Person Image Generation Paper • 2412.08486 • Published Dec 11, 2024 • 34
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models Paper • 2411.18350 • Published Nov 27, 2024 • 27
view article Article Wanx AI :AlibabaCloud Best Video Generation Model By LLMhacker • 13 days ago • 6
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9, 2024 • 44
A little guide to building Large Language Models in 2024 Collection Resources mentioned by @thomwolf in https://x.com/Thom_Wolf/status/1773340316835131757 • 19 items • Updated Apr 1, 2024 • 14