RogerZhuo

AI & ML interests

None yet

Recent Activity

upvoted a paper about 5 hours ago

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

updated a collection 4 days ago

images

updated a collection 4 days ago

I2V

RogerZhuo's activity

upvoted a paper about 5 hours ago

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Paper • 2306.07691 • Published Jun 13, 2023 • 8

upvoted 2 papers 5 days ago

VBench: Comprehensive Benchmark Suite for Video Generative Models

Paper • 2311.17982 • Published Nov 29, 2023 • 8

DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion

Paper • 2503.01183 • Published 7 days ago • 26

upvoted a paper 8 days ago

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

Paper • 2502.11946 • Published 20 days ago • 2

upvoted a collection 8 days ago

Step-Audio

Collection

Step-Audio model family, including Audio-Tokenizer, Audio-Chat and TTS • 3 items • Updated 20 days ago • 30

upvoted 2 papers 8 days ago

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Paper • 2502.05512 • Published 29 days ago • 2

VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design

Paper • 2307.16430 • Published Jul 31, 2023 • 4

upvoted a collection 8 days ago

DeepSeek-V3

Collection

3 items • Updated Jan 6 • 195

upvoted a paper 8 days ago

Learning Flow Fields in Attention for Controllable Person Image Generation

Paper • 2412.08486 • Published Dec 11, 2024 • 34

upvoted a paper 9 days ago

TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models

Paper • 2411.18350 • Published Nov 27, 2024 • 27

upvoted an article 10 days ago

Wanx AI: AlibabaCloud Best Video Generation Model

Article • 13 days ago • 6

upvoted a paper 4 months ago

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Paper • 2410.06885 • Published Oct 9, 2024 • 44

upvoted a collection 11 months ago

A little guide to building Large Language Models in 2024

Collection

Resources mentioned by @thomwolf in https://x.com/Thom_Wolf/status/1773340316835131757 • 19 items • Updated Apr 1, 2024 • 14