Zhisheng Zheng

zhisheng01

https://zhishengzheng.com/

zhisheng147

AI & ML interests

LLM, Speech and Audio Processing

Organizations

None yet

zhisheng01's activity

upvoted a paper 6 days ago

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Paper • 2503.04724 • Published 7 days ago • 58

upvoted a paper 15 days ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published 16 days ago • 69

upvoted a paper 16 days ago

Slamming: Training a Speech Language Model on One GPU in a Day

Paper • 2502.15814 • Published 22 days ago • 66

upvoted a paper 22 days ago

Soundwave: Less is More for Speech-Text Alignment in LLMs

Paper • 2502.12900 • Published 23 days ago • 77

upvoted 2 papers about 1 month ago

AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting

Paper • 2502.05176 • Published Feb 7 • 32

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Paper • 2502.04128 • Published Feb 6 • 25

upvoted an article about 1 month ago

Recipe: Preparing Multilingual Speech Datasets for TTS Training

Article • and 1 other • Nov 4, 2024 • 18

upvoted a paper about 2 months ago

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Paper • 2501.06282 • Published Jan 10 • 48

upvoted 6 papers 5 months ago

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17, 2024 • 92

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Paper • 2410.06885 • Published Oct 9, 2024 • 44

VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide

Paper • 2410.04364 • Published Oct 6, 2024 • 28

MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion

Paper • 2410.03825 • Published Oct 4, 2024 • 19

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages

Paper • 2410.01036 • Published Oct 1, 2024 • 15

From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

Paper • 2410.01215 • Published Oct 2, 2024 • 31

upvoted 4 papers 6 months ago

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19, 2024 • 138

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Paper • 2409.12183 • Published Sep 18, 2024 • 37

Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

Paper • 2409.11564 • Published Sep 17, 2024 • 20

The VoxCeleb Speaker Recognition Challenge: A Retrospective

Paper • 2408.14886 • Published Aug 27, 2024 • 10

upvoted a paper 7 months ago

Language Model Can Listen While Speaking

Paper • 2408.02622 • Published Aug 5, 2024 • 41

upvoted a paper 8 months ago

Autoregressive Speech Synthesis without Vector Quantization

Paper • 2407.08551 • Published Jul 11, 2024 • 17