SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training Paper • 2412.15649 • Published 23 days ago
Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers Paper • 2412.16102 • Published 23 days ago
k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning Paper • 2411.17100 • Published Nov 26, 2024
MobA: A Two-Level Agent System for Efficient Mobile Task Automation Paper • 2410.13757 • Published Oct 17, 2024 • 32
Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback Paper • 2403.18349 • Published Mar 27, 2024
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI Paper • 2205.11029 • Published May 23, 2022
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding Paper • 2402.18262 • Published Feb 28, 2024
MULTI: Multimodal Understanding Leaderboard with Text and Images Paper • 2402.03173 • Published Feb 5, 2024 • 3
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization Paper • 2409.00819 • Published Sep 1, 2024
Zipformer: A faster and better encoder for automatic speech recognition Paper • 2310.11230 • Published Oct 17, 2023
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech Paper • 2401.14321 • Published Jan 25, 2024
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity Paper • 2402.08846 • Published Feb 13, 2024 • 1
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context Paper • 2309.08105 • Published Sep 15, 2023
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement Paper • 2406.11546 • Published Jun 17, 2024
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS Paper • 2309.07377 • Published Sep 14, 2023