TangoFlux: Super Fast and Faithful Text-to-Audio Generation with Flow Matching and CLAP-Ranked Preference Optimization
Abstract
We introduce TangoFlux, an efficient Text-to-Audio (TTA) generative model with 515M parameters, capable of generating up to 30 seconds of 44.1kHz audio in just 3.7 seconds on a single A40 GPU. A key challenge in aligning TTA models lies in the difficulty of creating preference pairs, as TTA lacks structured mechanisms like the verifiable rewards or gold-standard answers available to Large Language Models (LLMs). To address this, we propose CLAP-Ranked Preference Optimization (CRPO), a novel framework that iteratively generates and optimizes preference data to enhance TTA alignment. We demonstrate that the audio preference dataset generated using CRPO outperforms existing alternatives. With this framework, TangoFlux achieves state-of-the-art performance across both objective and subjective benchmarks. We open-source all code and models to support further research in TTA generation.
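The core of CRPO, as the abstract describes it, is ranking a model's own generations by CLAP text-audio similarity to construct preference pairs where no gold-standard answers exist. A minimal sketch of that pair-construction step is below; the function names and the toy scores are illustrative assumptions, not the paper's implementation, and a real pipeline would obtain the scores from a pretrained CLAP model.

```python
def rank_by_clap(candidates, clap_scores):
    """Order generated audio candidates for one prompt by their CLAP
    text-audio similarity (higher = more faithful to the prompt)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: clap_scores[i], reverse=True)
    return [candidates[i] for i in order]


def make_preference_pair(candidates, clap_scores):
    """CRPO-style pair: the highest-scoring sample becomes 'chosen'
    and the lowest-scoring becomes 'rejected'."""
    ranked = rank_by_clap(candidates, clap_scores)
    return ranked[0], ranked[-1]


# Toy example: four self-generated samples with hypothetical CLAP scores.
candidates = ["audio_a", "audio_b", "audio_c", "audio_d"]
scores = [0.41, 0.72, 0.33, 0.58]
chosen, rejected = make_preference_pair(candidates, scores)
print(chosen, rejected)  # audio_b audio_c
```

The resulting (chosen, rejected) pairs can then feed a standard preference-optimization objective, and the "iterative" aspect of CRPO comes from regenerating and re-ranking with the updated model.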
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation (2024)
- CDR: Customizable Density Ratios of Strong-over-weak LLMs for Preference Annotation (2024)
- Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs (2024)
- DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs (2024)
- WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models (2024)
- DP-2Stage: Adapting Language Models as Differentially Private Tabular Data Generators (2024)
- Does Few-Shot Learning Help LLM Performance in Code Synthesis? (2024)