Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper: arXiv:2404.09956
We use an ensemble filtering strategy based on two different CLAP models: 630k-audioset-best and 630k-best.
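As a rough illustration of what ensemble filtering over two CLAP models could look like, the sketch below assumes each audio-caption pair already has one similarity score per checkpoint, and keeps a pair only when both scores clear a cutoff. The agreement rule, the `threshold` value, the `ScoredPair` type, and the sample scores are all hypothetical; the text above does not state the actual criterion.

```python
from dataclasses import dataclass


@dataclass
class ScoredPair:
    """An audio-caption pair with one CLAP similarity score per checkpoint."""
    caption: str
    score_audioset_best: float  # similarity under 630k-audioset-best
    score_best: float           # similarity under 630k-best


def ensemble_filter(pairs, threshold=0.45):
    """Keep pairs whose similarity clears `threshold` under BOTH models.

    Assumption: the both-must-agree rule and the 0.45 cutoff are
    illustrative choices, not values taken from the source.
    """
    return [
        p for p in pairs
        if p.score_audioset_best >= threshold and p.score_best >= threshold
    ]


# Made-up scores, purely for demonstration.
pairs = [
    ScoredPair("a dog barks", 0.62, 0.58),
    ScoredPair("rain on a roof", 0.51, 0.30),
    ScoredPair("a car passes by", 0.20, 0.25),
]

kept = ensemble_filter(pairs)
print([p.caption for p in kept])  # ['a dog barks']
```

Requiring agreement from both checkpoints is the stricter of the obvious options: a pair that only one model rates highly, such as the second example, is dropped. Averaging the two scores would be a more lenient alternative.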