Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition • 2 days ago
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions • 3 days ago
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model • 4 days ago