SLM: Bridge the thin gap between speech and text foundation models Paper • 2310.00230 • Published Sep 30, 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models Paper • 2310.13289 • Published Oct 20, 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models Paper • 2310.05863 • Published Oct 9, 2023