WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words • Paper • 2312.02931 • Published Dec 5, 2023