A Multi-Task, Multi-Modal Approach for Predicting Categorical and Dimensional Emotions Paper • 2401.00536 • Published Dec 31, 2023 • 2
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 9 days ago • 130
ATHAR: A High-Quality and Diverse Dataset for Classical Arabic to English Translation Paper • 2407.19835 • Published Jul 29 • 21
Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names Paper • 2408.00298 • Published Aug 1 • 9
Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition Paper • 2407.13559 • Published Jul 18 • 14