[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
ytaek-oh/fsc-clip
Zero-Shot Image Classification
ytaek-oh/cc3m-subset-100k
Dataset • 102k rows
ytaek-oh/laioncoco-subset-100k
Dataset • 135k rows
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
Paper • arXiv:2410.05210