arxiv:2405.13459

Adapting Multi-modal Large Language Model to Concept Drift in the Long-tailed Open World

Published on May 22, 2024

Abstract

Real-world data often exhibit extreme class imbalance and out-of-distribution (OOD) instances, which significantly bias model training. While these issues have been studied extensively in the vision and language domains separately, the impact of long-tailed open worlds on multi-modal large language models (MLLMs) has been largely overlooked. In this paper, we first demonstrate the susceptibility and vulnerability of vision-language models to significant biases caused by tail drift and out-of-distribution (OOD) drift during both the pre-training and fine-tuning stages. To eliminate the bias from these different sources, we integrate tail drift adaptation and OOD drift detection into a unified framework by extending concept drift theory to the multi-modal setting. Specifically, we propose a T-distribution-based drift adapter that effectively mitigates the bias induced by the long-tailed problem and, through explicit distribution modelling, also helps the model distinguish OOD data. Extensive experiments show significant improvements in our model's ability to adapt to tail drift and OOD drift. Moreover, the method improves the efficiency and accuracy of image-text alignment in vision-language model pre-training, particularly in the long-tailed open-world scenario. Furthermore, we construct a set of multi-modal datasets, OpenMMlo, specifically tailored to the long-tailed open-world scenario, to validate our findings. To foster the development of the multi-modal community, we have made both the OpenMMlo datasets and our code publicly available at: https://github.com/Anonymous0Knight/ConceptDriftMLLMs.
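
The abstract does not give implementation details for the T-distribution-based drift adapter. The Python sketch below is only a hypothetical illustration of the general idea it describes: model each class's embedding distribution explicitly with a Student's t distribution (whose heavy tails are more forgiving of scarce tail-class samples) and reuse the same per-class likelihoods to flag OOD inputs. The class name `TDriftAdapter`, the diagonal shrinkage term, and the thresholds are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch of per-class Student's t modelling over image-text embeddings.
# NOT the paper's implementation; all names and constants below are illustrative.
import numpy as np
from scipy.stats import multivariate_t


class TDriftAdapter:
    def __init__(self, df: float = 5.0, ood_threshold: float = -50.0):
        self.df = df                      # degrees of freedom: heavier tails help scarce classes
        self.ood_threshold = ood_threshold  # log-likelihood cutoff for declaring OOD (assumed value)
        self.params = {}                  # class id -> (mean, covariance)

    def fit(self, features: np.ndarray, labels: np.ndarray) -> None:
        """Estimate a location/scale per class from pre-extracted embeddings."""
        for c in np.unique(labels):
            x = features[labels == c]
            mean = x.mean(axis=0)
            # Shrink the covariance toward identity so tail classes with few samples stay well-posed.
            cov = np.cov(x, rowvar=False) + 1e-3 * np.eye(x.shape[1])
            self.params[c] = (mean, cov)

    def score(self, feature: np.ndarray) -> dict:
        """Return per-class log-likelihoods under the Student's t model."""
        return {
            c: multivariate_t.logpdf(feature, loc=mean, shape=cov, df=self.df)
            for c, (mean, cov) in self.params.items()
        }

    def predict(self, feature: np.ndarray):
        """Pick the most likely class, or report OOD if every class is implausible."""
        scores = self.score(feature)
        best_class = max(scores, key=scores.get)
        if scores[best_class] < self.ood_threshold:
            return "OOD", scores[best_class]
        return best_class, scores[best_class]
```

In this reading, a single explicit density per class serves two purposes at once: adapting decision boundaries for under-represented (tail) classes and rejecting inputs that no class explains well, which is how the abstract frames handling tail drift and OOD drift in one framework.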
