Are Natural Domain Foundation Models Useful for Medical Image Classification?
Abstract
The deep learning field is converging towards the use of general foundation models that can be easily adapted to diverse tasks. While this paradigm shift has become common practice in natural language processing, progress has been slower in computer vision. In this paper, we address this issue by investigating the transferability of various state-of-the-art foundation models to medical image classification tasks. Specifically, we evaluate the performance of five foundation models, namely SAM, SEEM, DINOv2, BLIP, and OpenCLIP, across four well-established medical imaging datasets. We explore different training settings to fully harness the potential of these models. Our study shows mixed results: DINOv2 consistently outperforms the standard practice of ImageNet pretraining, whereas the other foundation models fail to consistently beat this established baseline, indicating limitations in their transferability to medical image classification tasks.
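The abstract does not detail the training settings explored. As one illustration of how such an evaluation is commonly set up, the sketch below uses the official DINOv2 torch.hub checkpoint as a frozen feature extractor with a linear probe on top; the model variant, class count, and hyperparameters are illustrative assumptions, not the paper's protocol.

```python
# Minimal sketch: frozen DINOv2 features + linear probe for image
# classification. The dataset, number of classes, and optimizer settings
# below are assumptions made for illustration only.
import torch
import torch.nn as nn

# Official DINOv2 hub entry point; dinov2_vitb14 is one of several variants.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
backbone.eval()  # use as a frozen feature extractor
for p in backbone.parameters():
    p.requires_grad = False

num_classes = 2  # e.g. a binary medical classification task (assumption)
probe = nn.Linear(backbone.embed_dim, num_classes)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One linear-probe update on a batch of preprocessed images.

    Images should be (B, 3, H, W) with H and W multiples of the 14-pixel
    patch size (e.g. 224x224), normalized with ImageNet statistics.
    """
    with torch.no_grad():
        feats = backbone(images)  # CLS-token features, shape (B, embed_dim)
    logits = probe(feats)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A full-fine-tuning setting would instead leave the backbone trainable (and typically use a much lower learning rate); linear probing is shown here because it isolates the quality of the pretrained features, which is the question the paper poses.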