RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
Abstract
The recent emergence of Medical Large Vision Language Models (Med-LVLMs) has enhanced medical diagnosis. However, current Med-LVLMs frequently encounter factual issues, often generating responses that do not align with established medical facts. Retrieval-Augmented Generation (RAG), which utilizes external knowledge, can improve the factual accuracy of these models but introduces two major challenges. First, limited retrieved contexts might not cover all necessary information, while excessive retrieval can introduce irrelevant and inaccurate references, interfering with the model's generation. Second, in cases where the model originally responds correctly, applying RAG can lead to an over-reliance on retrieved contexts, resulting in incorrect answers. To address these issues, we propose RULE, which consists of two components. First, we introduce a provably effective strategy for controlling factuality risk through the calibrated selection of the number of retrieved contexts. Second, based on samples where over-reliance on retrieved contexts led to errors, we curate a preference dataset to fine-tune the model, balancing its dependence on inherent knowledge and retrieved contexts for generation. We demonstrate the effectiveness of RULE on three medical VQA datasets, achieving an average improvement of 20.8% in factual accuracy. We publicly release our benchmark and code at https://github.com/richard-peng-xia/RULE.
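To make the first component concrete, below is a minimal sketch of how a calibrated choice of the number of retrieved contexts k could work: evaluate a held-out calibration set for each candidate k, upper-bound the true factuality risk, and keep a k whose bound stays below a target level α. The Hoeffding-style bound, the `eval_with_k` callback, and the function names are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: calibrated selection of the number of retrieved contexts k.
# Assumes a calibration set and a user-supplied eval_with_k(k) that returns
# 0/1 error indicators for answers generated with k retrieved contexts.
import math
from typing import Callable, Sequence


def empirical_risk(errors: Sequence[int]) -> float:
    """Fraction of calibration questions answered incorrectly."""
    return sum(errors) / len(errors)


def hoeffding_upper_bound(risk_hat: float, n: int, delta: float) -> float:
    """Upper confidence bound on the true risk, valid with probability 1 - delta."""
    return risk_hat + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def select_num_contexts(
    candidate_ks: Sequence[int],
    eval_with_k: Callable[[int], Sequence[int]],
    alpha: float = 0.25,   # target factuality risk (assumed value)
    delta: float = 0.05,   # confidence level for the bound (assumed value)
) -> int:
    """Return the first k whose risk bound meets alpha, else the lowest-bound k."""
    best_k, best_bound = candidate_ks[0], float("inf")
    for k in candidate_ks:
        errors = eval_with_k(k)
        bound = hoeffding_upper_bound(empirical_risk(errors), len(errors), delta)
        if bound < best_bound:
            best_k, best_bound = k, bound
        if bound <= alpha:
            return k  # smallest candidate certified to meet the risk target
    return best_k     # no candidate certified; fall back to the best observed k
```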
Community
🔥 Enhanced Factual Accuracy: The proposed RULE framework significantly improves factual accuracy in Medical Large Vision Language Models (Med-LVLMs), achieving an average improvement of 20.8% across three medical VQA datasets.
🔥 Innovative Approach: RULE introduces a novel, provably effective strategy to control factuality risk by calibrating the selection of retrieved contexts, addressing the challenge of limited or excessive retrieval.
🔥 Balanced Dependence: By curating a preference dataset from instances of over-reliance on retrieved contexts (a sketch of this pair construction follows the list), RULE fine-tunes the model to balance its dependence on inherent knowledge and retrieved information, reducing the risk of incorrect answers.
🔥 Practical Application: The RULE framework offers a practical solution for enhancing the factual accuracy of Med-LVLMs, providing a transparent and efficient approach to integrating external knowledge without compromising the model's inherent capabilities.
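As a rough illustration of the preference-dataset idea, the sketch below builds chosen/rejected pairs from samples where the model answered correctly without retrieval but incorrectly once retrieved contexts were added. The record fields and helper names are hypothetical; the paper's actual curation pipeline and training setup may differ.

```python
# Sketch: building preference pairs from over-reliance failures.
# Each sample is assumed to hold the question, retrieved contexts, the answer
# without RAG, the answer with RAG, and the gold answer.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PreferencePair:
    prompt: str    # question plus retrieved contexts shown to the model
    chosen: str    # correct answer given without over-relying on retrieval
    rejected: str  # incorrect answer produced when over-relying on retrieval


def build_preference_pairs(samples: List[Dict[str, str]]) -> List[PreferencePair]:
    pairs = []
    for s in samples:
        correct_without_rag = s["answer_no_rag"].strip() == s["gold"].strip()
        wrong_with_rag = s["answer_rag"].strip() != s["gold"].strip()
        if correct_without_rag and wrong_with_rag:
            prompt = f"{s['question']}\nRetrieved contexts:\n{s['contexts']}"
            pairs.append(PreferencePair(prompt, s["answer_no_rag"], s["answer_rag"]))
    return pairs
```

Pairs of this form can then be fed to a preference-optimization trainer (e.g., a DPO-style objective) to teach the model when to trust its own knowledge over the retrieved contexts; the specific training recipe here is an assumption rather than the paper's stated configuration.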
Paper: https://arxiv.org/abs/2407.05131
Code (will be released): https://github.com/richard-peng-xia/RULE
Related project: CARES https://cares-ai.github.io/
Paper: https://arxiv.org/abs/2406.06007
Data/Code: https://github.com/richard-peng-xia/CARES
@richardxp888
Congratulations on the release!
If you wish to host a demo, please let us know; this seems interesting, so we could assign a Zero A100 to this.
Would be nice to host the datasets in CARES on Hugging Face Hub :')