
Model Card

SNIFFER is a multimodal large language model (MLLM) designed specifically for detecting and explaining out-of-context (OOC) misinformation. It is built on InstructBLIP and trained with two-stage instruction tuning: news-domain alignment followed by task-specific tuning.

The full model consists of three parts: 1) internal checking, which analyzes the consistency between the image and text content; 2) external checking, which analyzes the relevance between the context of the retrieved image and the provided text; and 3) composed reasoning, which combines the two-pronged analysis into a final judgment and explanation.
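The data flow between the three parts can be sketched as follows. This is a minimal illustration under stated assumptions, not SNIFFER's implementation: the `CheckResult`, `Verdict`, and `compose` names are hypothetical, and a simple OR rule (flag the pair if either check reports an inconsistency) stands in for SNIFFER's LLM-based composed reasoning.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one check (hypothetical structure, not SNIFFER's output format)."""
    consistent: bool   # True if this check found no conflict with the provided text
    explanation: str   # free-text rationale produced by the check


@dataclass
class Verdict:
    out_of_context: bool
    explanation: str


def compose(internal: CheckResult, external: CheckResult) -> Verdict:
    """Combine internal (image vs. text) and external (retrieved image context
    vs. text) checks into a final judgment.

    Assumption: a simple OR rule replaces SNIFFER's LLM-based reasoning; the
    pair is flagged as out-of-context if either check reports an inconsistency.
    """
    flagged = not (internal.consistent and external.consistent)
    parts = []
    if not internal.consistent:
        parts.append(f"Internal check: {internal.explanation}")
    if not external.consistent:
        parts.append(f"External check: {external.explanation}")
    if not parts:
        parts.append("Both checks found the image and text consistent.")
    return Verdict(out_of_context=flagged, explanation=" ".join(parts))


if __name__ == "__main__":
    # Hypothetical example inputs for illustration only.
    internal = CheckResult(True, "The image shows a flooded street, matching the caption.")
    external = CheckResult(False, "Retrieved pages date the photo to 2011, not 2023 as claimed.")
    verdict = compose(internal, external)
    print(verdict.out_of_context)  # True
    print(verdict.explanation)
```

Keeping the two checks as separate inputs mirrors the card's description: each check can be produced independently (this checkpoint covers only the internal one) and merged at the end.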

This checkpoint is used for the internal checking component.

Model Sources

Results

Dataset: NewsCLIPpings (accuracy, %, on all samples and on the fake and real subsets; "-" = not reported)

Model              All    Fake   Real
SAFE               52.8   54.8   52.0
EANN               58.1   61.8   56.2
VisualBERT         58.6   38.9   78.4
CLIP               66.0   64.3   67.7
DT-Transformer     77.1   78.6   75.6
CCN                84.7   84.8   84.5
Neu-Sym detector   68.2   -      -
SNIFFER (ours)     88.4   86.9   91.8
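SNIFFER's margin over the strongest baseline in each column (CCN in all three) can be checked with a few lines of Python. The scores are copied from the table above. The `RESULTS` dictionary and `margin_over_best_baseline` helper are illustrative names, and unreported entries are stored as `None` and skipped.

```python
# Accuracy (%) on NewsCLIPpings, copied from the table above; None marks "-" (not reported).
RESULTS = {
    "SAFE": (52.8, 54.8, 52.0),
    "EANN": (58.1, 61.8, 56.2),
    "VisualBERT": (58.6, 38.9, 78.4),
    "CLIP": (66.0, 64.3, 67.7),
    "DT-Transformer": (77.1, 78.6, 75.6),
    "CCN": (84.7, 84.8, 84.5),
    "Neu-Sym detector": (68.2, None, None),
    "SNIFFER": (88.4, 86.9, 91.8),
}
COLUMNS = ("All", "Fake", "Real")


def margin_over_best_baseline(results, ours="SNIFFER"):
    """Return, per column, our score minus the best reported baseline score."""
    margins = {}
    for i, col in enumerate(COLUMNS):
        baseline_scores = [
            scores[i]
            for name, scores in results.items()
            if name != ours and scores[i] is not None
        ]
        # Round to one decimal to match the table's precision and hide float noise.
        margins[col] = round(results[ours][i] - max(baseline_scores), 1)
    return margins


print(margin_over_best_baseline(RESULTS))  # {'All': 3.7, 'Fake': 2.1, 'Real': 7.3}
```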

Citation

@inproceedings{qi2023sniffer,
  author      = {Qi, Peng and Yan, Zehong and Hsu, Wynne and Lee, Mong Li},
  title       = {SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection},
  booktitle   = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year        = {2024}
}