CLIP_Detector is a model presented in the BiVLC paper for experimentation. It was trained with the OpenCLIP framework, starting from the CLIP ViT-B-32 model with the 'openai' pre-trained weights. For binary classification, the encoders are kept frozen, and a sigmoid neuron is added over the CLS embedding of the image encoder and over the EOT embedding of the text encoder (more details in the paper). The objective of the model is to classify texts and images as natural or synthetic.
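The classification head described above (one sigmoid neuron over a frozen embedding) can be illustrated with a minimal, dependency-free sketch. This is not the authors' training code: the class name `SigmoidHead`, the embedding size, the learning rate, the weight initialization, and the label convention (1.0 = synthetic, 0.0 = natural) are all assumptions made for illustration. The embedding is assumed to be precomputed by the frozen encoder and is passed in as a plain list of floats.

```python
import math
import random


def stable_sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SigmoidHead:
    """Hypothetical single sigmoid neuron over a frozen embedding.

    Only the weights and bias here are trainable; the encoder that
    produced the embedding is assumed to stay frozen.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # Small random initialization (an assumption, not from the paper).
        self.weights = [rng.gauss(0.0, 0.02) for _ in range(dim)]
        self.bias = 0.0

    def probability(self, embedding):
        """Return P(synthetic) under the assumed label convention."""
        if len(embedding) != len(self.weights):
            raise ValueError("embedding size does not match head size")
        logit = sum(w * x for w, x in zip(self.weights, embedding)) + self.bias
        return stable_sigmoid(logit)

    def train_step(self, embedding, label, lr=0.1):
        """One SGD step on binary cross-entropy; label is 1.0 or 0.0."""
        p = self.probability(embedding)
        grad = p - label  # derivative of BCE with respect to the logit
        self.weights = [w - lr * grad * x
                        for w, x in zip(self.weights, embedding)]
        self.bias -= lr * grad
        return p
```

In this sketch, one head would be used for image embeddings and a separate one for text embeddings, each thresholded (for example at 0.5) to decide between natural and synthetic.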
This work is licensed under an MIT License.
If you find this model useful, please consider citing our paper:
@misc{miranda2024bivlc,
title={BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval},
author={Imanol Miranda and Ander Salaberria and Eneko Agirre and Gorka Azkune},
year={2024},
eprint={2406.09952},
archivePrefix={arXiv},
primaryClass={cs.CV}
}