Details about this model

#1
by sushmapiraka - opened

What does this model do?
Is it just image-text cross-modal search, or does it include audio and video too?

vsearch org

Hi, @sushmapiraka. Thanks for your interest.
This checkpoint only supports text-image search. However, our methodology [1] can learn text-audio or text-video search if sufficient data and computational resources are available.

[1] Retrieval-based Disentangled Representation Learning with Natural Language Supervision
