Details about this model

by sushmapiraka - opened Apr 11, 2024

Discussion

sushmapiraka

Apr 11, 2024

What does this model do?
Is it just image-text cross-modal search, or does it include audio and video too?

jzhoubu

vsearch org Apr 11, 2024

Hi, @sushmapiraka. Thanks for your interest.
This checkpoint only supports text-image search. However, our methodology [1] can learn text-audio or text-video search if sufficient data and computational resources are available.

[1] Retrieval-based Disentangled Representation Learning with Natural Language Supervision
