ColPali

What is the processor?

#3
by huythang - opened

As the title says: what is the processor, and why do we need it?

The processor takes an image and returns pixel_values, image_grid_thw, attention_mask, and input_ids. This information is needed to compute the position_ids and the image embeddings, and finally the multi-vector representation of the image that is used for the MaxSim operation and for vision-text RAG.
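To make "MaxSim" concrete: once the model has turned the processor's outputs into one vector per image patch (and the query into one vector per query token), the relevance score is computed by late interaction. For each query token, take its highest dot product with any image patch vector, then sum those maxima over the query tokens. The sketch below shows only that scoring step in NumPy. It assumes the embeddings have already been produced by the model as 2-D arrays. The function names `maxsim_score` and `rank_pages` are hypothetical illustrations, not part of any library API.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score for one query against one page image.

    query_emb: (num_query_tokens, dim) array of query token vectors.
    doc_emb:   (num_patches, dim) array of image patch vectors.
    """
    # Pairwise dot products between every query token and every patch:
    # shape (num_query_tokens, num_patches)
    sim = query_emb @ doc_emb.T
    # Keep each query token's best-matching patch, then sum over query tokens
    return float(sim.max(axis=1).sum())


def rank_pages(query_emb: np.ndarray, pages: list[np.ndarray]) -> list[int]:
    """Return page indices sorted from most to least relevant (hypothetical helper)."""
    scores = [maxsim_score(query_emb, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    query = np.array([[1.0, 0.0], [0.0, 1.0]])
    page_a = np.array([[1.0, 0.0], [0.5, 0.5]])
    page_b = np.array([[0.0, 1.0], [0.0, 0.0]])
    print(maxsim_score(query, page_a))      # 1.0 + 0.5
    print(rank_pages(query, [page_b, page_a]))
```

Because each page is represented by many vectors instead of a single pooled one, a query token can match the specific patch (a table cell, a heading) that answers it. This is why the processor's image_grid_thw and attention_mask matter: they determine which patch vectors exist and which ones are real rather than padding.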
