what is the processor?
#3, opened by huythang
As the title says: what is the processor, and why do we need it?
The processor takes an image and returns pixel_values, image_grid_thw, attention_mask, and input_ids. These are needed to compute position_ids and the image embeddings, and finally the multi-vector representation of the image, which is used for the MaxSim operation and vision-text RAG.
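To make the last step concrete, here is a minimal sketch of MaxSim scoring as it is usually defined for late-interaction retrieval: for each query vector, take the maximum dot product over all image vectors, then sum those maxima. This is not this model's own code. The function name `maxsim_score` and the toy 2-dimensional vectors are assumptions for illustration only; real multi-vectors would come from running the model on the processor's outputs.

```python
import numpy as np


def maxsim_score(query_vecs: np.ndarray, image_vecs: np.ndarray) -> float:
    """Late-interaction (MaxSim) score between two sets of vectors.

    query_vecs: (num_query_tokens, dim) array
    image_vecs: (num_image_patches, dim) array
    Hypothetical helper, not part of any library API.
    """
    # Pairwise dot products: one row per query token, one column per image vector.
    sim = query_vecs @ image_vecs.T
    # For each query token keep its best-matching image vector, then sum over tokens.
    return float(sim.max(axis=1).sum())


# Toy data (made up): 2 query vectors and 3 image vectors of dimension 2.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
image = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, -1.0]])

score = maxsim_score(query, image)
print(score)
```

In a retrieval setting, the same score would be computed between one query and every candidate image, and the images ranked by it. Padded positions (where attention_mask is 0) would normally be excluded before scoring; that masking is omitted here for brevity.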