Which Vision Encoder was used here?
#9, opened by floschne
Do you have any information about the exact vision encoder which was used?
Hi,
The CLIP vision encoder by OpenAI was used, as can be seen here in the original implementation.
For BakLLaVA, it is openai/clip-vit-large-patch14-336, as seen here.
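For anyone wondering what that checkpoint name implies for the image token count, here is a rough sketch. It assumes the usual naming pattern `patch<P>-<R>` (patch size, then input resolution) and that a name without a resolution suffix means 224x224 input; the helper `parse_clip_checkpoint` is made up for illustration and is not part of any library.

```python
import re


def parse_clip_checkpoint(name: str) -> dict:
    """Read patch size and input resolution out of a CLIP ViT checkpoint name.

    Hypothetical helper: relies on the `patch<P>[-<R>]` suffix convention.
    """
    match = re.search(r"patch(\d+)(?:-(\d+))?$", name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")

    patch_size = int(match.group(1))
    # Assumption: no resolution suffix means the default 224x224 input.
    image_size = int(match.group(2)) if match.group(2) else 224

    patches_per_side = image_size // patch_size
    num_patches = patches_per_side * patches_per_side

    return {
        "patch_size": patch_size,
        "image_size": image_size,
        "num_patches": num_patches,
        # The ViT prepends one class token to the patch sequence.
        "sequence_length": num_patches + 1,
    }


info = parse_clip_checkpoint("openai/clip-vit-large-patch14-336")
print(info)
```

With 336x336 input and 14x14 patches this gives a 24x24 grid, so 576 patch tokens per image (577 with the class token).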