VLMEvalKit Evaluation Results Collection
Experiment with and compare different tokenizers
Instruction-tuned model for a range of vision-language tasks