---
license: apache-2.0
tags:
- llava
pipeline_tag: image-text-to-text
---
Base model: BLIP2-flan-t5 (pretrained version)
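If you want to start from the same base weights, a minimal loading sketch, assuming the public `Salesforce/blip2-flan-t5-xl` checkpoint on the Hugging Face Hub (this repo may ship its own fine-tuned weights instead):

```python
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Assumed public checkpoint matching the BLIP2-flan-t5-xl base named below;
# swap in this repo's weights if they are published separately.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xl")
```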
Fine-tuning data:
- LLaVA 150k (for multi-round conversations, one instruction-answer pair is sampled per conversation; see the sketch below)
- MiniGPT-4 (3,500 pairs)
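A minimal sketch of the single-pair sampling mentioned above, assuming the public LLaVA 150k JSON format in which each record carries a `conversations` list of alternating `human`/`gpt` turns (the file name and field names are assumptions from the public LLaVA release, not confirmed by this card):

```python
import json
import random

def sample_pair(record):
    """Pick one (instruction, answer) pair from a multi-round conversation."""
    turns = record["conversations"]  # alternating human/gpt messages
    # Pair each human instruction with the gpt answer that follows it.
    pairs = [
        (turns[i]["value"], turns[i + 1]["value"])
        for i in range(0, len(turns) - 1, 2)
        if turns[i]["from"] == "human" and turns[i + 1]["from"] == "gpt"
    ]
    return random.choice(pairs)

with open("llava_instruct_150k.json") as f:
    data = json.load(f)

instruction, answer = sample_pair(data[0])
```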
Hyper-parameters:
BLIP2-flan-t5-xl + LLaVA (initial commits)
v0:
- lr = 2e-5, decayed to 0.0 with a cosine schedule (see the scheduler sketch after these lists)
- gbs (global batch size) = 32
- image size = 480
- weight decay = 0.05
v1 (same as LLaVA):
- lr = 2e-5
- gbs = 32
- image size = 224
- weight decay = 0.0
Others:
- lr = 2e-5
- gbs = 32
- image size = 224
- weight decay = 0.0
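For the v0 run, a minimal sketch of the cosine decay from 2e-5 to 0.0, assuming plain PyTorch; `num_training_steps`, the placeholder model, and the AdamW choice are assumptions for illustration, not confirmed by this card:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(8, 8)  # placeholder for the BLIP-2 model
num_training_steps = 10_000    # hypothetical; set from your data size and gbs

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.05)
# Cosine decay from the peak lr (2e-5) down to eta_min=0.0 over training.
scheduler = CosineAnnealingLR(optimizer, T_max=num_training_steps, eta_min=0.0)

for step in range(num_training_steps):
    # ... forward/backward on a global batch of 32 ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```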