---
license: apache-2.0
tags:
- llava
pipeline_tag: image-text-to-text
---
Base model: BLIP2-flan-t5 (pretrained version)
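If you want to start from the same base weights, a minimal loading sketch, assuming the public `Salesforce/blip2-flan-t5-xl` checkpoint on the Hugging Face Hub (this repo may ship its own fine-tuned weights instead):

```python
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Assumed public checkpoint matching the BLIP2-flan-t5-xl base named below;
# swap in this repo's weights if they are published separately.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xl")
```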
Fine-tuning data:
- LLaVA 150k (for multi-round conversations, one instruction-answer pair is sampled per conversation; see the sketch below)
- MiniGPT-4 (3,500 pairs)
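A minimal sketch of the single-pair sampling mentioned above, assuming the public LLaVA 150k JSON format in which each record carries a `conversations` list of alternating `human`/`gpt` turns (the file name and field names are assumptions from the public LLaVA release, not confirmed by this card):

```python
import json
import random

def sample_pair(record):
    """Pick one (instruction, answer) pair from a multi-round conversation."""
    turns = record["conversations"]  # alternating human/gpt messages
    # Pair each human instruction with the gpt answer that follows it.
    pairs = [
        (turns[i]["value"], turns[i + 1]["value"])
        for i in range(0, len(turns) - 1, 2)
        if turns[i]["from"] == "human" and turns[i + 1]["from"] == "gpt"
    ]
    return random.choice(pairs)

with open("llava_instruct_150k.json") as f:
    data = json.load(f)

instruction, answer = sample_pair(data[0])
```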
Hyper-parameters:
BLIP2-flan-t5-xl + LLaVA (initial commits)
v0:
- lr = 2e-5, decayed to 0.0 with a cosine schedule (see the scheduler sketch after these lists)
- gbs (global batch size) = 32
- image size = 480
- weight decay = 0.05
v1 (same as LLaVA):
- lr = 2e-5
- gbs = 32
- image size = 224
- weight decay = 0.0
Others:
- lr = 2e-5
- gbs = 32
- image size = 224
- weight decay = 0.0
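For the v0 run, a minimal sketch of the cosine decay from 2e-5 to 0.0, assuming plain PyTorch; `num_training_steps`, the placeholder model, and the AdamW choice are assumptions for illustration, not confirmed by this card:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(8, 8)  # placeholder for the BLIP-2 model
num_training_steps = 10_000    # hypothetical; set from your data size and gbs

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.05)
# Cosine decay from the peak lr (2e-5) down to eta_min=0.0 over training.
scheduler = CosineAnnealingLR(optimizer, T_max=num_training_steps, eta_min=0.0)

for step in range(num_training_steps):
    # ... forward/backward on a global batch of 32 ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```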