Multimodal Training with Axolotl
by ritabratamaiti
According to the example config YAML, it seems that Axolotl supports multimodal fine-tuning:
```yaml
# multimodal pretrain
multimodal: true
mm_vision_tower: openai/clip-vit-large-patch14
tune_mm_mlp_adapter: true
mm_freeze_backbone: true
mm_vision_select_layer: -2
mm_projector_type: mlp2x_gelu
mm_image_folder: ./llava/
mm_use_im_patch_token: false
```
According to the Axolotl GitHub repository, however, this feature is still a work in progress. Is it possible to update Axolotl to enable multimodal fine-tuning?
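For context on what the config above would train: in LLaVA-style setups, `mm_projector_type: mlp2x_gelu` denotes a two-layer MLP with a GELU activation that maps frozen vision-tower features (CLIP ViT-L/14) into the LLM's embedding space, and `tune_mm_mlp_adapter: true` means only this projector is trained. A minimal sketch of such a projector, with hypothetical hidden sizes (1024 for CLIP ViT-L/14, 4096 assumed for the LLM):

```python
import torch
import torch.nn as nn

class MLP2xGELUProjector(nn.Module):
    """Sketch of an "mlp2x_gelu"-style projector: a 2-layer MLP with GELU
    mapping vision-tower patch features into the LLM embedding space.
    Hidden sizes here are illustrative assumptions, not Axolotl defaults."""

    def __init__(self, vision_hidden_size: int = 1024, llm_hidden_size: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_hidden_size, llm_hidden_size),
            nn.GELU(),
            nn.Linear(llm_hidden_size, llm_hidden_size),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_hidden_size)
        return self.proj(image_features)

# Example: project features for a batch of 2 images with 256 patches each
features = torch.randn(2, 256, 1024)
projected = MLP2xGELUProjector()(features)
print(projected.shape)  # torch.Size([2, 256, 4096])
```

During this "pretrain" stage the vision tower and LLM backbone stay frozen (`mm_freeze_backbone: true`), so only the projector's weights receive gradients.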