Multimodal Training with Axolotl
by ritabratamaiti
According to the example config YAML, it seems that Axolotl supports multimodal fine-tuning:
```yaml
# multimodal pretrain
multimodal: true
mm_vision_tower: openai/clip-vit-large-patch14
tune_mm_mlp_adapter: true
mm_freeze_backbone: true
mm_vision_select_layer: -2
mm_projector_type: mlp2x_gelu
mm_image_folder: ./llava/
mm_use_im_patch_token: false
```
According to the Axolotl GitHub repository, however, this feature is still a work in progress. Is it possible to update Axolotl to enable multimodal fine-tuning?
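For context on what the config above would train: in LLaVA-style setups, `mm_projector_type: mlp2x_gelu` denotes a two-layer MLP with a GELU activation that maps frozen vision-tower features (CLIP ViT-L/14) into the LLM's embedding space, and `tune_mm_mlp_adapter: true` means only this projector is trained. A minimal sketch of such a projector, with hypothetical hidden sizes (1024 for CLIP ViT-L/14, 4096 assumed for the LLM):

```python
import torch
import torch.nn as nn

class MLP2xGELUProjector(nn.Module):
    """Sketch of an "mlp2x_gelu"-style projector: a 2-layer MLP with GELU
    mapping vision-tower patch features into the LLM embedding space.
    Hidden sizes here are illustrative assumptions, not Axolotl defaults."""

    def __init__(self, vision_hidden_size: int = 1024, llm_hidden_size: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_hidden_size, llm_hidden_size),
            nn.GELU(),
            nn.Linear(llm_hidden_size, llm_hidden_size),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_hidden_size)
        return self.proj(image_features)

# Example: project features for a batch of 2 images with 256 patches each
features = torch.randn(2, 256, 1024)
projected = MLP2xGELUProjector()(features)
print(projected.shape)  # torch.Size([2, 256, 4096])
```

During this "pretrain" stage the vision tower and LLM backbone stay frozen (`mm_freeze_backbone: true`), so only the projector's weights receive gradients.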