Model
Baseline LLaVA model for MEGL-Action
Usage
The model is fine-tuned with LoRA on the Action dataset, so it must be loaded with PEFT on top of the base LLaVA checkpoint.
from transformers import LlavaForConditionalGeneration
from peft import PeftModel

# Load the base LLaVA-1.5-7B checkpoint
base_model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",
    device_map="auto"
)

# Attach the LoRA adapter fine-tuned on the Action dataset
model = PeftModel.from_pretrained(
    base_model,
    "TnTerry/MEGL-LLaVA-Baseline-Action",
    device_map="auto"
)
To run inference with this model, follow the official Hugging Face guidance for LLaVA inference.
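The snippet below assumes that `image` and `prompt` are already defined. As a rough sketch, they could be prepared as follows; the image path and question are placeholders, and the prompt uses the standard LLaVA-1.5 chat format.

from PIL import Image

# Placeholder image and question -- replace with your own data
image = Image.open("example.jpg")
question = "What action is being performed in the image?"

# LLaVA-1.5 expects the "USER: <image>\n... ASSISTANT:" prompt format
prompt = f"USER: <image>\n{question} ASSISTANT:"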
from transformers import AutoProcessor

# The base model's processor prepares both the image and the text prompt
processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=40)
decoded_output = processor.batch_decode(
    output, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
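Since generate returns the prompt tokens together with the newly generated ones, the decoded string still contains the prompt text. One simple (assumed) way to extract just the answer is to split on the "ASSISTANT:" marker:

# The decoded string includes the prompt; the answer follows "ASSISTANT:"
answer = decoded_output.split("ASSISTANT:")[-1].strip()
print(answer)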