---
{}
---

[![CODE](https://img.shields.io/badge/GitHub-Repository-blue)](https://github.com/mbzuai-oryx/LLaVA-pp)

# LLaMA-3-V: Extending the Visual Capabilities of LLaVA with Meta-Llama-3-8B-Instruct

## Repository Overview

This repository provides LLaVA v1.5 trained with Meta-Llama-3-8B-Instruct as the language model, combining the strengths of both models for advanced vision-language understanding.

## Training Strategy
- **Pretraining:** Only the vision-to-language projector is trained; the rest of the model is kept frozen.
- **Fine-tuning:** The LLM is LoRA fine-tuned; only the vision backbone (CLIP) is kept frozen. A sketch of both stages follows this list.
- **Note:** This repository contains the projector and LoRA weights.
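
For orientation, the sketch below shows how the two stages could be wired up with Hugging Face PEFT. It is a minimal illustration, not the training code used for this release: the `LlavaLlamaForCausalLM` class and the `mm_projector` / `vision_tower` module names follow the LLaVA codebase's conventions, and the LoRA hyperparameters are placeholders.

```python
from llava.model import LlavaLlamaForCausalLM   # provided by the LLaVA / LLaVA++ codebase
from peft import LoraConfig, get_peft_model

# Assumed starting point: a LLaVA-style model built on the LLaMA-3 base.
# (LLaVA's training scripts also initialize the CLIP tower and projector here; omitted.)
model = LlavaLlamaForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Stage 1 (pretraining): train only the vision-to-language projector.
for name, param in model.named_parameters():
    param.requires_grad_("mm_projector" in name)    # LLaVA's name for the projector

# Stage 2 (fine-tuning): LoRA-adapt the LLM; the CLIP vision tower stays frozen.
lora_config = LoraConfig(
    r=128, lora_alpha=256, lora_dropout=0.05,       # illustrative hyperparameters
    bias="none", task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)          # adds LoRA adapters to the LLM layers
for name, param in model.named_parameters():
    if "vision_tower" in name:                      # keep the CLIP backbone frozen
        param.requires_grad_(False)
```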

## Key Components

- **Base Large Language Model (LLM):** [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
- **Base Large Multimodal Model (LMM):** [LLaVA-v1.5](https://github.com/haotian-liu/LLaVA)

## Training Data

- **Pretraining Dataset:** [LCS-558K](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain)
- **Fine-tuning Dataset:** [LLaVA-Instruct-665K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json)

## Download

```bash
git lfs install
git clone https://huggingface.co/MBZUAI/LLaVA-Meta-Llama-3-8B-Instruct-lora
```
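
After cloning, the LoRA weights are meant to be used with the LLaVA++ codebase. Below is a hedged sketch of loading them on top of the base LLM; `load_pretrained_model` is LLaVA's checkpoint loader, while the local path and the `model_name` string are illustrative assumptions.

```python
from llava.model.builder import load_pretrained_model

# Merge the LoRA adapters onto the base LLM at load time (sketch; paths are examples).
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="LLaVA-Meta-Llama-3-8B-Instruct-lora",   # the cloned repository
    model_base="meta-llama/Meta-Llama-3-8B-Instruct",   # base LLM the LoRA was trained on
    model_name="llava_llama_3_lora",                    # "llava" + "lora" selects the LoRA merge path
)
```

The returned tokenizer, model, and image processor can then be used with LLaVA's standard inference utilities.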

---


## Contributions

Contributions are welcome! Please star 🌟 our [LLaVA++](https://github.com/mbzuai-oryx/LLaVA-pp) repository if you find this model useful.

---