aarticerebras committed: Update README.md

The vision encoder checkpoints for this model can be found at [cerebras/Cerebras-ViT-L-336-patch14-llava7b-ShareGPT4V](cerebras/Cerebras-ViT-L-336-patch14-llava7b-ShareGPT4V).
**Note**: _ShareGPT4V_ is added to the vision model name to ensure correct loading of checkpoints in the [LLaVA source repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/model/multimodal_encoder/builder.py#L8).
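For context, the check referenced above routes any vision tower whose name contains _ShareGPT4V_ to LLaVA's CLIP vision tower loader. The snippet below is a paraphrased sketch of that logic (not a verbatim copy of `builder.py`), shown only to illustrate why the suffix is kept in the name.

```
# Paraphrased sketch of the name check in
# llava/model/multimodal_encoder/builder.py (not a verbatim copy): a vision
# tower whose name contains "ShareGPT4V" is loaded as a CLIP vision tower.
from llava.model.multimodal_encoder.clip_encoder import CLIPVisionTower

def build_vision_tower(vision_tower_cfg, **kwargs):
    vision_tower = getattr(vision_tower_cfg, "mm_vision_tower",
                           getattr(vision_tower_cfg, "vision_tower", None))
    if vision_tower.startswith("openai") or "ShareGPT4V" in vision_tower:
        return CLIPVisionTower(vision_tower, args=vision_tower_cfg, **kwargs)
    raise ValueError(f"Unknown vision tower: {vision_tower}")
```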
For full details of this model and its training, please read our upcoming blog post.
## License
## Model Architecture
Cerebras-LLaVA-7B is a transformer model with the following architecture details:
* Vision encoder: [CLIP-VisionModel-Large](cerebras/Cerebras-ViT-L-336-patch14-llava7b-ShareGPT4V). It handles images of size 336 x 336 with a patch size of 14.
* Large language model: pretrained from Vicuna-7B checkpoints and instruction-finetuned on various datasets.
* Projector: the module that connects the vision encoder to the LLM; it consists of two linear layers with a GELU activation (mlp2x-gelu), as sketched below.
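An mlp2x-gelu projector is simply two linear layers with a GELU activation in between. The sketch below is illustrative only; the layer sizes (1024-dimensional CLIP ViT-L/14 features projected into a 4096-dimensional 7B LLM hidden size) are assumptions rather than values read from the released checkpoint.

```
import torch.nn as nn

# Illustrative sketch of an "mlp2x-gelu" projector: two linear layers with a
# GELU in between, mapping vision features into the LLM embedding space.
# The default sizes are assumptions for a CLIP ViT-L/14-336 encoder and a
# 7B Vicuna-style LLM, not values read from the checkpoint.
def build_mlp2x_gelu_projector(vision_hidden_size: int = 1024,
                               llm_hidden_size: int = 4096) -> nn.Module:
    return nn.Sequential(
        nn.Linear(vision_hidden_size, llm_hidden_size),
        nn.GELU(),
        nn.Linear(llm_hidden_size, llm_hidden_size),
    )
```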
## Loading the model
This model can be loaded directly using the [LLaVA source code repository](https://github.com/haotian-liu/LLaVA). For installation, please refer to the [instructions in the source code repository](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#install).
We perform all our evaluations using the LLaVA source code repository scripts.
```
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

# The diff view elides the arguments of this call; the keyword arguments below
# follow the standard LLaVA loading API, and the model path is an assumption
# based on the checkpoint name.
model_path = "cerebras/Cerebras-LLaVA-7B"

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
)
```
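Beyond loading the weights, single-image inference can be run through the repository's `eval_model` helper in `llava.eval.run_llava`, following the usage pattern shown in the upstream LLaVA README. The example below is a hedged sketch: the model path, prompt, and image URL are placeholder assumptions and may need to be adjusted.

```
from llava.eval.run_llava import eval_model
from llava.mm_utils import get_model_name_from_path

# Placeholder assumptions: replace the model path, prompt, and image with your own.
model_path = "cerebras/Cerebras-LLaVA-7B"
prompt = "Describe this image."
image_file = "https://llava-vl.github.io/static/images/view.jpg"

# eval_model expects an argparse-style namespace; this mirrors the pattern
# used in the upstream LLaVA README.
args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)
```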
## Intended Use
Primary intended uses: The primary use of LLaVA is research on large multimodal models and chatbots.
Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
## Acknowledgements
We are thankful to all the Cerebras engineers who made this work possible.