aarticerebras committed: Update README.md

The vision encoder checkpoints for this model can be found at [cerebras/Cerebras-ViT-L-336-patch14-llava7b-ShareGPT4V](cerebras/Cerebras-ViT-L-336-patch14-llava7b-ShareGPT4V).
**Note**: _ShareGPT4V_ is added to the vision model name to ensure correct loading of checkpoints in the [LLaVA source repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/model/multimodal_encoder/builder.py#L8).
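For context, the check referenced above routes any vision tower whose name contains _ShareGPT4V_ to LLaVA's CLIP vision tower loader. The snippet below is a paraphrased sketch of that logic (not a verbatim copy of `builder.py`), shown only to illustrate why the suffix is kept in the name.

```
# Paraphrased sketch of the name check in
# llava/model/multimodal_encoder/builder.py (not a verbatim copy): a vision
# tower whose name contains "ShareGPT4V" is loaded as a CLIP vision tower.
from llava.model.multimodal_encoder.clip_encoder import CLIPVisionTower

def build_vision_tower(vision_tower_cfg, **kwargs):
    vision_tower = getattr(vision_tower_cfg, "mm_vision_tower",
                           getattr(vision_tower_cfg, "vision_tower", None))
    if vision_tower.startswith("openai") or "ShareGPT4V" in vision_tower:
        return CLIPVisionTower(vision_tower, args=vision_tower_cfg, **kwargs)
    raise ValueError(f"Unknown vision tower: {vision_tower}")
```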
For full details of this model and its training, please read our upcoming blog post.
## License
## Model Architecture
Cerebras-LLaVA-7B is a transformer model with the following architecture details:
* Vision encoder: [CLIP-VisionModel-Large](cerebras/Cerebras-ViT-L-336-patch14-llava7b-ShareGPT4V). It handles images of size 336 x 336 with a patch size of 14.
* Large language model: pretrained from Vicuna-7B checkpoints and instruction-finetuned on various datasets.
* Projector: the module that connects the vision encoder to the LLM; it consists of two linear layers with a GELU activation (mlp2x-gelu), as sketched below.
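An mlp2x-gelu projector is simply two linear layers with a GELU activation in between. The sketch below is illustrative only; the layer sizes (1024-dimensional CLIP ViT-L/14 features projected into a 4096-dimensional 7B LLM hidden size) are assumptions rather than values read from the released checkpoint.

```
import torch.nn as nn

# Illustrative sketch of an "mlp2x-gelu" projector: two linear layers with a
# GELU in between, mapping vision features into the LLM embedding space.
# The default sizes are assumptions for a CLIP ViT-L/14-336 encoder and a
# 7B Vicuna-style LLM, not values read from the checkpoint.
def build_mlp2x_gelu_projector(vision_hidden_size: int = 1024,
                               llm_hidden_size: int = 4096) -> nn.Module:
    return nn.Sequential(
        nn.Linear(vision_hidden_size, llm_hidden_size),
        nn.GELU(),
        nn.Linear(llm_hidden_size, llm_hidden_size),
    )
```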
## Loading the model
This model can be loaded directly using the [LLaVA source code repository](https://github.com/haotian-liu/LLaVA). For installation, please refer to the [instructions in the source code repository](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#install).
We perform all our evaluations using the LLaVA source code repository scripts.
```
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

# The diff view elides the arguments of this call; the keyword arguments below
# follow the standard LLaVA loading API, and the model path is an assumption
# based on the checkpoint name.
model_path = "cerebras/Cerebras-LLaVA-7B"

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
)
```
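Beyond loading the weights, single-image inference can be run through the repository's `eval_model` helper in `llava.eval.run_llava`, following the usage pattern shown in the upstream LLaVA README. The example below is a hedged sketch: the model path, prompt, and image URL are placeholder assumptions and may need to be adjusted.

```
from llava.eval.run_llava import eval_model
from llava.mm_utils import get_model_name_from_path

# Placeholder assumptions: replace the model path, prompt, and image with your own.
model_path = "cerebras/Cerebras-LLaVA-7B"
prompt = "Describe this image."
image_file = "https://llava-vl.github.io/static/images/view.jpg"

# eval_model expects an argparse-style namespace; this mirrors the pattern
# used in the upstream LLaVA README.
args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)
```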
## Intended Use
Primary intended uses: The primary use of LLaVA is research on large multimodal models and chatbots.
Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
## Acknowledgements
We are thankful to all the Cerebras engineers who made this work possible.