XuyaoWang committed
Commit 42f9726
1 Parent(s): cccd75d

Update README.md

Files changed (1)
  1. README.md +58 -3

README.md CHANGED

---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Meta-Llama-3.1-8B-Instruct
---

# 🦙 Llama3.1-8b-instruct-vision Model Card

## Model Details

This repository contains a reproduced version of the [LLaVA](https://github.com/haotian-liu/LLaVA) model, built on the [Llama 3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) foundation model using the [PKU-Alignment/align-anything](https://github.com/PKU-Alignment/align-anything) library.

> **NOTE:** The reproduced version of LLaVA differs from the original [LLaVA](https://github.com/haotian-liu/LLaVA) model in some implementation details.
>
> 1. The reproduced LLaVA uses a different conversation template than the original [LLaVA](https://github.com/haotian-liu/LLaVA) model (see the prompt-format sketch below).
> 2. The initial model weights are loaded from the Llama 3.1 8B Instruct model ([meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)) rather than [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5).

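For reference, this reproduction expects prompts in the Llama 3.1 header-token format rather than the `USER:`/`ASSISTANT:`-style template used by the original LLaVA. The sketch below is inferred from the prompt string in the usage example further down; the `build_prompt` helper is illustrative and not part of align-anything.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message (containing an <image> placeholder) in the
    Llama 3.1 header-token conversation format used in the usage example."""
    return (
        "<|start_header_id|>user<|end_header_id|>: "
        f"<image> {user_message}\n"
        "<|start_header_id|>assistant<|end_header_id|>: "
    )

# Example: reproduces the prompt used in the transformers snippet below.
prompt = build_prompt("Give an overview of what's in the image.")
```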

- **Developed by:** the [PKU-Alignment](https://github.com/PKU-Alignment) Team.
- **Model Type:** An auto-regressive vision-language model based on the transformer architecture.
- **License:** Non-commercial license.
- **Fine-tuned from model:** [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).

## Model Sources

- **Repository:** <https://github.com/PKU-Alignment/align-anything>
- **Dataset:**
  - <https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K>
  - <https://huggingface.co/datasets/OpenGVLab/ShareGPT-4o>
  - <https://huggingface.co/datasets/HuggingFaceM4/A-OKVQA>
  - <https://huggingface.co/datasets/Multimodal-Fatima/OK-VQA_train>
  - <https://huggingface.co/datasets/howard-hou/OCR-VQA>
  - <https://huggingface.co/datasets/HuggingFaceM4/VQAv2>

## How to use the model (reprod.)

- Using `transformers`:

```python
from transformers import (
    LlavaForConditionalGeneration,
    AutoProcessor,
)
from PIL import Image

# Path to a local copy of this model (or its Hub repo id).
path = "<path_to_model_dir>"
processor = AutoProcessor.from_pretrained(path)
model = LlavaForConditionalGeneration.from_pretrained(path)

# The prompt follows the Llama 3.1 header-token conversation template,
# with <image> marking where the image is inserted.
prompt = "<|start_header_id|>user<|end_header_id|>: <image> Give an overview of what's in the image.\n<|start_header_id|>assistant<|end_header_id|>: "
image_path = "align-anything/assets/test_image.webp"
image = Image.open(image_path)

# Preprocess the text and image, then generate a response.
inputs = processor(text=prompt, images=image, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1024)
print(processor.decode(outputs[0], skip_special_tokens=True))
```
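
Continuing from the snippet above, the model can also be loaded in half precision on a GPU. This is a hedged variant, assuming a CUDA device and the `accelerate` package for `device_map="auto"`; neither is required by the example above.

```python
import torch

# Optional GPU/half-precision loading (assumes CUDA and accelerate are installed).
model = LlavaForConditionalGeneration.from_pretrained(
    path,
    torch_dtype=torch.float16,  # reduce memory footprint
    device_map="auto",          # place weights automatically; requires `accelerate`
)

# Move the preprocessed inputs to the model's device before generating.
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(processor.decode(outputs[0], skip_special_tokens=True))
```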