weizhiwang committed • Commit 873e6c4 • Parent: 6dcf08c • Update README.md

README.md CHANGED
A reproduced LLaVA LVLM based on the Llama-3-8B LLM backbone. Not an official implementation.
## Model Details

Follows the LLaVA-1.5 pre-training and supervised fine-tuning data.

## How to Use

You can load the model and perform inference as follows:
```python
import torch
import requests
from PIL import Image

from llava.constants import IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates, SeparatorStyle
from llava.model.builder import load_pretrained_model
from llava.mm_utils import tokenizer_image_token, process_images, get_model_name_from_path

# load model and processor
device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = "weizhiwang/LLaVA-Llama-3-8B"
model_name = get_model_name_from_path(model_path)
tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, None, model_name, False, False, device=device)

# prepare inputs for the model: the <image> placeholder marks where image features are spliced in
text = '<image>' + '\n' + "Describe the image."
conv = conv_templates["llama_3"].copy()  # conversation template for the Llama-3 tokenizer; adjust the name if the fork defines it differently
conv.append_message(conv.roles[0], text)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()
input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).to(device)

# download and preprocess the example image
url = "https://huggingface.co/adept/fuyu-8b/resolve/main/bus.png"
image = Image.open(requests.get(url, stream=True).raw).convert('RGB')
image_tensor = image_processor.preprocess(image, return_tensors='pt')['pixel_values'].half().to(device)

# autoregressively generate text
with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        do_sample=False,
        max_new_tokens=512,
        use_cache=True)

# decode only the newly generated tokens, skipping the prompt
outputs = tokenizer.batch_decode(output_ids[:, input_ids.shape[1]:], skip_special_tokens=True)
print(outputs[0])
```
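In the upstream LLaVA model builder, the two `False` positional arguments to `load_pretrained_model` are the `load_8bit` and `load_4bit` flags; setting one of them to `True` runs quantized inference on smaller GPUs.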
Please refer to the forked [LLaVA-Llama-3](https://github.com/Victorwz/LLaVA-Llama-3) git repo for usage. The data loading function and the fastchat conversation template are changed because Llama-3 uses a different tokenizer.
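As a minimal illustration of that template difference (this sketch is not from the model card): the Llama-3 tokenizer wraps each turn in `<|start_header_id|>`/`<|eot_id|>` special tokens rather than the Vicuna-style `USER:`/`ASSISTANT:` separators that LLaVA-1.5 uses, so the conversation template in the fork must emit a different prompt string. Assuming access to a Llama-3 tokenizer such as `meta-llama/Meta-Llama-3-8B-Instruct`:

```python
# Illustrative sketch: print the chat format the Llama-3 tokenizer expects.
from transformers import AutoTokenizer

# assumed checkpoint for illustration; any Llama-3 tokenizer shows the same format
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
messages = [{"role": "user", "content": "Describe the image."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# <|begin_of_text|><|start_header_id|>user<|end_header_id|>
#
# Describe the image.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```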