weizhiwang committed
Commit 873e6c4 (parent 6dcf08c)

Update README.md

Files changed (1): README.md (+35 -2)
README.md CHANGED
@@ -16,9 +16,42 @@ A reproduced LLaVA LVLM based on Llama-3-8B LLM backbone. Not an official implem
  ## Model Details
  Follows LLaVA-1.5 pre-training and supervised fine-tuning data.

- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
+ ## How to Use
+
+ You can load the model and run inference as follows:
+ ```python
+ import torch
+ import requests
+ from PIL import Image
+
+ from llava.constants import IMAGE_TOKEN_INDEX
+ from llava.conversation import conv_templates
+ from llava.model.builder import load_pretrained_model
+ from llava.mm_utils import tokenizer_image_token, get_model_name_from_path
+
+ # load the model, tokenizer, and image processor
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model_path = "weizhiwang/LLaVA-Llama-3-8B"
+ model_name = get_model_name_from_path(model_path)
+ tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, None, model_name, False, False, device=device)
+
+ # build the prompt with the conversation template
+ # ("llama_3" is assumed to be the key of the Llama-3 template defined in the forked repo)
+ conv = conv_templates["llama_3"].copy()
+ text = '<image>' + '\n' + "Describe the image."
+ conv.append_message(conv.roles[0], text)
+ conv.append_message(conv.roles[1], None)
+ prompt = conv.get_prompt()
+ input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).to(device)
+
+ # download and preprocess the example image
+ url = "https://huggingface.co/adept/fuyu-8b/resolve/main/bus.png"
+ image = Image.open(requests.get(url, stream=True).raw).convert('RGB')
+ image_tensor = image_processor.preprocess(image, return_tensors='pt')['pixel_values'].half().to(device)
+
+ # autoregressively generate text
+ with torch.inference_mode():
+     output_ids = model.generate(
+         input_ids,
+         images=image_tensor,
+         do_sample=False,
+         max_new_tokens=512,
+         use_cache=True)
+
+ # decode only the newly generated tokens
+ outputs = tokenizer.batch_decode(output_ids[:, input_ids.shape[1]:], skip_special_tokens=True)
+ print(outputs[0])
+ ```
+

  Please refer to the forked [LLaVA-Llama-3](https://github.com/Victorwz/LLaVA-Llama-3) repo for full usage. The data loading function and the fastchat conversation template were changed to accommodate the Llama-3 tokenizer.
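
For context on why the conversation template had to change: Llama-3 delimits chat turns with special header tokens rather than Vicuna-style separator strings, so the template must emit that layout. Below is a minimal, hypothetical sketch of the expected prompt format; the token names come from the Llama-3 release, while `build_llama3_prompt` is an illustrative helper, not a function from the repo, and the fork's actual template may differ:

```python
# Sketch (assumption): the Llama-3 chat layout a modified conversation
# template needs to produce. build_llama3_prompt is a hypothetical helper,
# not part of the LLaVA-Llama-3 repo.
def build_llama3_prompt(system, turns):
    prompt = "<|begin_of_text|>"
    prompt += f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
    for role, message in turns:
        prompt += f"<|start_header_id|>{role}<|end_header_id|>\n\n"
        # a turn with message=None is left open so the model continues it
        if message is not None:
            prompt += f"{message}<|eot_id|>"
    return prompt

# a user turn containing the <image> placeholder, with the assistant turn left open
print(build_llama3_prompt(
    "You are a helpful assistant.",
    [("user", "<image>\nDescribe the image."), ("assistant", None)]))
```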