weizhiwang committed • Commit 873e6c4 • Parent: 6dcf08c • Update README.md

README.md CHANGED
A reproduced LLaVA LVLM based on the Llama-3-8B LLM backbone. Not an official implementation.
## Model Details

Follows the LLaVA-1.5 pre-training and supervised fine-tuning data.

## How to Use

You can load the model and perform inference as follows:
```python
import torch
import requests
from PIL import Image

from llava.constants import IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates, SeparatorStyle
from llava.model.builder import load_pretrained_model
from llava.mm_utils import tokenizer_image_token, process_images, get_model_name_from_path

# load model and processor
device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = "weizhiwang/LLaVA-Llama-3-8B"
model_name = get_model_name_from_path(model_path)
tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, None, model_name, False, False, device=device)

# prepare inputs for the model: the <image> placeholder marks where image features are spliced in
text = '<image>' + '\n' + "Describe the image."
conv = conv_templates["llama_3"].copy()  # conversation template for the Llama-3 tokenizer; adjust the name if the fork defines it differently
conv.append_message(conv.roles[0], text)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()
input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).to(device)

# download and preprocess the example image
url = "https://huggingface.co/adept/fuyu-8b/resolve/main/bus.png"
image = Image.open(requests.get(url, stream=True).raw).convert('RGB')
image_tensor = image_processor.preprocess(image, return_tensors='pt')['pixel_values'].half().to(device)

# autoregressively generate text
with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        do_sample=False,
        max_new_tokens=512,
        use_cache=True)

# decode only the newly generated tokens, skipping the prompt
outputs = tokenizer.batch_decode(output_ids[:, input_ids.shape[1]:], skip_special_tokens=True)
print(outputs[0])
```
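In the upstream LLaVA model builder, the two `False` positional arguments to `load_pretrained_model` are the `load_8bit` and `load_4bit` flags; setting one of them to `True` runs quantized inference on smaller GPUs.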
Please refer to the forked [LLaVA-Llama-3](https://github.com/Victorwz/LLaVA-Llama-3) git repo for usage. The data loading function and the fastchat conversation template are changed because Llama-3 uses a different tokenizer.
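As a minimal illustration of that template difference (this sketch is not from the model card): the Llama-3 tokenizer wraps each turn in `<|start_header_id|>`/`<|eot_id|>` special tokens rather than the Vicuna-style `USER:`/`ASSISTANT:` separators that LLaVA-1.5 uses, so the conversation template in the fork must emit a different prompt string. Assuming access to a Llama-3 tokenizer such as `meta-llama/Meta-Llama-3-8B-Instruct`:

```python
# Illustrative sketch: print the chat format the Llama-3 tokenizer expects.
from transformers import AutoTokenizer

# assumed checkpoint for illustration; any Llama-3 tokenizer shows the same format
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
messages = [{"role": "user", "content": "Describe the image."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# <|begin_of_text|><|start_header_id|>user<|end_header_id|>
#
# Describe the image.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```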