Seungyoun
/

llava-llama-3-8b-hf

Image-Text-to-Text

Safetensors

xtuner

llava

conversational

Model card Files Files and versions Community

Seungyoun commited on Apr 24, 2024

Commit

98bf3df

verified ·

1 Parent(s): f3fed70

Update Quickstart

Browse files

Files changed (1) hide show

README.md +45 -32

README.md CHANGED Viewed

@@ -12,13 +12,55 @@ pipeline_tag: image-text-to-text
 ---
-<div align="center">
-  <img src="https://github.com/InternLM/lmdeploy/assets/36994684/0cf8d00f-e86b-40ba-9b54-dc8f1bc6c8d8" width="600"/>
-[![Generic badge](https://img.shields.io/badge/GitHub-%20XTuner-black.svg)](https://github.com/InternLM/xtuner)
 </div>
 ## Model
@@ -47,35 +89,6 @@ llava-llama-3-8b-v1_1-hf is a LLaVA model fine-tuned from [meta-llama/Meta-Llama
 | LLaVA-Llama-3-8B-v1.1 |       72.3        |       66.4        |    31.6     |   36.8    |   70.1   |   70.0    |      72.9      |        47.7         | 86.4 | 62.6 |  59.0   | 1469/349 |  45.1  |
-## QuickStart
-### Chat with lmdeploy
-1. Installation
-```
-pip install 'lmdeploy>=0.4.0'
-pip install git+https://github.com/haotian-liu/LLaVA.git
-```
-2. Run
-```python
-from lmdeploy import pipeline, ChatTemplateConfig
-from lmdeploy.vl import load_image
-pipe = pipeline('xtuner/llava-llama-3-8b-v1_1-hf',
-                chat_template_config=ChatTemplateConfig(model_name='llama3'))
-image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
-response = pipe(('describe this image', image))
-print(response)
-```
-More details can be found on [inference](https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html) and [serving](https://lmdeploy.readthedocs.io/en/latest/serving/api_server_vl.html) docs.
-### Chat with CLI
-See [here](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf/discussions/1)!
 ## Citation

 ---
+## QuickStart
+### Chat with lmdeploy
+1. Installation
+```
+pip install 'lmdeploy>=0.4.0'
+pip install git+https://github.com/haotian-liu/LLaVA.git
+```
+2. Run
+Running with pure `transformers` library
+```python
+from transformers import (
+    LlavaProcessor,
+    LlavaForConditionalGeneration,
+)
+import torch
+from PIL import Image
+import requests
+MODEL_NAME = "Seungyoun/llava-llama-3-8b-hf"
+processor = LlavaProcessor.from_pretrained(MODEL_NAME)
+# add 128257 <image> , <pad>
+processor.tokenizer.add_tokens(["<|image|>", "<pad>"], special_tokens=True)
+model = LlavaForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda:0")
+# resize embeddings
+model.resize_token_embeddings(len(processor.tokenizer))
+# prepare image and text prompt, using the appropriate prompt template
+url = "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTd4g61TSw890IYKBbPMgXPyWAKdVOpWWUAF0-FGzgX2Q&s"
+image = Image.open(requests.get(url, stream=True).raw)
+prompt = "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: <|image|>\nWhat is shown in this image? ASSISTANT:" # FIX : Chat template
+inputs = processor(prompt, image, return_tensors="pt").to("cuda:0")
+# autoregressively complete prompt
+output = model.generate(**inputs, max_new_tokens=100)
+print(processor.decode(output[0], skip_special_tokens=True))
+# What is shown in this image? ASSISTANT: The image shows a heartwarming scene of two dogs sitting together on a couch. The dogs are of different breeds, one being a golden retriever and the other being a tabby cat. The dogs are sitting close together, indicating a strong bond between them. The image captures a beautiful moment of companionship between two different species. sit on couch. golden retriever and tabby cat. dogs are sitting together. companionship between two different species.
+```
+---
 </div>
 ## Model
 | LLaVA-Llama-3-8B-v1.1 |       72.3        |       66.4        |    31.6     |   36.8    |   70.1   |   70.0    |      72.9      |        47.7         | 86.4 | 62.6 |  59.0   | 1469/349 |  45.1  |
 ## Citation