anymodality/llava-v1.5-7b · Testing SageMaker Endpoint. Odd Results.

Oct 25, 2023

edited Oct 25, 2023

First, thank you so much for your work here. Really looking forward to using this as a GPT-4 Vision competitor.

I am trying to test my SageMaker endpoint with JavaScript. No matter what I do, I always get back basic output like "man standing in front of a building" for your test image. Code below. Thoughts?

import { 
  SageMakerRuntimeClient, 
  InvokeEndpointCommand 
} from "@aws-sdk/client-sagemaker-runtime";
const client = new SageMakerRuntimeClient({ region: "us-east-1" });
const command = new InvokeEndpointCommand({
  Body: JSON.stringify({
    "image" : "https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png", 
    "question" : "Describe this image",
  }),
  ContentType: 'application/json',
  EndpointName: 'huggingface-pytorch-inference-2023-10-24-22-53-21-123',
  Accept: 'application/json'
});
const data = await client.send(command);
const decoder = new TextDecoder('utf-8');
console.log(decoder.decode(data.Body));

liltom-eth

AnyModality org Oct 25, 2023

@MetaSkills Thanks for testing in JavaScript.
This issue was also raised in https://huggingface.co/anymodality/llava-v1.5-7b/discussions/1.
The solution is to try this from deploy_llava.ipynb:

from llava.conversation import conv_templates, SeparatorStyle
from llava.constants import (
    DEFAULT_IMAGE_TOKEN,
    DEFAULT_IM_START_TOKEN,
    DEFAULT_IM_END_TOKEN,
)
def get_prompt(raw_prompt):
    conv_mode = "llava_v1"
    conv = conv_templates[conv_mode].copy()
    roles = conv.roles
    inp = f"{roles[0]}: {raw_prompt}"
    inp = (
        DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN + DEFAULT_IM_END_TOKEN + "\n" + inp
    )
    conv.append_message(conv.roles[0], inp)
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()
    stop_str = conv.sep if conv.sep_style != SeparatorStyle.TWO else conv.sep2
    return prompt, stop_str

raw_prompt = "Describe the image and color details."
prompt, stop_str = get_prompt(raw_prompt)
image_path = "https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png"
data = {"image" : image_path, "question" : prompt, "stop_str" : stop_str}
output = predictor.predict(data)
print(output)
# The image features a red toy animal, possibly a horse or a donkey, with a pair of glasses on its face.

This converts the raw input prompt into the LLaVA format, and the results look good to me.

Since you are using JavaScript, one solution is to move get_prompt() into predict_fn() in code/inference.py when deploying the model. Feel free to commit this change to the repo. I will update the code later when I have time.
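
For callers that cannot import llava (such as a JavaScript client), it may help to see roughly what string get_prompt() produces. The sketch below rebuilds it in plain Python. The system prompt, role names, separators, and image token strings are assumptions based on the "llava_v1" template and llava.constants at the time of writing, and build_llava_v1_prompt is a hypothetical helper name, so check the values against your installed llava package before relying on them.

```python
# Assumed values for the "llava_v1" conversation template and llava.constants.
# Verify them against the llava package you actually deploy.
SYSTEM = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
)
ROLES = ("USER", "ASSISTANT")
SEP, SEP2 = " ", "</s>"
IM_START, IMAGE, IM_END = "<im_start>", "<image>", "<im_end>"


def build_llava_v1_prompt(raw_prompt: str) -> tuple[str, str]:
    """Hypothetical stand-in for get_prompt() that does not import llava."""
    # Same wrapping as the notebook: image tokens, a newline, then the
    # role-prefixed question.
    inp = f"{IM_START}{IMAGE}{IM_END}\n{ROLES[0]}: {raw_prompt}"
    # Two-separator style: system + sep, the user turn + sep, then an open
    # assistant turn for the model to complete.
    prompt = f"{SYSTEM}{SEP}{ROLES[0]}: {inp}{SEP}{ROLES[1]}:"
    # With the two-separator style, the stop string is the second separator.
    return prompt, SEP2


prompt, stop_str = build_llava_v1_prompt("Describe the image and color details.")
print(prompt)
print(stop_str)
```

The same string concatenation can be ported to JavaScript and sent as "question" together with "stop_str", as in the notebook payload above.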

MetaSkills

Oct 26, 2023

Right, the idea is I want to hit this SageMaker endpoint from some other workload: Lambda, EC2, K8s, etc. So could you share what the final code/inference.py would look like? Not sure what you mean by move.
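
For reference, a call from a Python workload such as Lambda or EC2 could look like the sketch below, which mirrors the JavaScript request using boto3's invoke_endpoint. It assumes the endpoint accepts the same "image" and "question" JSON fields; ask_llava and its parameters are hypothetical names, not part of the repo.

```python
import json


def ask_llava(endpoint_name, image_url, question, client=None, region="us-east-1"):
    """Send an image URL and a question to the endpoint and return the raw reply text."""
    if client is None:
        # Imported lazily so the function can also be exercised with a stub client.
        import boto3

        client = boto3.client("sagemaker-runtime", region_name=region)
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps({"image": image_url, "question": question}),
    )
    # The response body is a streaming object; read and decode it.
    return response["Body"].read().decode("utf-8")
```

Usage would be something like ask_llava("my-endpoint-name", image_url, "Describe this image"), where the endpoint name is whatever your deployment produced.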

MetaSkills

Oct 26, 2023

I ended up with a code/inference.py that looks like the one below, but I am working through this error now:

RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

import requests
from PIL import Image
from io import BytesIO
import torch
from transformers import AutoTokenizer

from llava.model import LlavaLlamaForCausalLM
from llava.utils import disable_torch_init
from llava.mm_utils import tokenizer_image_token, KeywordsStoppingCriteria

from llava.conversation import conv_templates, SeparatorStyle
from llava.constants import (
    IMAGE_TOKEN_INDEX,
    DEFAULT_IMAGE_TOKEN,
    DEFAULT_IM_START_TOKEN,
    DEFAULT_IM_END_TOKEN,
)


def model_fn(model_dir):
    kwargs = {"device_map": "auto"}
    kwargs["torch_dtype"] = torch.float16
    model = LlavaLlamaForCausalLM.from_pretrained(
        model_dir, low_cpu_mem_usage=True, **kwargs
    )
    tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False)

    vision_tower = model.get_vision_tower()
    if not vision_tower.is_loaded:
        vision_tower.load_model()
    vision_tower.to(device="cuda", dtype=torch.float16)
    image_processor = vision_tower.image_processor
    return model, tokenizer, image_processor


def predict_fn(data, model_and_tokenizer):
    # unpack model and tokenizer
    model, tokenizer, image_processor = model_and_tokenizer

    # get prompt & parameters
    image_file = data.pop("image", data)
    raw_prompt = data.pop("question", data)
    max_new_tokens = data.pop("max_new_tokens", 1024)
    temperature = data.pop("temperature", 0.2)

    conv_mode = "llava_v1"
    conv = conv_templates[conv_mode].copy()
    roles = conv.roles
    inp = f"{roles[0]}: {raw_prompt}"
    inp = (
        DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN + DEFAULT_IM_END_TOKEN + "\n" + inp
    )
    conv.append_message(conv.roles[0], inp)
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()
    stop_str = conv.sep if conv.sep_style != SeparatorStyle.TWO else conv.sep2

    # Treat the value as a URL only when it has an explicit http(s) scheme.
    if image_file.startswith(("http://", "https://")):
        response = requests.get(image_file)
        image = Image.open(BytesIO(response.content)).convert("RGB")
    else:
        image = Image.open(image_file).convert("RGB")

    disable_torch_init()
    image_tensor = (
        image_processor.preprocess(image, return_tensors="pt")["pixel_values"]
        .half()
        .cuda()
    )

    keywords = [stop_str]
    input_ids = (
        tokenizer_image_token(
            prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
        .unsqueeze(0)
        .cuda()
    )
    stopping_criteria = KeywordsStoppingCriteria(
        keywords, tokenizer, input_ids)
    with torch.inference_mode():
        output_ids = model.generate(
            input_ids,
            images=image_tensor,
            do_sample=True,
            temperature=temperature,
            max_new_tokens=max_new_tokens,
            use_cache=True,
            stopping_criteria=[stopping_criteria],
        )
    outputs = tokenizer.decode(
        output_ids[0, input_ids.shape[1]:], skip_special_tokens=True
    ).strip()
    return outputs
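
One thing to watch in predict_fn() above: data.pop("image", data) falls back to the whole payload when a key is missing, so a malformed request only fails later at startswith(). A stricter parsing step could be sketched as follows. parse_request is a hypothetical helper, not part of the repo, and the defaults are copied from the code above.

```python
def parse_request(data: dict) -> dict:
    """Validate the request payload and apply the same defaults as predict_fn()."""
    missing = [key for key in ("image", "question") if key not in data]
    if missing:
        # Fail early with a readable message instead of an AttributeError later.
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    return {
        "image": str(data["image"]),
        "question": str(data["question"]),
        "max_new_tokens": int(data.get("max_new_tokens", 1024)),
        "temperature": float(data.get("temperature", 0.2)),
    }
```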

MetaSkills

Oct 26, 2023

Got this working! My last error was because I had the Git LFS pointer for the tokenizer.model file instead of the actual file. The above code/inference.py is working great. Thanks for your help and amazing work!!!
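
In case anyone else hits the same sentencepiece error, a quick way to check for this before packaging the model is sketched below. It relies on the fact that Git LFS pointer files are small text files starting with a "version https://git-lfs.github.com/spec/v1" line. is_git_lfs_pointer is a hypothetical helper name.

```python
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_git_lfs_pointer(path) -> bool:
    """Return True if the file looks like a Git LFS pointer rather than real content."""
    with open(path, "rb") as handle:
        # Only the first bytes are needed; a real tokenizer.model is binary protobuf data.
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX
```

If is_git_lfs_pointer("tokenizer.model") returns True, running git lfs pull in the cloned repo should fetch the actual file.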

MetaSkills changed discussion status to closed Oct 26, 2023

liltom-eth

AnyModality org Oct 29, 2023

@MetaSkills code/inference.py is updated and ready for deployment!