How do I supply pixel_values to the model forward?

#36
by justinchums - opened

I am trying to export the model to an ExportedProgram, so I first tried running the model eagerly with

import torch
from PIL import Image
from transformers import (
    AutoTokenizer,
    AutoProcessor,
    MllamaForConditionalGeneration,
    AutoModelForCausalLM,
)
import requests

MODEL = "meta-llama/Llama-3.2-11B-Vision"

# tokenizer = AutoTokenizer.from_pretrained(MODEL)
processor = AutoProcessor.from_pretrained(MODEL, device_map="auto")
model = AutoModelForCausalLM.from_pretrained(MODEL)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

prompt = "<|image|><|begin_of_text|>If I had to write a haiku for this one"
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

print(model(**inputs))

but I got

Traceback (most recent call last):
  File "/workspace/ONNXConverter/llama.py", line 24, in <module>
    print(model(**inputs))
          ^^^^^^^^^^^^^^^
  File "/home/justinchu/anaconda3/envs/onnx/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/justinchu/anaconda3/envs/onnx/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: MllamaForCausalLM.forward() got an unexpected keyword argument 'pixel_values'

What would be the correct way to supply the pixel_values argument? Thanks!

I used the wrong class. AutoModelForCausalLM resolves to MllamaForCausalLM, which is the text-only decoder and does not accept pixel_values; loading the checkpoint with MllamaForConditionalGeneration (which includes the vision encoder) fixes it.
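
For anyone else who hits this, here is a minimal sketch of the multimodal path. The torch_dtype and device_map settings are just one reasonable choice for a model this size, not something from the original snippet:

import torch
import requests
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

MODEL = "meta-llama/Llama-3.2-11B-Vision"

# MllamaForConditionalGeneration wires the vision encoder to the language model,
# so its forward() accepts pixel_values (unlike MllamaForCausalLM).
model = MllamaForConditionalGeneration.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

prompt = "<|image|><|begin_of_text|>If I had to write a haiku for this one"
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

# The forward pass now succeeds because pixel_values is a valid argument here.
output = model(**inputs)
print(output.logits.shape)

The same inputs also work with model.generate(**inputs, max_new_tokens=30) if you want decoded text rather than raw logits.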

justinchums changed discussion status to closed
