Phi-3.5-vision-instruct-onnx-cpu

    Note: This is an unofficial version, intended only for testing and development.

This is the ONNX FP32 version of Microsoft Phi-3.5-vision-instruct for CPU. You can follow the steps below to convert it yourself.

Convert step by step

  1. Installation

pip install torch transformers onnx onnxruntime

pip install --pre onnxruntime-genai
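To confirm the packages installed correctly, you can print their versions. This is a minimal check using importlib.metadata from the Python standard library:


from importlib.metadata import version

# Print the installed versions of the packages used in the steps below
print("onnxruntime:", version("onnxruntime"))
print("onnxruntime-genai:", version("onnxruntime-genai"))
print("transformers:", version("transformers"))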
  2. Set up the working directory in the terminal

mkdir models

cd models 
  3. Download microsoft/Phi-3.5-vision-instruct into the models folder (a download snippet follows the link)

https://huggingface.co/microsoft/Phi-3.5-vision-instruct
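If you prefer a script over a manual download, the weights can be fetched with the huggingface_hub library (installed as a dependency of transformers). The local folder name ./Phi-3.5-vision-instruct below is an assumption; adjust it to match your layout:


from huggingface_hub import snapshot_download

# Download the original Phi-3.5-vision-instruct weights into the current (models) folder
snapshot_download(repo_id="microsoft/Phi-3.5-vision-instruct",
                  local_dir="./Phi-3.5-vision-instruct")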

  4. Download these files into your Phi-3.5-vision-instruct folder (a download snippet follows the links)

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/config.json

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/image_embedding_phi3_v_for_onnx.py

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/modeling_phi3_v.py
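A small sketch that fetches all three files from the repository. It uses the /resolve/ form of the URLs, which returns the raw file contents, and assumes the local model folder is ./Phi-3.5-vision-instruct:


import urllib.request

base = "https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/"
files = ["config.json", "image_embedding_phi3_v_for_onnx.py", "modeling_phi3_v.py"]

# Save each file into the local Phi-3.5-vision-instruct folder
for name in files:
    urllib.request.urlretrieve(base + name, "./Phi-3.5-vision-instruct/" + name)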

  5. Download this file into the models folder (a download snippet follows the link)

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/build.py
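Similarly, build.py can be fetched straight into the models folder, again via the /resolve/ form of the URL:


import urllib.request

# Save build.py into the models folder, next to the downloaded model
urllib.request.urlretrieve(
    "https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/build.py",
    "./build.py")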

  6. Go to the terminal and run the conversion

Convert to ONNX with FP32 support


python build.py -i ".\Your Phi-3.5-vision-instruct Path" -o .\vision-cpu-fp32 -p f32 -e cpu
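Once the conversion finishes, a quick sanity check (assuming the output path vision-cpu-fp32 from the command above) is to list the generated folder; it should contain the exported ONNX model files and configuration:


import os

# List the converted output folder produced by build.py
print(os.listdir("./vision-cpu-fp32"))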

Running it with ONNX Runtime GenAI


import onnxruntime_genai as og

# Path to the converted ONNX model folder (the output of build.py, e.g. ./vision-cpu-fp32)
model_path = './Your Phi-3.5-vision-instruct Path'

# Define the path to the image file
# This path points to an image file that will be used for demonstration or testing
img_path = './Your Image Path'


# Create an instance of the Model class from the onnxruntime_genai module
# This instance is initialized with the path to the model file
model = og.Model(model_path)

# Create a multimodal processor using the model instance
# This processor will handle different types of input data (e.g., text, images)
processor = model.create_multimodal_processor()

# Create a stream for tokenizing input data using the processor
# This stream will be used to process and tokenize the input data for the model
tokenizer_stream = processor.create_stream()

text = "Your Prompt"

# Initialize a string variable for the prompt with a user tag
prompt = "<|user|>\n"

# Append an image tag to the prompt
prompt += "<|image_1|>\n"

# Append the text prompt to the prompt string, followed by an end tag
prompt += f"{text}<|end|>\n"

# Append an assistant tag to the prompt, indicating the start of the assistant's response
prompt += "<|assistant|>\n"

# Load the image with the ORT GenAI Images helper
image = og.Images.open(img_path)

# Combine the prompt text and the image into model inputs
inputs = processor(prompt, images=image)

# Create an instance of the GeneratorParams class from the onnxruntime_genai module
# This instance is initialized with the model object
params = og.GeneratorParams(model)

# Set the inputs for the generator parameters using the processed inputs
params.set_inputs(inputs)

# Set the search options for the generator parameters
# The max_length parameter specifies the maximum length of the generated output
params.set_search_options(max_length=3072)

generator = og.Generator(model, params)

# Accumulate the decoded response text as it is streamed
response = ''

# Loop until the generator has finished generating tokens
while not generator.is_done():
    # Compute the logits for the next token
    generator.compute_logits()
    
    # Generate the next token based on the computed logits
    generator.generate_next_token()

    # Retrieve the newly generated token
    new_token = generator.get_next_tokens()[0]
    
    # Decode the new token once, append it to the response, and stream it to the console
    chunk = tokenizer_stream.decode(new_token)
    response += chunk
    print(chunk, end='', flush=True)