Phi-3.5-vision-instruct-onnx-cpu

    Note: This is an unofficial version, intended only for testing and development.

This is the ONNX FP32 version of Microsoft Phi-3.5-vision-instruct for CPU. You can follow the steps below to convert it yourself.

Convert step by step

  1. Installation

pip install torch transformers onnx onnxruntime

pip install --pre onnxruntime-genai
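To confirm the packages installed correctly, you can print their versions. This is a minimal check using importlib.metadata from the Python standard library:


from importlib.metadata import version

# Print the installed versions of the packages used in the steps below
print("onnxruntime:", version("onnxruntime"))
print("onnxruntime-genai:", version("onnxruntime-genai"))
print("transformers:", version("transformers"))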
  2. Set up the working directory in the terminal

mkdir models

cd models 
  3. Download microsoft/Phi-3.5-vision-instruct into the models folder (a download snippet follows the link)

https://huggingface.co/microsoft/Phi-3.5-vision-instruct
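If you prefer a script over a manual download, the weights can be fetched with the huggingface_hub library (installed as a dependency of transformers). The local folder name ./Phi-3.5-vision-instruct below is an assumption; adjust it to match your layout:


from huggingface_hub import snapshot_download

# Download the original Phi-3.5-vision-instruct weights into the current (models) folder
snapshot_download(repo_id="microsoft/Phi-3.5-vision-instruct",
                  local_dir="./Phi-3.5-vision-instruct")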

  4. Download these files into your Phi-3.5-vision-instruct folder (a download snippet follows the links)

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/config.json

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/image_embedding_phi3_v_for_onnx.py

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/modeling_phi3_v.py
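A small sketch that fetches all three files from the repository. It uses the /resolve/ form of the URLs, which returns the raw file contents, and assumes the local model folder is ./Phi-3.5-vision-instruct:


import urllib.request

base = "https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/"
files = ["config.json", "image_embedding_phi3_v_for_onnx.py", "modeling_phi3_v.py"]

# Save each file into the local Phi-3.5-vision-instruct folder
for name in files:
    urllib.request.urlretrieve(base + name, "./Phi-3.5-vision-instruct/" + name)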

  5. Download this file into the models folder (a download snippet follows the link)

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/build.py
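Similarly, build.py can be fetched straight into the models folder, again via the /resolve/ form of the URL:


import urllib.request

# Save build.py into the models folder, next to the downloaded model
urllib.request.urlretrieve(
    "https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/build.py",
    "./build.py")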

  6. Go to the terminal and run the conversion

Convert to ONNX with FP32 support


python build.py -i ".\Your Phi-3.5-vision-instruct Path" -o .\vision-cpu-fp32 -p f32 -e cpu
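Once the conversion finishes, a quick sanity check (assuming the output path vision-cpu-fp32 from the command above) is to list the generated folder; it should contain the exported ONNX model files and configuration:


import os

# List the converted output folder produced by build.py
print(os.listdir("./vision-cpu-fp32"))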

Running it with ONNX Runtime GenAI


import onnxruntime_genai as og

# Path to the converted ONNX model folder (the output of build.py, e.g. ./vision-cpu-fp32)
model_path = './Your Phi-3.5-vision-instruct Path'

# Define the path to the image file
# This path points to an image file that will be used for demonstration or testing
img_path = './Your Image Path'


# Create an instance of the Model class from the onnxruntime_genai module
# This instance is initialized with the path to the model file
model = og.Model(model_path)

# Create a multimodal processor using the model instance
# This processor will handle different types of input data (e.g., text, images)
processor = model.create_multimodal_processor()

# Create a stream for tokenizing input data using the processor
# This stream will be used to process and tokenize the input data for the model
tokenizer_stream = processor.create_stream()

text = "Your Prompt"

# Initialize a string variable for the prompt with a user tag
prompt = "<|user|>\n"

# Append an image tag to the prompt
prompt += "<|image_1|>\n"

# Append the text prompt to the prompt string, followed by an end tag
prompt += f"{text}<|end|>\n"

# Append an assistant tag to the prompt, indicating the start of the assistant's response
prompt += "<|assistant|>\n"

# Load the image with the ORT GenAI Images helper
image = og.Images.open(img_path)

# Combine the prompt text and the image into model inputs
inputs = processor(prompt, images=image)

# Create an instance of the GeneratorParams class from the onnxruntime_genai module
# This instance is initialized with the model object
params = og.GeneratorParams(model)

# Set the inputs for the generator parameters using the processed inputs
params.set_inputs(inputs)

# Set the search options for the generator parameters
# The max_length parameter specifies the maximum length of the generated output
params.set_search_options(max_length=3072)

generator = og.Generator(model, params)

# Accumulate the decoded response text as it is streamed
response = ''

# Loop until the generator has finished generating tokens
while not generator.is_done():
    # Compute the logits for the next token
    generator.compute_logits()
    
    # Generate the next token based on the computed logits
    generator.generate_next_token()

    # Retrieve the newly generated token
    new_token = generator.get_next_tokens()[0]
    
    # Decode the new token once, append it to the response, and stream it to the console
    chunk = tokenizer_stream.decode(new_token)
    response += chunk
    print(chunk, end='', flush=True)