---
license: mit
---

# **Phi-3.5-vision-instruct-onnx-cpu**

**Note: This is an unofficial version, intended for testing and development only.**

This is the FP32 ONNX version of Microsoft Phi-3.5-vision-instruct for CPU. You can follow the steps below to convert the model yourself.


**Conversion, step by step**

1. Installation

```bash

pip install torch transformers onnx onnxruntime

pip install --pre onnxruntime-genai

```
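
To verify the install, you can print the package version (a quick sanity check; assumes the preview package exposes `__version__`):

```bash

python -c "import onnxruntime_genai as og; print(og.__version__)"

```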

2. Create a working folder in the terminal


```bash

mkdir models

cd models 

```



3. Download **microsoft/Phi-3.5-vision-instruct** into the models folder (a download sketch follows the link)

[https://huggingface.co/microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)
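
One way to fetch the model from inside the models folder (a sketch assuming the `huggingface_hub` CLI; `git clone` with Git LFS also works):

```bash

pip install "huggingface_hub[cli]"

huggingface-cli download microsoft/Phi-3.5-vision-instruct --local-dir ./Phi-3.5-vision-instruct

```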



4. Download these files into your Phi-3.5-vision-instruct folder (see the sketch after the links)

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/config.json

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/image_embedding_phi3_v_for_onnx.py

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/modeling_phi3_v.py
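
A minimal way to fetch all three from the terminal (assumes `wget`; note the `resolve` URLs, which return the raw files rather than web pages):

```bash

cd Phi-3.5-vision-instruct

wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/config.json

wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/image_embedding_phi3_v_for_onnx.py

wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/modeling_phi3_v.py

cd ..

```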


5. Download this file into the models folder (see the example below)

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/build.py
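
For example, from the models folder (again assuming `wget`):

```bash

wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/build.py

```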

6. Back in the terminal, run the conversion to ONNX with FP32 precision:

```bash

python build.py -i ".\Your Phi-3.5-vision-instruct Path" -o ".\vision-cpu-fp32" -p f32 -e cpu

```
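
If the conversion succeeds, the `vision-cpu-fp32` folder should contain the exported ONNX files plus a `genai_config.json` that ONNX Runtime GenAI reads when loading the model (exact contents may vary with the builder version).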



**Running it with ONNX Runtime GenAI**


```python

import onnxruntime_genai as og

# Path to the converted ONNX model folder produced by build.py (step 6)
model_path = './vision-cpu-fp32'

# Define the path to the image file
# This path points to an image file that will be used for demonstration or testing
img_path = './Your Image Path'


# Create an instance of the Model class from the onnxruntime_genai module
# This instance is initialized with the path to the model file
model = og.Model(model_path)

# Create a multimodal processor using the model instance
# This processor will handle different types of input data (e.g., text, images)
processor = model.create_multimodal_processor()

# Create a stream for tokenizing input data using the processor
# This stream will be used to process and tokenize the input data for the model
tokenizer_stream = processor.create_stream()

text = "Your Prompt"

# Initialize a string variable for the prompt with a user tag
prompt = "<|user|>\n"

# Append an image tag to the prompt
prompt += "<|image_1|>\n"

# Append the text prompt to the prompt string, followed by an end tag
prompt += f"{text}<|end|>\n"

# Append an assistant tag to the prompt, indicating the start of the assistant's response
prompt += "<|assistant|>\n"

# Load the image using the ORT GenAI image helper
image = og.Images.open(img_path)

# Combine the prompt text and the image into model inputs
inputs = processor(prompt, images=image)

# Create an instance of the GeneratorParams class from the onnxruntime_genai module
# This instance is initialized with the model object
params = og.GeneratorParams(model)

# Set the inputs for the generator parameters using the processed inputs
params.set_inputs(inputs)

# Set the search options for the generator parameters
# The max_length parameter specifies the maximum length of the generated output
params.set_search_options(max_length=3072)

generator = og.Generator(model, params)

# Accumulate the decoded response as it streams in
response = ''

# Loop until the generator has finished generating tokens
while not generator.is_done():
    # Compute the logits (probabilities) for the next token
    generator.compute_logits()
    
    # Generate the next token based on the computed logits
    generator.generate_next_token()

    # Retrieve the newly generated token
    new_token = generator.get_next_tokens()[0]

    # Decode the token once, accumulate it, and stream it to the console
    decoded = tokenizer_stream.decode(new_token)
    response += decoded
    print(decoded, end='', flush=True)

```
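
To try it, save the script (for example as `run.py`), set `model_path`, `img_path`, and your prompt text, then run `python run.py`; the reply streams to the console token by token.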