---
license: mit
---
# **Phi-3.5-vision-instruct-onnx-cpu**
<b><u>Note: this is an unofficial version, intended only for testing and development.</u></b>
This is the ONNX format FP32 version of Microsoft Phi-3.5-vision-instruct for CPU. You can follow the steps below to convert the model yourself.
**Convert step by step**
1. Installation
```bash
pip install torch transformers onnx onnxruntime
pip install --pre onnxruntime-genai
```
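Optionally, you can check that the packages import cleanly before continuing (a minimal sanity check, not required by the conversion steps):
```bash
# Quick sanity check that all required packages are importable
python -c "import torch, transformers, onnx, onnxruntime, onnxruntime_genai; print('environment OK')"
```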
2. Create a working folder in the terminal
```bash
mkdir models
cd models
```
3. Download **microsoft/Phi-3.5-vision-instruct** into the models folder
[https://huggingface.co/microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)
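One possible way to download it (an assumption, not the only option) is with the huggingface-cli tool from the huggingface_hub package:
```bash
# Download the original model into ./Phi-3.5-vision-instruct inside the models folder
pip install -U "huggingface_hub[cli]"
huggingface-cli download microsoft/Phi-3.5-vision-instruct --local-dir ./Phi-3.5-vision-instruct
```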
4. Download the following files into your Phi-3.5-vision-instruct folder (one possible way to do this is shown after step 5)
https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/config.json
https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/image_embedding_phi3_v_for_onnx.py
https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/modeling_phi3_v.py
5. Download this file into the models folder
https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/build.py
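A possible way to fetch the files for steps 4 and 5 from the terminal (a sketch only; it assumes wget is available and uses the resolve form of the URLs above):
```bash
# Step 4: helper files go into the Phi-3.5-vision-instruct folder
cd Phi-3.5-vision-instruct
wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/config.json
wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/image_embedding_phi3_v_for_onnx.py
wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/modeling_phi3_v.py
cd ..

# Step 5: build.py goes into the models folder
wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/build.py
```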
6. In the terminal, convert the model to ONNX with FP32 for CPU
```bash
python build.py -i ".\Your Phi-3.5-vision-instruct Path" -o .\vision-cpu-fp32 -p f32 -e cpu
```
**Running it with ONNX Runtime GenAI**
```python
import onnxruntime_genai as og
# Path to the converted ONNX model folder (the -o output of build.py, e.g. ./vision-cpu-fp32)
model_path = './Your ONNX model path'
# Define the path to the image file
# This path points to an image file that will be used for demonstration or testing
img_path = './Your Image Path'
# Create an instance of the Model class from the onnxruntime_genai module
# This instance is initialized with the path to the model file
model = og.Model(model_path)
# Create a multimodal processor using the model instance
# This processor will handle different types of input data (e.g., text, images)
processor = model.create_multimodal_processor()
# Create a stream for tokenizing input data using the processor
# This stream will be used to process and tokenize the input data for the model
tokenizer_stream = processor.create_stream()
text = "Your Prompt"
# Initialize a string variable for the prompt with a user tag
prompt = "<|user|>\n"
# Append an image tag to the prompt
prompt += "<|image_1|>\n"
# Append the text prompt to the prompt string, followed by an end tag
prompt += f"{text}<|end|>\n"
# Append an assistant tag to the prompt, indicating the start of the assistant's response
prompt += "<|assistant|>\n"
# Load the image that will be passed to the model
image = og.Images.open(img_path)
# Preprocess the prompt and image into model inputs
inputs = processor(prompt, images=image)
# Create an instance of the GeneratorParams class from the onnxruntime_genai module
# This instance is initialized with the model object
params = og.GeneratorParams(model)
# Set the inputs for the generator parameters using the processed inputs
params.set_inputs(inputs)
# Set the search options for the generator parameters
# The max_length parameter specifies the maximum length of the generated output
params.set_search_options(max_length=3072)
generator = og.Generator(model, params)
# Accumulate the full decoded response here
response = ''
# Loop until the generator has finished generating tokens
while not generator.is_done():
    # Compute the logits (probabilities) for the next token
    generator.compute_logits()
    # Generate the next token based on the computed logits
    generator.generate_next_token()
    # Retrieve the newly generated token
    new_token = generator.get_next_tokens()[0]
    # Decode the token once, then append it to the response string
    decoded = tokenizer_stream.decode(new_token)
    response += decoded
    # Print the decoded token to the console without a newline, and flush the output buffer
    print(decoded, end='', flush=True)
```
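Assuming the snippet above is saved as a script (the file name below is just a placeholder) and that model_path and img_path have been set, it can be run directly:
```bash
# Placeholder script name; edit model_path and img_path before running
python run_phi35_vision.py
```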