Update README.md
README.md
CHANGED
@@ -1,3 +1,156 @@
---
license: mit
---

# **Phi-3.5-vision-instruct-onnx-cpu**

<b><u>Note: This is an unofficial version, intended only for testing and development.</u></b>

This is the ONNX format FP16/FP32 version of Microsoft Phi-3.5-vision-instruct for CPU inference. You can follow the steps below to convert the model yourself.

**Convert step by step**

1. Installation

```bash
pip install torch transformers onnx onnxruntime
pip install --pre onnxruntime-genai
```
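
As a quick optional check (not part of the original steps), you can confirm the packages import correctly before continuing:

```bash
python -c "import torch, transformers, onnxruntime, onnxruntime_genai; print(onnxruntime.__version__)"
```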

2. Set up a working directory in the terminal

```bash
mkdir models
cd models
```

3. Download **microsoft/Phi-3.5-vision-instruct** into the models folder (one way to do this is sketched below)

[https://huggingface.co/microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)
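
For example, one way to download the model (an assumption, since the original only links the model page) is with the Hugging Face CLI, saving it into the models folder:

```bash
# Download the original model weights into ./Phi-3.5-vision-instruct
huggingface-cli download microsoft/Phi-3.5-vision-instruct --local-dir ./Phi-3.5-vision-instruct
```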

4. Download these files into your Phi-3.5-vision-instruct folder

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/config.json

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/image_embedding_phi3_v_for_onnx.py

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/modeling_phi3_v.py

5. Download this file into the models folder (a scripted way to fetch all of these files is sketched below)

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/build.py
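
If you prefer to script these downloads rather than fetch them through the browser, one possible approach (not part of the original instructions; it assumes the files live under `onnx/` in this repo, as the links above suggest, and that the model was downloaded to `./Phi-3.5-vision-instruct`) is:

```bash
# Pull the helper files and build.py from this repo; --local-dir keeps the onnx/ sub-path
huggingface-cli download lokinfey/Phi-3.5-vision-instruct-onnx-cpu \
  onnx/config.json onnx/image_embedding_phi3_v_for_onnx.py onnx/modeling_phi3_v.py onnx/build.py \
  --local-dir ./tmp-onnx-files

# config.json and the two .py files go into the Phi-3.5-vision-instruct folder
cp ./tmp-onnx-files/onnx/config.json ./Phi-3.5-vision-instruct/
cp ./tmp-onnx-files/onnx/image_embedding_phi3_v_for_onnx.py ./Phi-3.5-vision-instruct/
cp ./tmp-onnx-files/onnx/modeling_phi3_v.py ./Phi-3.5-vision-instruct/

# build.py goes into the models folder itself
cp ./tmp-onnx-files/onnx/build.py .
```

After these steps the models folder should contain `build.py` plus the `Phi-3.5-vision-instruct` folder with the three extra files copied into it.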

6. Go back to the terminal and run the conversion

Convert to ONNX with FP16 support

```bash
python build.py -i ".\Your Phi-3.5-vision-instruct Path" -o .\vision-cpu-fp16 -p f16 -e cpu
```

Convert to ONNX with FP32 support

```bash
python build.py -i ".\Your Phi-3.5-vision-instruct Path" -o .\vision-cpu-fp32 -p f32 -e cpu
```
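
Whichever precision you choose, the output folder (`vision-cpu-fp16` or `vision-cpu-fp32`) is the path you pass as `model_path` in the next section. A quick way to confirm the conversion produced output:

```bash
# List the converted model folders
ls ./vision-cpu-fp16
ls ./vision-cpu-fp32
```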

**Running it with ONNX Runtime GenAI**

```python
import onnxruntime_genai as og

# Path to the converted ONNX model folder (e.g. ./vision-cpu-fp16 or ./vision-cpu-fp32)
model_path = './Your Phi-3.5-vision-instruct Path'

# Path to the image file that will be sent to the model
img_path = './Your Image Path'

# Load the ONNX model
model = og.Model(model_path)

# Create a multimodal processor that handles both text and image inputs
processor = model.create_multimodal_processor()

# Create a streaming tokenizer for decoding tokens as they are generated
tokenizer_stream = processor.create_stream()

text = "Your Prompt"

# Build the chat prompt: user tag, image placeholder, user text, then the assistant tag
prompt = "<|user|>\n"
prompt += "<|image_1|>\n"
prompt += f"{text}<|end|>\n"
prompt += "<|assistant|>\n"

# Load the image and process it together with the prompt
image = og.Images.open(img_path)
inputs = processor(prompt, images=image)

# Set up generation parameters: bind the processed inputs and cap the output length
params = og.GeneratorParams(model)
params.set_inputs(inputs)
params.set_search_options(max_length=3072)

generator = og.Generator(model, params)

# Accumulate the decoded response here (initialized before the loop so += works)
response = ''

# Loop until the generator has finished generating tokens
while not generator.is_done():
    # Compute the logits for the next token
    generator.compute_logits()

    # Select the next token based on the computed logits
    generator.generate_next_token()

    # Retrieve the newly generated token
    new_token = generator.get_next_tokens()[0]

    # Decode the token once, append it to the response,
    # and stream it to the console without a trailing newline
    decoded = tokenizer_stream.decode(new_token)
    response += decoded
    print(decoded, end='', flush=True)
```
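
To try it, save the script to a file (for example `run_phi35v.py`, a name used here only for illustration), point `model_path` at one of the converted folders such as `./vision-cpu-fp32`, set `img_path` and the prompt, and run:

```bash
python run_phi35v.py
```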