---
license: mit
license_link: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/resolve/main/LICENSE

language:
- multilingual
pipeline_tag: text-generation
tags:
- nlp
- code
- vision
- DirectML
- ONNX
- DML
- ONNXRuntime
- phi3
- nlp
- conversational
- custom_code
inference: false

---
# Phi-3-vision-128k-instruct ONNX models for CPU and CUDA
This repository hosts the optimized versions of [microsoft/Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/) to accelerate inference with ONNX Runtime.
This repository is a clone from [microsoft/Phi-3-vision-128k-instruct-onnx-cpu](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cpu), with extra files necessary for deploying the model with OpenAI-API-Compatible endpoints through [`embeddedllm`](https://github.com/EmbeddedLLM/embeddedllm) pypi library.

## Usage on Windows (Intel / AMD / Nvidia / Qualcomm)
```powershell
conda create -n onnx python=3.10
conda activate onnx
winget install -e --id GitHub.GitLFS
pip install huggingface-hub[cli]
huggingface-cli download EmbeddedLLM/Phi-3-vision-128k-instruct-onnx --include='onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4' --local-dir .\Phi-3-vision-128k-instruct-onnx
pip install numpy==1.26.4
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3v.py" -OutFile "phi3v.py"
pip install onnxruntime
pip install --pre onnxruntime-genai==0.3.0rc2
python phi3v.py -m .\Phi-3-vision-128k-instruct-onnx
```

# UPSTREAM README.md

# Phi-3-vision-128k-instruct ONNX

This repository hosts the optimized versions of [microsoft/Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/) to accelerate inference with DirectML and ONNX Runtime.

The Phi-3-Vision-128K-Instruct is a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data both on text and vision.  
The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.

## Intended Uses

**Primary use cases**

The model is intended for broad commercial and research use in English. The model provides uses for general purpose AI systems and applications with visual and text input capabilities which require 

1) memory/compute constrained environments;
2) latency bound scenarios;
3) general image understanding;
4) OCR;
5) chart and table understanding.

Our model is designed to accelerate research on efficient language and multimodal models, for use as a building block for generative AI powered features.

**Use case considerations**

Our models are not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios. 
Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case. 

Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under.

## ONNX Models

Here are some of the optimized configurations we have added:
- **ONNX model for int4 DirectML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
- **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile using int4 quantization via RTN. There are two versions uploaded to balance latency vs. accuracy. Acc=1 is targeted at improved accuracy, while Acc=4 is for improved performance. For mobile devices, we recommend using the model with acc-level-4.

## Usage

### Installation and Setup

To use the Phi-3-vision-128k-instruct ONNX model on Windows with DirectML, follow these steps:

1. **Create and activate a Conda environment:**
```sh
conda create -n onnx python=3.10
conda activate onnx
```

2. **Install Git LFS:**
```sh
winget install -e --id GitHub.GitLFS
```

3. **Install Hugging Face CLI:**
```sh
pip install huggingface-hub[cli]
```

4. **Download the model:**
```sh
huggingface-cli download EmbeddedLLM/Phi-3-vision-128k-instruct-onnx --include="onnx/cpu_and_mobile/*" --local-dir .\Phi-3-vision-128k-instruct
```

5. **Install necessary Python packages:**
```sh
pip install numpy==1.26.4
pip install onnxruntime
pip install --pre onnxruntime-genai==0.3.0rc2
```

6. **Install Visual Studio 2015 runtime:**
```sh
conda install conda-forge::vs2015_runtime
```

7. **Download the example script:**
```sh
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py" -OutFile "phi3-qa.py"
```

8. **Run the example script:**
```sh
python phi3-qa.py -m .\Phi-3-vision-128k-instruct
```

### Hardware Requirements

**Minimum Configuration:**
- **Windows:** DirectX 12-capable GPU (AMD/Nvidia/Intel)
- **CPU:** x86_64 / ARM64

**Tested Configurations:**
- **GPU:** AMD Ryzen 8000 Series iGPU (DirectML)
- **CPU:** AMD Ryzen CPU

## Hardware Supported

The model has been tested on:
- GPU SKU: RTX 4090 (DirectML)

Minimum Configuration Required:
- Windows: DirectX 12-capable GPU and a minimum of 10GB of combined RAM

### Model Description

- **Developed by:**  Microsoft
- **Model type:** ONNX
- **Language(s) (NLP):** Python, C, C++
- **License:** MIT
- **Model Description:** This is a conversion of the Phi-3 Vision 128K Instruct model for ONNX Runtime inference.

## Additional Details
- [**Phi-3 Small, Medium, and Vision Blog**](https://aka.ms/phi3_ONNXBuild24)
- [**Phi-3 Model Blog Link**](https://aka.ms/phi3blog-april)
- [**Phi-3 Model Card**]( https://aka.ms/phi3-medium-4k-instruct)
- [**Phi-3 Technical Report**](https://aka.ms/phi3-tech-report)
- [**Phi-3 on Azure AI Studio**](https://aka.ms/phi3-azure-ai)
  
## License

The model is licensed under the [MIT license](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/resolve/main/LICENSE).

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.