---
license: mit
license_link: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/resolve/main/LICENSE

language:
- multilingual
pipeline_tag: text-generation
tags:
- nlp
- code
- vision
- DirectML
- ONNX
- DML
- ONNXRuntime
- phi3
- conversational
- custom_code
inference: false
---
# Phi-3-vision-128k-instruct ONNX

This repository hosts optimized versions of [microsoft/Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/) to accelerate inference with DirectML and ONNX Runtime.

Phi-3-Vision-128K-Instruct is a lightweight, state-of-the-art open multimodal model. It was trained on datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data across both text and vision.
The model belongs to the Phi-3 model family, and the multimodal version supports a context length of 128K tokens. It underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.

## Intended Uses

**Primary use cases**

The model is intended for broad commercial and research use in English. It is suited to general-purpose AI systems and applications with visual and text input capabilities that require:

1) memory/compute constrained environments;
2) latency bound scenarios;
3) general image understanding;
4) OCR;
5) chart and table understanding.

Our model is designed to accelerate research on efficient language and multimodal models, and to serve as a building block for generative AI-powered features.

**Use case considerations**

Our models are not specifically designed or evaluated for all downstream purposes. Developers should consider the common limitations of language models when selecting use cases, and should evaluate and mitigate for accuracy, safety, and fairness before deploying in a specific downstream use case, particularly in high-risk scenarios.
Developers should be aware of and adhere to applicable laws and regulations (including privacy and trade compliance laws) that are relevant to their use case.

Nothing contained in this Model Card should be interpreted as, or deemed, a restriction or modification of the license the model is released under.

## ONNX Models

Here are some of the optimized configurations we have added:
- **ONNX model for int4 DirectML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
- **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile using int4 quantization via RTN. Two versions are provided to balance latency against accuracy: acc-level-1 targets improved accuracy, while acc-level-4 targets improved performance. For mobile devices, we recommend the acc-level-4 model.

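To see why int4 quantization matters for memory/compute constrained environments, here is a rough back-of-the-envelope estimate of weight memory. The ~4.2B parameter count is an assumption based on the base Phi-3-vision model, and actual ONNX file sizes also include metadata and non-quantized tensors, so treat these as ballpark figures only:

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed just to hold the weights, in gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

# Assumed parameter count for Phi-3-vision (roughly 4.2B parameters).
params = 4.2e9
print(f"fp16: {weight_memory_gb(params, 16):.1f} GB")  # ~8.4 GB
print(f"int4: {weight_memory_gb(params, 4):.1f} GB")   # ~2.1 GB
```

The roughly 4x reduction is what makes it feasible to run the model on integrated GPUs and mobile-class devices.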
## Usage

### Installation and Setup

To use the Phi-3-vision-128k-instruct ONNX model on Windows with DirectML, follow these steps:

1. **Create and activate a Conda environment:**
   ```sh
   conda create -n onnx python=3.10
   conda activate onnx
   ```

2. **Install Git LFS:**
   ```sh
   winget install -e --id GitHub.GitLFS
   ```

3. **Install the Hugging Face CLI:**
   ```sh
   pip install "huggingface-hub[cli]"
   ```

4. **Download the model:**
   ```sh
   huggingface-cli download EmbeddedLLM/Phi-3-vision-128k-instruct-onnx --include="onnx/directml/*" --local-dir .\Phi-3-vision-128k-instruct
   ```

5. **Install the necessary Python packages:**
   ```sh
   pip install numpy==1.26.4
   pip install onnxruntime-directml
   pip install --pre onnxruntime-genai-directml
   ```

6. **Install the Visual Studio 2015 runtime:**
   ```sh
   conda install conda-forge::vs2015_runtime
   ```

7. **Download the example script (PowerShell):**
   ```powershell
   Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py" -OutFile "phi3-qa.py"
   ```

8. **Run the example script:**
   ```sh
   python phi3-qa.py -m .\Phi-3-vision-128k-instruct
   ```

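The example script handles tokenization and generation; for intuition, here is a minimal sketch of how a user turn is wrapped in the Phi-3 chat template before it reaches the tokenizer. The `build_prompt` helper is our own illustration (not part of the script), and the `<|image_n|>` placeholders reflect the Phi-3 vision prompt format for attached images:

```python
def build_prompt(question: str, num_images: int = 0) -> str:
    """Wrap a user question in the Phi-3 chat template.

    Image placeholders (<|image_1|>, <|image_2|>, ...) precede the text
    when images are supplied, following the Phi-3 vision prompt format.
    """
    image_tags = "".join(f"<|image_{i}|>\n" for i in range(1, num_images + 1))
    return f"<|user|>\n{image_tags}{question}<|end|>\n<|assistant|>\n"

print(build_prompt("What is shown in this chart?", num_images=1))
```

The `<|end|>` token marks the end of the user turn, and generation continues from the trailing `<|assistant|>` marker.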
### Hardware Requirements

**Minimum configuration:**
- **Windows:** DirectX 12-capable GPU (AMD/NVIDIA/Intel)
- **CPU:** x86_64 / ARM64

**Tested configurations:**
- **GPU:** AMD Ryzen 8000 Series iGPU (DirectML)
- **CPU:** AMD Ryzen CPU

## Hardware Supported

The model has been tested on:
- GPU SKU: RTX 4090 (DirectML)

Minimum configuration required:
- Windows: DirectX 12-capable GPU and a minimum of 10 GB of combined RAM

### Model Description

- **Developed by:** Microsoft
- **Model type:** ONNX
- **Language(s) (NLP):** Python, C, C++
- **License:** MIT
- **Model Description:** This is a conversion of the Phi-3 Vision 128K Instruct model for ONNX Runtime inference.

## Additional Details
- [**Phi-3 Small, Medium, and Vision Blog**](https://aka.ms/phi3_ONNXBuild24)
- [**Phi-3 Model Blog Link**](https://aka.ms/phi3blog-april)
- [**Phi-3 Model Card**](https://aka.ms/phi3-medium-4k-instruct)
- [**Phi-3 Technical Report**](https://aka.ms/phi3-tech-report)
- [**Phi-3 on Azure AI Studio**](https://aka.ms/phi3-azure-ai)

## License

The model is licensed under the [MIT license](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/resolve/main/LICENSE).

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.