---
license: gemma
pipeline_tag: text-generation
tags:
- ONNX
- DML
- DirectML
- ONNXRuntime
- gemma
- google
- conversational
- custom_code
inference: false
---
# Gemma-2B-Instruct-ONNX

## Model Summary
This repository contains optimized versions of the [gemma-2b-it](https://huggingface.co/google/gemma-2b-it) model, designed to accelerate inference with ONNX Runtime. The optimizations target both CPU and DirectML. DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning that provides GPU acceleration across a wide range of supported hardware and drivers, including GPUs from AMD, Intel, NVIDIA, and Qualcomm.
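
To see where DirectML plugs in: with vanilla ONNX Runtime, you request it by listing `DmlExecutionProvider` when creating a session. The sketch below is illustrative only; `"model.onnx"` is a placeholder path, and the onnxruntime-genai example script used later manages session creation for you:

```python
import onnxruntime as ort

# Ask ONNX Runtime for the DirectML execution provider, falling back to CPU.
# "model.onnx" is a placeholder; this repo's models are driven via onnxruntime-genai instead.
session = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
```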

## Usage

### Installation and Setup

To use the Gemma-2B-Instruct-ONNX model on Windows with DirectML, follow these steps:

1. **Create and activate a Conda environment:**
```sh
conda create -n onnx python=3.10
conda activate onnx
```

2. **Install Git LFS:**
```sh
winget install -e --id GitHub.GitLFS
```

3. **Install the Hugging Face CLI:**
```sh
pip install huggingface-hub[cli]
```

4. **Download the model:**
```sh
huggingface-cli download google/gemma-2b-it-onnx --include="onnx/directml/*" --local-dir .\gemma-2b-it-onnx
```

5. **Install the necessary Python packages:**
```sh
pip install numpy
pip install onnxruntime-directml
pip install --pre onnxruntime-genai-directml
```

6. **Install the Visual Studio 2015 runtime:**
```sh
conda install conda-forge::vs2015_runtime
```

7. **Download the example script** (PowerShell):
```powershell
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py" -OutFile "phi3-qa.py"
```

8. **Run the example script** (a minimal Python equivalent is sketched after these steps):
```sh
python phi3-qa.py -m .\gemma-2b-it-onnx
```
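
If you would rather call the runtime directly than use `phi3-qa.py`, the following sketch shows the basic generation flow with onnxruntime-genai. It is illustrative only: API names have shifted across onnxruntime-genai releases, and the Gemma chat template shown is an assumption based on the upstream gemma-2b-it model card.

```python
import onnxruntime_genai as og

# Load the downloaded DirectML model folder (path from step 4).
model = og.Model("./gemma-2b-it-onnx")
tokenizer = og.Tokenizer(model)

# Gemma instruction-tuned chat template (assumed from the upstream model card).
prompt = "<start_of_turn>user\nWhat is DirectML?<end_of_turn>\n<start_of_turn>model\n"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode(prompt)

# Generate and decode the first (and only) sequence.
output_tokens = model.generate(params)
print(tokenizer.decode(output_tokens[0]))
```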

### Hardware Requirements

- **Minimum Configuration:**
  - **Windows:** DirectX 12-capable GPU (AMD/NVIDIA)
  - **CPU:** x86_64 / ARM64

- **Tested Configurations:**
  - **GPU:** AMD Ryzen 8000 Series iGPU (DirectML)
  - **CPU:** AMD Ryzen CPU
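
To confirm that your installation can actually see a DirectML-capable device, you can list the execution providers exposed by the `onnxruntime-directml` package; `DmlExecutionProvider` should appear. This is a quick sanity check, not part of the official setup:

```python
import onnxruntime as ort

# With onnxruntime-directml installed, this should include 'DmlExecutionProvider'.
print(ort.get_available_providers())
```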

## ONNX Models

Here are some of the optimized configurations we have added:
- **ONNX model for int4 DML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 via AWQ (activation-aware weight quantization).
- **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile using int4 quantization via RTN (round-to-nearest); an illustrative sketch of RTN follows this list. Two versions are uploaded to balance latency against accuracy: acc-level-1 targets better accuracy, while acc-level-4 targets better performance. For mobile devices, we recommend the acc-level-4 model.
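
To make the RTN idea concrete, here is a toy round-to-nearest int4 quantizer with per-block scales. It is a conceptual sketch only; the block size, symmetric range, and layout are assumptions and do not reproduce the exact scheme used to build these models:

```python
import numpy as np

def rtn_int4(weights: np.ndarray, block: int = 32):
    """Symmetric per-block round-to-nearest quantization to int4 ([-8, 7])."""
    w = weights.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.maximum(scale, 1e-8)  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct float weights from int4 codes and per-block scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = rtn_int4(w)
print("max abs reconstruction error:", np.abs(dequantize(q, s) - w).max())
```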

## Resources and Technical Documentation

- [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)
- [Gemma on Kaggle](https://www.kaggle.com/models/google/gemma)
- [Gemma on Vertex Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335?version=gemma-2b-it-gg-hf)

## Terms of Use

- [Terms](https://www.kaggle.com/models/google/gemma/license/consent)