pstan committed (verified)
Commit 7e94e39 · 1 Parent(s): 1a46166

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -17,6 +17,12 @@ inference: false
 ## Model Summary
 This repository contains optimized versions of the [gemma-2b-it](https://huggingface.co/google/gemma-2b-it) model, designed to accelerate inference using ONNX Runtime. These optimizations are specifically tailored for CPU and DirectML. DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning, offering GPU acceleration across a wide range of supported hardware and drivers, including those from AMD, Intel, NVIDIA, and Qualcomm.
 
+## ONNX Models
+
+Here are some of the optimized configurations we have added:
+- **ONNX model for int4 DML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
+- **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile using int4 quantization via RTN. There are two versions uploaded to balance latency vs. accuracy. Acc=1 is targeted at improved accuracy, while Acc=4 is for improved performance. For mobile devices, we recommend using the model with acc-level-4.
+
 ## Usage
 
 ### Installation and Setup
@@ -76,12 +82,6 @@ python phi3-qa.py -m .\gemma-2b-it-onnx
 - **GPU:** AMD Ryzen 8000 Series iGPU (DirectML)
 - **CPU:** AMD Ryzen CPU
 
-## ONNX Models
-
-Here are some of the optimized configurations we have added:
-- **ONNX model for int4 DML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
-- **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile using int4 quantization via RTN. There are two versions uploaded to balance latency vs. accuracy. Acc=1 is targeted at improved accuracy, while Acc=4 is for improved performance. For mobile devices, we recommend using the model with acc-level-4.
-
 ## Resources and Technical Documentation
 
 - [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)
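The moved README section mentions int4 quantization via RTN (round-to-nearest). As a rough illustration of what that technique does — not the actual quantizer used to produce these models — here is a minimal NumPy sketch of symmetric per-group RTN int4 quantization; the group size of 32 and the symmetric [-8, 7] integer range are assumptions for the example:

```python
import numpy as np

def rtn_quantize_int4(w, group_size=32):
    # Round-to-nearest (RTN) int4 quantization with one scale per group.
    # Symmetric variant: each float maps to an integer in [-8, 7].
    groups = w.reshape(-1, group_size)
    # Scale each group so its largest magnitude lands on +/-7 (no clipping).
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def rtn_dequantize(q, scale):
    # Recover approximate float weights from int4 codes and group scales.
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, s = rtn_quantize_int4(w)
w_hat = rtn_dequantize(q, s)
err = np.abs(w - w_hat).max()  # bounded by half a quantization step per group
```

The per-element error is at most half of a group's scale, which is why RTN trades a small accuracy loss (the Acc=1 vs. Acc=4 variants mentioned above) for a roughly 4x smaller weight footprint than fp16.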