## Model Summary

This repository contains optimized versions of the [gemma-2b-it](https://huggingface.co/google/gemma-2b-it) model, designed to accelerate inference using ONNX Runtime. These optimizations are specifically tailored for CPU and DirectML. DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning, offering GPU acceleration across a wide range of supported hardware and drivers, including those from AMD, Intel, NVIDIA, and Qualcomm.

## ONNX Models

Here are some of the optimized configurations we have added; a minimal usage sketch follows the list:

- **ONNX model for int4 DML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
- **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile using int4 quantization via RTN. Two versions are uploaded to balance latency against accuracy: acc-level-1 is targeted at improved accuracy, while acc-level-4 is targeted at improved performance. For mobile devices, we recommend the acc-level-4 model.
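
As a rough illustration of how one of these variants can be consumed, the sketch below uses the onnxruntime-genai Python package. This is an assumption-laden sketch rather than this repo's official sample: the local directory name `gemma-2b-it-onnx`, the prompt, and the search options are placeholders, and the exact onnxruntime-genai API surface varies by release.

```python
# Minimal sketch: load one of the int4 ONNX variants with onnxruntime-genai.
# Assumptions: the model files live in ./gemma-2b-it-onnx, and an
# onnxruntime-genai build matching the target (CPU or DirectML) is installed.
import onnxruntime_genai as og

model = og.Model("gemma-2b-it-onnx")   # reads genai_config.json and the weights
tokenizer = og.Tokenizer(model)

# Gemma instruction-tuned checkpoints expect the <start_of_turn> chat markup.
prompt = "<start_of_turn>user\nWhat is DirectML?<end_of_turn>\n<start_of_turn>model\n"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode(prompt)

output_tokens = model.generate(params)  # greedy search by default
print(tokenizer.decode(output_tokens[0]))
```

For DirectML, installing the onnxruntime-genai-directml build in place of the CPU package leaves the calling code unchanged.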

## Usage

### Installation and Setup
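
This repo's question-answering example is driven by the phi3-qa.py script; the run command below is the one used in this README, while the pip package names are assumptions that depend on whether you target CPU or DirectML:

```
# Install ONNX Runtime GenAI (package names assumed; pick the build you need)
pip install onnxruntime-genai            # CPU
pip install onnxruntime-genai-directml   # DirectML (Windows GPU)

# Download the model files into .\gemma-2b-it-onnx, then run the QA example
python phi3-qa.py -m .\gemma-2b-it-onnx
```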

- **GPU:** AMD Ryzen 8000 Series iGPU (DirectML)
- **CPU:** AMD Ryzen CPU

## Resources and Technical Documentation

- [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)