jojo1899 committed on
Commit
1195ae5
1 Parent(s): b3b6a0e

Quantized model using updated packages

Files changed (3):
  1. README.md +6 -6
  2. tokenizer.json +0 -0
  3. tokenizer_config.json +1 -0
README.md CHANGED
@@ -4,14 +4,14 @@ license: llama2
 
 This is an INT4 quantized version of the `Llama-2-13b-chat-hf` model. The Python packages used in creating this model are as follows:
 ```
-onnx==1.16.0
-onnxruntime-directml==1.18.0
-onnxruntime-genai-directml==0.2.0
-torch==2.3.0+cu121
-transformers==4.40.1
+onnx==1.16.1
+onnxruntime-directml==1.20.0
+onnxruntime-genai-directml==0.4.0
+torch==2.5.1
+transformers==4.45.2
 ```
 This quantized model is created using the following command:
 ```
 python -m onnxruntime_genai.models.builder -m meta-llama/Llama-2-13b-chat-hf -e dml -p int4 --extra_options {"int4_block_size"=128} -o ./Llama-2-13b-chat-hf-onnx-int4
 ```
-`onnxruntime_genai.models.builder` quantizes the model using `MatMul4BitsQuantizer` from `matmul_4bits_quantizer.py` of `onnxruntime/quantization/` with the `"DEFAULT"` method.
+`onnxruntime_genai.models.builder` quantizes the model using `MatMul4BitsQuantizer` from `matmul_4bits_quantizer.py` of `onnxruntime/quantization/` with the `"DEFAULT"` method.
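The `int4_block_size=128` option above means each weight tensor is quantized in independent blocks of 128 values, with one scale per block. The following is a simplified, self-contained sketch of that idea in plain Python; it is illustrative only (the helper names are invented here), not the actual `MatMul4BitsQuantizer` implementation:

```python
# Illustrative sketch of block-wise 4-bit quantization, as done conceptually
# by MatMul4BitsQuantizer with int4_block_size=128. NOT the real onnxruntime
# code: helper names and the exact rounding scheme are simplified here.

def quantize_int4_blocks(values, block_size=128):
    """Quantize a flat list of floats to 4-bit codes (0..15), storing one
    scale and zero-point per block of `block_size` values."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        lo, hi = min(block), max(block)
        scale = (hi - lo) / 15 or 1.0          # 4 bits -> 16 levels (0..15)
        zero_point = round(-lo / scale)
        codes = [max(0, min(15, round(v / scale) + zero_point)) for v in block]
        blocks.append((codes, scale, zero_point))
    return blocks

def dequantize_int4_blocks(blocks):
    """Reconstruct approximate floats from per-block codes and scales."""
    out = []
    for codes, scale, zero_point in blocks:
        out.extend((c - zero_point) * scale for c in codes)
    return out

# 256 weights -> exactly 2 blocks of 128
weights = [i / 100.0 for i in range(-128, 128)]
blocks = quantize_int4_blocks(weights)
restored = dequantize_int4_blocks(blocks)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# Reconstruction error stays within roughly half a quantization step
assert max_err < 0.05
```

Smaller blocks give each scale fewer values to cover, which lowers quantization error at the cost of storing more scales; 128 is the block size chosen for this model.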
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json CHANGED
@@ -1,6 +1,7 @@
 {
   "add_bos_token": true,
   "add_eos_token": false,
+  "add_prefix_space": null,
   "added_tokens_decoder": {
     "0": {
       "content": "<unk>",