jojo1899 committed on
Commit
0869f09
1 Parent(s): 0ace247

Improved quantization using OpenVINO 2024.5.0rc1

Browse files
README.md CHANGED
@@ -7,21 +7,20 @@ tags:
7
 
8
  This is an INT4 quantized version of the `meta-llama/Llama-2-13b-chat-hf` model. The Python packages used in creating this model are as follows:
9
  ```
10
- openvino==2024.4.0
11
  optimum==1.23.3
12
  optimum-intel==1.20.1
13
  nncf==2.13.0
14
  torch==2.5.1
15
- transformers==4.46.1
16
  ```
17
  This quantized model is created using the following command:
18
  ```
19
- optimum-cli export openvino -m "meta-llama/Llama-2-13b-chat-hf" --task text-generation-with-past --weight-format int4 --group-size 128 --trust-remote-code ./Llama-2-13b-chat-hf-ov-int4
20
  ```
21
  For more details, run the following command from your Python environment: `optimum-cli export openvino --help`
22
 
23
  INFO:nncf:Statistics of the bitwidth distribution:
24
- | Num bits (N) | % all parameters (layers) | % ratio-defining parameters (layers) |
25
- |--------------|---------------------------|--------------------------------------|
26
- | 8 | 3% (2 / 282) | 0% (0 / 280) |
27
- | 4 | 97% (280 / 282) | 100% (280 / 280) |
 
7
 
8
  This is an INT4 quantized version of the `meta-llama/Llama-2-13b-chat-hf` model. The Python packages used in creating this model are as follows:
9
  ```
10
+ openvino==2024.5.0rc1
11
  optimum==1.23.3
12
  optimum-intel==1.20.1
13
  nncf==2.13.0
14
  torch==2.5.1
15
+ transformers==4.46.2
16
  ```
17
  This quantized model is created using the following command:
18
  ```
19
+ optimum-cli export openvino --model "meta-llama/Llama-2-13b-chat-hf" --weight-format int4 --group-size 128 --sym --ratio 1 --all-layers ./Llama-2-13b-chat-hf-ov-int4
20
  ```
21
  For more details, run the following command from your Python environment: `optimum-cli export openvino --help`
22
 
23
  INFO:nncf:Statistics of the bitwidth distribution:
24
+ | Num bits (N) | % all parameters (layers) | % ratio-defining parameters (layers) |
25
+ |----------------|-----------------------------|----------------------------------------|
26
+ | 4 | 100% (282 / 282) | 100% (282 / 282) |
 
config.json CHANGED
@@ -25,7 +25,7 @@
25
  "rope_theta": 10000.0,
26
  "tie_word_embeddings": false,
27
  "torch_dtype": "float16",
28
- "transformers_version": "4.46.1",
29
  "use_cache": true,
30
  "vocab_size": 32000
31
  }
 
25
  "rope_theta": 10000.0,
26
  "tie_word_embeddings": false,
27
  "torch_dtype": "float16",
28
+ "transformers_version": "4.46.2",
29
  "use_cache": true,
30
  "vocab_size": 32000
31
  }
generation_config.json CHANGED
@@ -6,5 +6,5 @@
6
  "pad_token_id": 0,
7
  "temperature": 0.6,
8
  "top_p": 0.9,
9
- "transformers_version": "4.46.1"
10
  }
 
6
  "pad_token_id": 0,
7
  "temperature": 0.6,
8
  "top_p": 0.9,
9
+ "transformers_version": "4.46.2"
10
  }
openvino_model.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:064cdc248bbfdd396d26d976af4f74f2f9ea4dfe01a00df1128df88a0a9dd7c8
3
- size 6921224160
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e4cfe288e8970a7ae85cff20ba84bb4295446873a212d529919888a28cb8c394
3
+ size 6712750576
openvino_model.xml CHANGED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "</s>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "unk_token": {
17
+ "content": "<unk>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ }
23
+ }
tokenizer.model ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
3
+ size 499723