katuni4ka committed
Commit 3eba8eb · verified · 1 Parent(s): f7adfa2

Update README.md

Files changed (1)
  1. README.md +6 -23
README.md CHANGED
@@ -6,7 +6,7 @@ license: other
  ## Description

  FastDraft is a novel and efficient approach for pre-training and aligning a draft model to any LLM to be used with speculative decoding, by incorporating efficient pre-training followed by fine-tuning over synthetic datasets generated by the target model.
- FastDraft was presented in https://arxiv.org/abs/2411.11055 at ENLSP@NeurIPS24 by Intel Labs.
+ FastDraft was presented in the [paper](https://arxiv.org/abs/2411.11055) at ENLSP@NeurIPS24 by Intel Labs.

  This is a draft model that was trained with FastDraft to accompany [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).

@@ -16,23 +16,7 @@ This is Llama-3.1-8B-Instruct-FastDraft-150M model converted to the [OpenVINO™

  Weight compression was performed using `nncf.compress_weights` with the following parameters:

- <nncf>
- <weight_compression>
- <all_layers value="False"/>
- <awq value="False"/>
- <group_size value="128"/>
- <ignored_scope>
- <names value="[]"/>
- <patterns value="[]"/>
- <subgraphs value="[]"/>
- <types value="[]"/>
- <validate value="True"/>
- </ignored_scope>
- <mode value="int8"/>
- <ratio value="1"/>
- <sensitivity_metric value="weight_quantization_error"/>
- </weight_compression>
- </nncf>
+ * mode: **INT8_ASYM**

  For more information on quantization, check the [OpenVINO model optimization guide](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html).

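
For readers who want to reproduce the compression step summarized above, a minimal sketch of an equivalent `nncf.compress_weights` call in INT8_ASYM mode follows. It is not part of this commit, and the IR paths are placeholders.

```python
# Minimal sketch (not from this commit): INT8 asymmetric weight compression of an
# OpenVINO IR with NNCF, matching the "mode: INT8_ASYM" noted in the updated README.
# Input and output paths are placeholders.
import nncf
import openvino as ov

core = ov.Core()
model = core.read_model("Llama-3.1-8B-Instruct-FastDraft-150M/openvino_model.xml")  # placeholder path

compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT8_ASYM,  # 8-bit asymmetric weight quantization
)

ov.save_model(compressed, "Llama-3.1-8B-Instruct-FastDraft-150M-int8/openvino_model.xml")
```
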
@@ -40,8 +24,8 @@ For more information on quantization, check the [OpenVINO model optimization gui

  The provided OpenVINO™ IR model is compatible with:

- * OpenVINO version <2024.4 > and higher
- * Optimum Intel <1.20.0> and higher
+ * OpenVINO version **2024.4** and higher
+ * Optimum Intel **1.20.0** and higher

  ## Running Model Inference with OpenVINO GenAI

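
As an illustration of the compatibility note above (Optimum Intel 1.20.0 and higher), the IR can typically be consumed through `optimum-intel`. This sketch is not from the commit; the repo id is an assumption based on the model name in this card.

```python
# Illustrative sketch only: loading the OpenVINO IR through Optimum Intel (>= 1.20.0).
# The model id is an assumption; substitute the actual repo id or a local directory.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "OpenVINO/Llama-3.1-8B-Instruct-FastDraft-150M-int8-ov"  # assumed repo id
model = OVModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
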
@@ -55,8 +39,7 @@ pip install openvino-genai huggingface_hub

  Note: run model with demo, you will need to accept license agreement.
  You must be a registered user in 🤗 Hugging Face Hub. Please visit [HuggingFace model card](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct),
- carefully read terms of usage and click accept button. You will need
- to use an access token for the code below to run. For more information
+ carefully read terms of usage and click accept button. You will need to use an access token for the code below to run. For more information
  on access tokens, refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).

  ```
@@ -107,4 +90,4 @@ More GenAI usage examples can be found in OpenVINO GenAI library [docs](https://

  ## Disclaimer

- Intel is committed to respecting human rights and avoiding causing or contributing to adverse impacts on human rights. See [Intel’s Global Human Rights Principles](https://www.intel.com/content/dam/www/central-libraries/us/en/documents/policy-human-rights.pdf). Intel’s products and software are intended only to be used in applications that do not cause or contribute to adverse impacts on human rights.
+ Intel is committed to respecting human rights and avoiding causing or contributing to adverse impacts on human rights. See [Intel’s Global Human Rights Principles](https://www.intel.com/content/dam/www/central-libraries/us/en/documents/policy-human-rights.pdf). Intel’s products and software are intended only to be used in applications that do not cause or contribute to adverse impacts on human rights.
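
The "Running Model Inference with OpenVINO GenAI" section of the README is truncated in this diff. For context, here is a minimal sketch of speculative decoding with OpenVINO GenAI using this draft model alongside an OpenVINO IR of the target Meta-Llama-3.1-8B-Instruct; the repo ids below are assumptions, not taken from this commit.

```python
# Illustrative sketch (not from this commit): speculative decoding with OpenVINO GenAI,
# pairing this FastDraft draft model with an OpenVINO IR of the target model.
# Repo ids below are assumptions; adjust to the actual repositories or local directories.
import huggingface_hub as hf_hub
import openvino_genai as ov_genai

draft_dir = hf_hub.snapshot_download("OpenVINO/Llama-3.1-8B-Instruct-FastDraft-150M-int8-ov")  # assumed id
target_dir = hf_hub.snapshot_download("OpenVINO/Meta-Llama-3.1-8B-Instruct-int4-ov")           # assumed id

device = "CPU"
pipe = ov_genai.LLMPipeline(
    target_dir,
    device,
    draft_model=ov_genai.draft_model(draft_dir, device),  # draft proposes tokens, target verifies them
)

config = ov_genai.GenerationConfig()
config.max_new_tokens = 128
config.num_assistant_tokens = 5  # number of tokens the draft model proposes per step

print(pipe.generate("What is OpenVINO?", config))
```
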
 