OpenVINO/Llama-3.1-8B-Instruct-FastDraft-150M-int8-ov


katuni4ka committed on Nov 21, 2024

Commit 3eba8eb (verified) · 1 parent: f7adfa2

Update README.md

Files changed (1)

README.md +6 -23

@@ -6,7 +6,7 @@ license: other
 ## Description
 FastDraft is a novel and efficient approach for pre-training and aligning a draft model to any LLM to be used with speculative decoding, by incorporating efficient pre-training followed by fine-tuning over synthetic datasets generated by the target model.
-FastDraft was presented in https://arxiv.org/abs/2411.11055 at ENLSP@NeurIPS24 by Intel Labs.
+FastDraft was presented in the [paper](https://arxiv.org/abs/2411.11055) at ENLSP@NeurIPS24 by Intel Labs.
 This is a draft model that was trained with FastDraft to accompany [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
@@ -16,23 +16,7 @@ This is Llama-3.1-8B-Instruct-FastDraft-150M model converted to the [OpenVINO™
 Weight compression was performed using `nncf.compress_weights` with the following parameters:
-<nncf>
-    <weight_compression>
-        <all_layers value="False"/>
-        <awq value="False"/>
-        <group_size value="128"/>
-        <ignored_scope>
-            <names value="[]"/>
-            <patterns value="[]"/>
-            <subgraphs value="[]"/>
-            <types value="[]"/>
-            <validate value="True"/>
-        </ignored_scope>
-        <mode value="int8"/>
-        <ratio value="1"/>
-        <sensitivity_metric value="weight_quantization_error"/>
-    </weight_compression>
-</nncf>
+* mode: **INT8_ASYM**
 For more information on quantization, check the [OpenVINO model optimization guide](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html).
@@ -40,8 +24,8 @@ For more information on quantization, check the [OpenVINO model optimization gui
 The provided OpenVINO™ IR model is compatible with:
-* OpenVINO version <2024.4 > and higher
-* Optimum Intel <1.20.0> and higher
+* OpenVINO version **2024.4** and higher
+* Optimum Intel **1.20.0** and higher
 ## Running Model Inference with OpenVINO GenAI
@@ -55,8 +39,7 @@ pip install openvino-genai huggingface_hub
 Note: run model with demo, you will need to accept license agreement.
 You must be a registered user in 🤗 Hugging Face Hub. Please visit [HuggingFace model card](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct),
-carefully read terms of usage and click accept button.  You will need
-to use an access token for the code below to run. For more information
+carefully read terms of usage and click accept button.  You will need to use an access token for the code below to run. For more information
 on access tokens, refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
 ```
@@ -107,4 +90,4 @@ More GenAI usage examples can be found in OpenVINO GenAI library [docs](https://
 ## Disclaimer
-Intel is committed to respecting human rights and avoiding causing or contributing to adverse impacts on human rights. See [Intel’s Global Human Rights Principles](https://www.intel.com/content/dam/www/central-libraries/us/en/documents/policy-human-rights.pdf). Intel’s products and software are intended only to be used in applications that do not cause or contribute to adverse impacts on human rights.
+Intel is committed to respecting human rights and avoiding causing or contributing to adverse impacts on human rights. See [Intel’s Global Human Rights Principles](https://www.intel.com/content/dam/www/central-libraries/us/en/documents/policy-human-rights.pdf). Intel’s products and software are intended only to be used in applications that do not cause or contribute to adverse impacts on human rights.
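The README describes this model as a draft model for speculative decoding with Meta-Llama-3.1-8B-Instruct. As a rough illustration of what a draft model is used for, here is a minimal sketch of *greedy* speculative decoding in pure Python. It assumes toy deterministic next-token functions in place of real models; `greedy_generate`, `speculative_generate`, `target_next` and `draft_next` are hypothetical names for this sketch, not OpenVINO GenAI APIs, and this is the general technique rather than OpenVINO GenAI's implementation.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model: maps a token sequence to the
# greedy (argmax) next token. A real LLM would produce logits instead.
NextToken = Callable[[List[int]], int]


def greedy_generate(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: ordinary token-by-token greedy decoding with one model."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(model(seq))
    return seq


def speculative_generate(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new: int,
    k: int = 4,
) -> List[int]:
    """Greedy speculative decoding: the draft proposes k tokens, the target verifies.

    The result is identical to greedy_generate(target, ...); the draft only
    affects how many target verification rounds are needed.
    """
    seq = list(prompt)
    end = len(prompt) + max_new
    while len(seq) < end:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        ctx = list(seq)
        for _ in range(min(k, end - len(seq))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks every proposed position. A real system does this
        #    in one batched forward pass; a loop keeps the sketch readable.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                # First mismatch: keep the target's own token and end the round.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every proposal accepted: the same target pass yields one bonus token.
            if len(seq) + len(accepted) < end:
                accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq


# Toy example: the draft agrees with the target except at every third position.
def target_next(seq: List[int]) -> int:
    return (3 * seq[-1] + len(seq)) % 17


def draft_next(seq: List[int]) -> int:
    t = target_next(seq)
    return t if len(seq) % 3 else (t + 1) % 17


prompt = [1, 2, 3]
assert speculative_generate(target_next, draft_next, prompt, max_new=12) == greedy_generate(
    target_next, prompt, max_new=12
)
```

Because every kept token is either confirmed by or produced by the target, the output matches plain target decoding; the closer the draft's predictions track the target's, the more tokens are accepted per verification round.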
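This commit replaces the raw NNCF parameter dump with a single setting, mode **INT8_ASYM**. To give a rough idea of what asymmetric 8-bit weight quantization means, here is a textbook sketch in pure Python. It assumes unsigned 8-bit codes in [0, 255] with one float scale and one integer zero point per weight row (output channel); this is an illustration under that assumption, not NNCF's code, and `quantize_row_asym` / `dequantize_row` are hypothetical names.

```python
from typing import List, Tuple


def quantize_row_asym(row: List[float], num_bits: int = 8) -> Tuple[List[int], float, int]:
    """Asymmetric quantization of one weight row to unsigned integer codes.

    Maps [min(row), max(row)] (widened to include 0.0) onto [0, 2**num_bits - 1]
    using a float scale and an integer zero point, so 0.0 is represented exactly.
    """
    qmax = 2 ** num_bits - 1
    lo = min(min(row), 0.0)  # include 0.0 so the zero point lands in range
    hi = max(max(row), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0  # avoid division by zero
    zero_point = round(-lo / scale)
    # Clamp guards against off-by-one codes caused by rounding the zero point.
    codes = [min(qmax, max(0, round(w / scale) + zero_point)) for w in row]
    return codes, scale, zero_point


def dequantize_row(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate weights: w ≈ (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in codes]


row = [-0.9, -0.1, 0.0, 0.25, 1.3]
codes, scale, zp = quantize_row_asym(row)
approx = dequantize_row(codes, scale, zp)
```

Each reconstructed weight is within half a quantization step (`scale / 2`) of the original, and the asymmetric zero point lets the full 8-bit range cover a weight distribution that is not centered on zero.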