Intel/phi-2-int4-inc

Text Generation · text-generation-inference · Inference Endpoints · 4-bit precision · intel/auto-round


n1ck-guo committed 29 days ago

Commit caca832 (1 parent: 88ef27a)

Update README.md

Files changed (1)

README.md +28 -11

README.md CHANGED

@@ -41,6 +41,23 @@ She is curious and brave and
 """
 ```
 ### Evaluate the model
@@ -55,17 +72,17 @@ auto-round --eval --model Intel/phi-2-int4-inc --device cuda:0 --tasks lambada_o
 | Metric         | FP16   | INT4   |
 | -------------- | ------ | ------ |
-| Avg.           | 0.6131 | 0.6062 |
-| mmlu           | 0.5334 | 0.5241 |
-| lambada_openai | 0.6243 | 0.6039 |
-| hellaswag      | 0.5581 | 0.5487 |
-| winogrande     | 0.7522 | 0.7585 |
-| piqa           | 0.7867 | 0.7840 |
-| truthfulqa_mc1 | 0.3097 | 0.2974 |
-| openbookqa     | 0.4040 | 0.3960 |
-| boolq          | 0.8346 | 0.8346 |
-| arc_easy       | 0.8001 | 0.8013 |
-| arc_challenge  | 0.5282 | 0.5137 |

 """
 ```
+### Intel Gaudi-2 INT4 Inference
+A Docker image with the Gaudi software stack is recommended. More details can be found in the [Gaudi Guide](https://docs.habana.ai/en/latest/).
+```python
+import habana_frameworks.torch.core as htcore  # loads the Habana PyTorch bridge (HPU device support)
+import torch
+from auto_round import AutoRoundConfig  # noqa: F401, imported for its side effect: needed to load auto-round format checkpoints
+from transformers import AutoModelForCausalLM, AutoTokenizer
+quantized_model_dir = "Intel/phi-2-int4-inc"
+tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
+model = AutoModelForCausalLM.from_pretrained(quantized_model_dir).to("hpu").to(torch.bfloat16)  # INT4 weights on Gaudi, bf16 compute
+text = "Let me introduce the Alibaba company,"
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50, do_sample=False)[0]))  # greedy decoding, up to 50 new tokens
+```
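The Gaudi snippet hard-codes the `"hpu"` device. When the same script may also run on machines without the Gaudi stack, a small helper can choose the device string first. This is a minimal sketch under stated assumptions: `pick_device` is a hypothetical helper, not part of auto-round or transformers, and it relies only on `habana_frameworks.torch.hpu.is_available()` and `torch.cuda.is_available()`.

```python
import importlib.util


def pick_device() -> str:
    """Hypothetical helper: return "hpu" if a Gaudi device is usable, else "cuda", else "cpu"."""
    # Only touch the Habana bridge when the package is installed, so the
    # helper also runs on machines without the Gaudi software stack.
    if importlib.util.find_spec("habana_frameworks") is not None:
        import habana_frameworks.torch.hpu as hthpu

        if hthpu.is_available():
            return "hpu"
    try:
        import torch
    except ImportError:
        # No PyTorch at all: fall back to the CPU label.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The returned string can be passed to `.to(...)` in place of `"hpu"`. Whether the INT4 checkpoint actually loads on each fallback device depends on the auto-round backends that are installed. This sketch only chooses the device.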
 ### Evaluate the model
 | Metric         | FP16   | INT4   |
 | -------------- | ------ | ------ |
+| Avg.           | 0.6131 | 0.6087 |
+| mmlu           | 0.5334 | 0.5417 |
+| lambada_openai | 0.6243 | 0.6088 |
+| hellaswag      | 0.5581 | 0.5520 |
+| winogrande     | 0.7522 | 0.7577 |
+| piqa           | 0.7867 | 0.7911 |
+| truthfulqa_mc1 | 0.3097 | 0.2962 |
+| openbookqa     | 0.4040 | 0.3900 |
+| boolq          | 0.8346 | 0.8333 |
+| arc_easy       | 0.8001 | 0.7980 |
+| arc_challenge  | 0.5282 | 0.5179 |
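The `Avg.` row appears to be the unweighted mean of the ten task scores. Recomputing it from the rows gives the published 0.6131 (FP16) and 0.6087 (INT4). The sketch below checks this and lists per-task INT4 minus FP16 deltas. It uses only the numbers from the updated table above. The dictionary names and the `average` helper are illustrative.

```python
# Per-task scores copied from the updated evaluation table.
FP16 = {
    "mmlu": 0.5334, "lambada_openai": 0.6243, "hellaswag": 0.5581,
    "winogrande": 0.7522, "piqa": 0.7867, "truthfulqa_mc1": 0.3097,
    "openbookqa": 0.4040, "boolq": 0.8346, "arc_easy": 0.8001,
    "arc_challenge": 0.5282,
}
INT4 = {
    "mmlu": 0.5417, "lambada_openai": 0.6088, "hellaswag": 0.5520,
    "winogrande": 0.7577, "piqa": 0.7911, "truthfulqa_mc1": 0.2962,
    "openbookqa": 0.3900, "boolq": 0.8333, "arc_easy": 0.7980,
    "arc_challenge": 0.5179,
}


def average(scores: dict) -> float:
    """Unweighted mean over all tasks (assumed to be how Avg. is computed)."""
    return sum(scores.values()) / len(scores)


print(f"FP16 avg: {average(FP16):.4f}")  # FP16 avg: 0.6131
print(f"INT4 avg: {average(INT4):.4f}")  # INT4 avg: 0.6087
for task in FP16:
    # Positive delta means the INT4 model scored higher on that task.
    print(f"{task:15s} {INT4[task] - FP16[task]:+.4f}")
```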