Intel/neural-chat-7b-v3-3


Haihao committed on Feb 20, 2024

Commit bd11ee4 · verified · 1 Parent(s): 7b86016

Update README.md

Use bf16 compute type

Files changed (1)

README.md +1 -1


@@ -157,7 +157,7 @@ So, the sum of 100, 520, and 60 is 680.
 from transformers import AutoTokenizer, TextStreamer
 from intel_extension_for_transformers.transformers import AutoModelForCausalLM, WeightOnlyQuantConfig
 model_name = "Intel/neural-chat-7b-v3-3"
-config = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4")
+config = WeightOnlyQuantConfig(compute_dtype="bf16", weight_dtype="int4")
 prompt = "Once upon a time, there existed a little girl,"
 tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
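
For readers unfamiliar with the two settings touched here: in weight-only quantization, `weight_dtype` is the format the weights are stored in, while `compute_dtype` is the precision the arithmetic runs in after the weights are expanded. The sketch below is a rough, standard-library-only illustration of that split, not the library's kernels. The helper names (`to_bf16`, `quantize_int4`, `dot_int4_weights_bf16_compute`) and the per-tensor symmetric scaling are assumptions made for the example.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a float to the nearest bfloat16 value (returned as a Python float).

    bfloat16 keeps the top 16 bits of a float32, so we round away the low 16
    bits with round-to-nearest-even. NaN/inf handling is omitted for brevity.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]


def quantize_int4(weights):
    """Symmetric per-tensor int4 quantization (an assumed, simplified scheme).

    Returns integer codes in [-8, 7] and the scale needed to dequantize them.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dot_int4_weights_bf16_compute(codes, scale, activations):
    """Dot product with int4-stored weights and bf16-rounded operands."""
    total = 0.0
    for code, act in zip(codes, activations):
        weight = to_bf16(code * scale)  # dequantize, then round to compute dtype
        total += weight * to_bf16(act)
    return total


if __name__ == "__main__":
    weights = [0.7, -0.3, 0.0, 0.2]
    activations = [1.0, 2.0, 3.0, 4.0]
    codes, scale = quantize_int4(weights)
    print("int4 codes:", codes)
    print("approximate dot product:", dot_int4_weights_bf16_compute(codes, scale, activations))
```

With this split, the weights stay in a compact 4-bit form while the multiply-accumulate runs on floating-point values, which is the general idea behind pairing `weight_dtype="int4"` with `compute_dtype="bf16"`.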