study-hjt/CodeQwen1.5-7B-Chat-GPTQ-Int8


study-hjt committed on Apr 26

Commit c8e3921 • 1 Parent(s): de2ab4a

Update README.md

Files changed (1)

README.md +4 -3


@@ -9,6 +9,7 @@ pipeline_tag: text-generation
 tags:
 - chat
 - gptq
+- codeqwen
 - int8
 studios:
 - qwen/CodeQwen1.5-7b-Chat-demo
@@ -54,15 +55,15 @@ KeyError: 'qwen2'.
 Here provides a code snippet with `apply_chat_template` to show you how to load the tokenizer and model and how to generate contents.
 ```python
-from modelscope import AutoModelForCausalLM, AutoTokenizer
+from transformers import AutoModelForCausalLM, AutoTokenizer
 device = "cuda" # the device to load the model onto
 model = AutoModelForCausalLM.from_pretrained(
-    "huangjintao/CodeQwen1.5-7B-Chat-GPTQ-Int8",
+    "study-hjt/CodeQwen1.5-7B-Chat-GPTQ-Int8",
     torch_dtype="auto",
     device_map="auto"
 )
-tokenizer = AutoTokenizer.from_pretrained("huangjintao/CodeQwen1.5-7B-Chat-GPTQ-Int8")
+tokenizer = AutoTokenizer.from_pretrained("study-hjt/CodeQwen1.5-7B-Chat-GPTQ-Int8")
 prompt = "Write a quicksort algorithm in python."
 messages = [