Lin-K76 committed (verified)
Commit aeaeb69 · Parent: 8f95433

Update README.md

Files changed (1): README.md (+3 −3)
README.md CHANGED
````diff
@@ -30,7 +30,7 @@ This optimization reduces the number of bits per parameter from 16 to 8, reducin
 Only the weights and activations of the linear operators within transformers blocks are quantized. Symmetric per-tensor quantization is applied, in which a linear scaling per output dimension maps the FP8 representations of the quantized weights and activations.
 [AutoFP8](https://github.com/neuralmagic/AutoFP8) is used for quantization with a single instance of every token in random order.
 
-<!-- ## Deployment
+## Deployment
 
 ### Use with vLLM
 
@@ -50,7 +50,7 @@ messages = [
     {"role": "user", "content": "Who are you? Please respond in pirate speak!"},
 ]
 
-prompts = tokenizer.apply_chat_template(messages, tokenize=False)
+prompts = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 
 llm = LLM(model=model_id)
 
@@ -60,7 +60,7 @@ generated_text = outputs[0].outputs[0].text
 print(generated_text)
 ```
 
-vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details. -->
+vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
 
 ## Creation
````
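The context lines above describe symmetric per-tensor FP8 quantization: a single scale maps the whole tensor onto the FP8 range, with no zero-point. A minimal pure-Python sketch of that scaling step (the 448.0 E4M3 maximum is a property of the FP8 format; the actual cast to FP8, including rounding to representable values, is done by AutoFP8/hardware and is simplified away here):

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_symmetric_per_tensor(weights):
    """Symmetric per-tensor quantization: one scale maps the whole
    tensor onto the FP8 range; zero maps to zero (no zero-point)."""
    scale = max(abs(w) for w in weights) / FP8_E4M3_MAX
    # Clamp to the representable range; a real FP8 cast would also
    # round each value to the nearest representable E4M3 number.
    quantized = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, w / scale)) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.5, -1.25, 2.0, -0.125]
q, scale = quantize_symmetric_per_tensor(weights)
restored = dequantize(q, scale)
```

Because the scale is shared across the tensor, the largest-magnitude weight lands exactly on the FP8 maximum; in a real cast, the quantization error comes from the E4M3 rounding omitted above.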
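The `add_generation_prompt=True` change in the second hunk makes the rendered prompt end with an open assistant turn, so the model generates a reply as the assistant instead of continuing the user's message. A toy stand-in illustrating the effect (the marker strings are illustrative Llama-3-style tokens; the real `tokenizer.apply_chat_template` renders the Jinja chat template bundled with the model):

```python
def toy_chat_template(messages, add_generation_prompt=False):
    # Hypothetical, simplified renderer using Llama-3-style markers;
    # the real method uses the model's own chat template.
    text = "<|begin_of_text|>"
    for m in messages:
        text += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so generation continues as the assistant.
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text

messages = [{"role": "user", "content": "Who are you? Please respond in pirate speak!"}]
prompt = toy_chat_template(messages, add_generation_prompt=True)
```

Without the flag, the prompt ends right after the user's `<|eot_id|>`, and the model may continue the user turn rather than answer it.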