megrisdal and reach-vb (HF staff) committed
Commit 9940e98
1 Parent(s): 4aa8393

Update README.md (#16)


- Update README.md (d363e422d22775d419c55a81f30ec4779c0ad736)
- Update README.md (d572d4ec3d79ae8f1b271b2dd305f2a35a7a640a)


Co-authored-by: Vaibhav Srivastav <reach-vb@users.noreply.huggingface.co>

Files changed (1): README.md +50 -0
README.md CHANGED
@@ -257,6 +257,56 @@ For more details, refer to the [Transformers documentation](https://huggingface.
 
 </details>
 
+### Chat Template
+
+The instruction-tuned models use a chat template that must be adhered to for conversational use.
+The easiest way to apply it is with the tokenizer's built-in chat template, as shown in the following snippet.
+
+Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:
+
+```py
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+
+model_id = "google/gemma-2-2b-it"
+dtype = torch.bfloat16
+
+# Load the tokenizer and the model (bfloat16 weights on GPU)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    device_map="cuda",
+    torch_dtype=dtype,
+)
+
+# add_generation_prompt=True appends the "<start_of_turn>model" cue
+# so the model knows it should produce the next turn
+chat = [
+    {"role": "user", "content": "Write a hello world program"},
+]
+prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+```
+
+At this point, the prompt contains the following text:
+
+```
+<bos><start_of_turn>user
+Write a hello world program<end_of_turn>
+<start_of_turn>model
+```
+
+As you can see, each turn is preceded by a `<start_of_turn>` delimiter and then the role of the entity
+(either `user`, for content supplied by the user, or `model` for LLM responses). Turns finish with
+the `<end_of_turn>` token.
+
+You can follow this format to build the prompt manually, if you need to do it without the tokenizer's
+chat template, as in the sketch below.
+
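+For instance, here is a minimal sketch of building the same prompt string by hand. The `build_gemma_prompt` helper is illustrative, not part of `transformers`; it simply mirrors the template shown above:
+
+```py
+def build_gemma_prompt(turns):
+    """Format a list of {"role": ..., "content": ...} dicts with Gemma's turn markers."""
+    prompt = "<bos>"
+    for turn in turns:
+        prompt += f"<start_of_turn>{turn['role']}\n{turn['content']}<end_of_turn>\n"
+    # Cue the model to generate the next turn
+    prompt += "<start_of_turn>model\n"
+    return prompt
+
+# For this simple conversation, the result matches the template output shown above
+print(build_gemma_prompt(chat))
+```
+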
+After the prompt is ready, generation can be performed like this:
+
+```py
+# add_special_tokens=False: apply_chat_template already added <bos> to the prompt
+inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
+outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
+print(tokenizer.decode(outputs[0]))
+```
+
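+To continue the conversation, you would append the model's reply and the next user message to `chat`, then re-apply the chat template. A minimal sketch, where the reply string is a stand-in for whatever the model actually generated:
+
+```py
+# Hypothetical model reply; in practice, extract it from the decoded output
+chat.append({"role": "model", "content": 'print("Hello, world!")'})
+chat.append({"role": "user", "content": "Now write it in C"})
+
+# Rebuild the prompt for the next generation step
+prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+```
+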
 ### Inputs and outputs
 
 * **Input:** Text string, such as a question, a prompt, or a document to be