Update README.md (#16)
- Update README.md (d363e422d22775d419c55a81f30ec4779c0ad736)
- Update README.md (d572d4ec3d79ae8f1b271b2dd305f2a35a7a640a)
Co-authored-by: Vaibhav Srivastav <reach-vb@users.noreply.huggingface.co>
README.md CHANGED
@@ -257,6 +257,56 @@ For more details, refer to the [Transformers documentation](https://huggingface.

</details>

### Chat Template

The instruction-tuned models use a chat template that must be adhered to for conversational use.
The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet.

Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:

```py
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "google/gemma-2-2b-it"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)

chat = [
    { "role": "user", "content": "Write a hello world program" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
```

At this point, the prompt contains the following text:

```
<bos><start_of_turn>user
Write a hello world program<end_of_turn>
<start_of_turn>model
```

As you can see, each turn is preceded by a `<start_of_turn>` delimiter and then the role of the entity
(either `user`, for content supplied by the user, or `model` for LLM responses). Turns finish with
the `<end_of_turn>` token.

You can follow this format to build the prompt manually, if you need to do it without the tokenizer's
chat template.
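
For illustration only, here is a minimal sketch of what that manual construction could look like. The `build_prompt` helper below is not part of the model card; it simply strings the turns together with the delimiters shown above:

```py
# Hypothetical helper: assemble a Gemma-style prompt by hand,
# mirroring the chat-template output shown above.
def build_prompt(turns):
    prompt = "<bos>"
    for role, content in turns:
        prompt += f"<start_of_turn>{role}\n{content}<end_of_turn>\n"
    # Leave an open `model` turn so generation produces the assistant reply.
    prompt += "<start_of_turn>model\n"
    return prompt

manual_prompt = build_prompt([("user", "Write a hello world program")])
```

In practice, prefer `tokenizer.apply_chat_template`, since it always reflects the template that ships with the tokenizer.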

After the prompt is ready, generation can be performed like this:

```py
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```
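
Note that the decoded text above includes the prompt as well as the reply, because `outputs[0]` contains the full sequence. As a follow-up (a sketch that goes beyond the original model card), you could keep only the newly generated tokens, append them to the chat history, and re-apply the chat template for the next turn:

```py
# Sketch of a follow-up turn; assumes the variables from the snippets above.
# Keep only the newly generated tokens (outputs[0] also contains the prompt).
reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Extend the conversation; the chat template renders assistant turns as `model` turns.
chat.append({ "role": "assistant", "content": reply })
chat.append({ "role": "user", "content": "Now explain how the program works" })

prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same pattern extends to any number of turns, as long as user and assistant messages keep alternating.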

### Inputs and outputs

* **Input:** Text string, such as a question, a prompt, or a document to be