Update README.md (#16)
- Update README.md (d363e422d22775d419c55a81f30ec4779c0ad736)
- Update README.md (d572d4ec3d79ae8f1b271b2dd305f2a35a7a640a)
Co-authored-by: Vaibhav Srivastav <reach-vb@users.noreply.huggingface.co>
README.md CHANGED
@@ -257,6 +257,56 @@ For more details, refer to the [Transformers documentation](https://huggingface.

</details>

### Chat Template

The instruction-tuned models use a chat template that must be adhered to for conversational use.
The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet.

Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:

```py
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "google/gemma-2-2b-it"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)

chat = [
    { "role": "user", "content": "Write a hello world program" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
```

At this point, the prompt contains the following text:

```
<bos><start_of_turn>user
Write a hello world program<end_of_turn>
<start_of_turn>model
```

As you can see, each turn is preceded by a `<start_of_turn>` delimiter and then the role of the entity
(either `user`, for content supplied by the user, or `model` for LLM responses). Turns finish with
the `<end_of_turn>` token.

You can follow this format to build the prompt manually, if you need to do it without the tokenizer's
chat template.
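
For illustration only, here is a minimal sketch of what that manual construction could look like. The `build_prompt` helper below is not part of the model card; it simply strings the turns together with the delimiters shown above:

```py
# Hypothetical helper: assemble a Gemma-style prompt by hand,
# mirroring the chat-template output shown above.
def build_prompt(turns):
    prompt = "<bos>"
    for role, content in turns:
        prompt += f"<start_of_turn>{role}\n{content}<end_of_turn>\n"
    # Leave an open `model` turn so generation produces the assistant reply.
    prompt += "<start_of_turn>model\n"
    return prompt

manual_prompt = build_prompt([("user", "Write a hello world program")])
```

In practice, prefer `tokenizer.apply_chat_template`, since it always reflects the template that ships with the tokenizer.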

After the prompt is ready, generation can be performed like this:

```py
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```
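
Note that the decoded text above includes the prompt as well as the reply, because `outputs[0]` contains the full sequence. As a follow-up (a sketch that goes beyond the original model card), you could keep only the newly generated tokens, append them to the chat history, and re-apply the chat template for the next turn:

```py
# Sketch of a follow-up turn; assumes the variables from the snippets above.
# Keep only the newly generated tokens (outputs[0] also contains the prompt).
reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Extend the conversation; the chat template renders assistant turns as `model` turns.
chat.append({ "role": "assistant", "content": reply })
chat.append({ "role": "user", "content": "Now explain how the program works" })

prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same pattern extends to any number of turns, as long as user and assistant messages keep alternating.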

### Inputs and outputs

* **Input:** Text string, such as a question, a prompt, or a document to be