Commit 290ecb9 by BramVanroy (1 parent: 5f0b6c1)

Update README.md

Files changed (1)
  1. README.md +21 -0
README.md CHANGED
@@ -38,6 +38,27 @@ Bram Vanroy. (2023). Llama v2 13b: Finetuned on Dutch Conversational Data. Huggi
  }
  ```

+ ## Usage
+
+ ```python
+ from transformers import pipeline
+
+
+ # If you want to add a system message, add a dictionary with role "system". However, this will likely have little
+ # effect since the model was only finetuned using a single system message.
+ messages = [{"role": "user", "content": "Welke talen worden er in België gesproken?"}]
+ pipe = pipeline("text-generation", model="BramVanroy/Llama-2-13b-chat-dutch", device_map="auto")
+
+ # Just apply the template but leave the tokenization for the pipeline to do
+ prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False)
+
+ # Only return the newly generated tokens, not prompt+new_tokens (return_full_text=False)
+ generated = pipe(prompt, do_sample=True, max_new_tokens=128, return_full_text=False)
+
+ generated[0]["generated_text"]
+ # ' De officiële talen van België zijn Nederlands, Frans en Duits. Daarnaast worden er nog een aantal andere talen gesproken, waaronder Engels, Spaans, Italiaans, Portugees, Turks, Arabisch en veel meer. '
+ ```
+
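The comment in the usage snippet mentions adding a system message. A minimal sketch of what such a message list could look like, assuming the usual role/content dictionary format; the system text below is a hypothetical placeholder, not the message used during finetuning:

```python
# Hypothetical system message (an assumption for illustration). The model was
# finetuned with a single system message, so a custom one will likely have
# little effect.
system_message = {"role": "system", "content": "Je bent een behulpzame assistent."}
user_message = {"role": "user", "content": "Welke talen worden er in België gesproken?"}

# The system entry comes first, followed by the user turn.
messages = [system_message, user_message]

# Show the order of roles in the conversation.
print([m["role"] for m in messages])
```

The resulting `messages` list would be passed to `pipe.tokenizer.apply_chat_template(messages, tokenize=False)` in the same way as the single-message list in the usage snippet.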
  ## Model description

  I could not get the original Llama 2 13B to produce much Dutch, even though the description paper indicates that it was trained on a (small) portion of Dutch data. I therefore