danielhanchen committed on
Commit 4cdddb1 · verified · 1 Parent(s): ef4f5f5

Update README.md

Files changed (1): README.md +292 -120

README.md CHANGED
 
---
base_model: CohereForAI/c4ai-command-r-plus-08-2024
language:
- en
library_name: transformers
license: cc-by-nc-4.0
tags:
- cohere
- unsloth
- transformers
---
 
# Finetune Llama 3.1, Gemma 2, Mistral 2-5x faster with 70% less memory via Unsloth!

We have a free Google Colab Tesla T4 notebook for Llama 3.1 (8B) here: https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/Discord%20button.png" width="200"/>](https://discord.gg/unsloth)
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

## ✨ Finetune for Free

All notebooks are **beginner-friendly**! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model, which can be exported to GGUF or vLLM, or uploaded to Hugging Face.

| Unsloth supports | Free Notebooks | Performance | Memory use |
|------------------|----------------|-------------|------------|
| **Llama-3.1 8b** | [▶️ Start on Colab](https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing) | 2.4x faster | 58% less |
| **Phi-3.5 (mini)** | [▶️ Start on Colab](https://colab.research.google.com/drive/1lN6hPQveB_mHSnTOYifygFcrO8C1bxq4?usp=sharing) | 2x faster | 50% less |
| **Gemma-2 9b** | [▶️ Start on Colab](https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing) | 2.4x faster | 58% less |
| **Mistral 7b** | [▶️ Start on Colab](https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing) | 2.2x faster | 62% less |
| **TinyLlama** | [▶️ Start on Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing) | 3.9x faster | 74% less |
| **DPO - Zephyr** | [▶️ Start on Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) | 1.9x faster | 19% less |

- This [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing) is useful for ShareGPT ChatML / Vicuna templates.
- This [text completion notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing) is for raw text. This [DPO notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) replicates Zephyr.
- \* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.
 
# Model Card for C4AI Command R+ 08-2024

## Model Summary
C4AI Command R+ 08-2024 is an open weights research release of a 104 billion parameter model with highly advanced capabilities, including Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks. Tool use in this model generation is multi-step, which allows the model to combine multiple tools over multiple steps to accomplish difficult tasks. C4AI Command R+ 08-2024 is a multilingual model trained on 23 languages and evaluated in 10 languages. Command R+ 08-2024 is optimized for a variety of use cases, including reasoning, summarization, and question answering.

C4AI Command R+ 08-2024 is part of a family of open weight releases from Cohere For AI and Cohere. Our smaller companion model is [C4AI Command R 08-2024](https://huggingface.co/CohereForAI/c4ai-command-r-08-2024).

- Point of Contact: Cohere For AI: [cohere.for.ai](https://cohere.for.ai/)
- License: [CC-BY-NC](https://cohere.com/c4ai-cc-by-nc-license), also requires adhering to [C4AI's Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy)
- Model: c4ai-command-r-plus-08-2024
- Model Size: 104 billion parameters
- Context length: 128K
 
**Try C4AI Command R+**

You can try out C4AI Command R+ before downloading the weights in our hosted [Hugging Face Space](https://huggingface.co/spaces/CohereForAI/c4ai-command?model=command-r-plus-08-2024).

**Usage**

Please use `transformers` version 4.39.1 or higher:

```python
# pip install 'transformers>=4.39.1'
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/c4ai-command-r-plus-08-2024"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format message with the command-r-plus-08-2024 chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
## <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello, how are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
```
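The comment in the snippet above shows the prompt string that the chat template builds for a single user message. To make that turn layout easy to inspect without downloading the model, here is a minimal pure-Python sketch. `render_chat` is a hypothetical helper, not part of `transformers`, and mapping the `assistant` role to `<|CHATBOT_TOKEN|>` is an assumption; the real template applied by `tokenizer.apply_chat_template` also handles preambles, tools, and documents, and remains the authoritative source.

```python
# Hypothetical sketch of the turn layout visible in this model card.
# The tokenizer's own chat template (apply_chat_template) is authoritative.
ROLE_TOKENS = {
    "user": "<|USER_TOKEN|>",
    "assistant": "<|CHATBOT_TOKEN|>",  # assumption: assistant turns use CHATBOT_TOKEN
}


def render_chat(messages, add_generation_prompt=True):
    """Join turns as <|START_OF_TURN_TOKEN|><role token>content<|END_OF_TURN_TOKEN|>."""
    parts = ["<BOS_TOKEN>"]
    for message in messages:
        role_token = ROLE_TOKENS[message["role"]]
        parts.append(
            f"<|START_OF_TURN_TOKEN|>{role_token}{message['content']}<|END_OF_TURN_TOKEN|>"
        )
    if add_generation_prompt:
        # Open a chatbot turn so that generation continues as the assistant.
        parts.append("<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>")
    return "".join(parts)


prompt = render_chat([{"role": "user", "content": "Hello, how are you?"}])
print(prompt)
```

Running it prints the same string as the `##` comment in the usage example. For real inference, keep using `apply_chat_template` so that any template changes shipped with the tokenizer are picked up.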
 
## Model Details

**Input**: The model accepts text input only.

**Output**: The model generates text output only.

**Model Architecture**: This is an auto-regressive language model that uses an optimized transformer architecture. After pretraining, this model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. We use grouped query attention (GQA) to improve inference speed.
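Grouped query attention lets several query heads share a single key/value head, which shrinks the key/value cache that has to be read at every decoding step. The sketch below uses made-up head counts and sizes (not this model's real configuration, which is listed in its `config.json`) to show a contiguous grouping, where query head `h` uses key/value head `h // group_size`, and the resulting cache reduction.

```python
# Illustrative GQA bookkeeping. All sizes below are hypothetical,
# not Command R+ 08-2024's actual configuration.


def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Map a query head to the key/value head it shares (contiguous groups)."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Cached keys + values for one sequence: 2 * layers * kv_heads * head_dim * seq_len."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


NUM_Q_HEADS, NUM_KV_HEADS = 8, 2  # hypothetical: 4 query heads share each KV head
mapping = [kv_head_for_query_head(h, NUM_Q_HEADS, NUM_KV_HEADS) for h in range(NUM_Q_HEADS)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]

# Multi-head attention keeps one KV head per query head; GQA keeps NUM_KV_HEADS.
mha = kv_cache_bytes(num_layers=4, num_kv_heads=NUM_Q_HEADS, head_dim=64, seq_len=1024)
gqa = kv_cache_bytes(num_layers=4, num_kv_heads=NUM_KV_HEADS, head_dim=64, seq_len=1024)
print(mha // gqa)  # 4
```

The saving scales with the ratio of query heads to key/value heads, which is why GQA mainly helps at inference time, when the cache is re-read for every generated token.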
 
**Languages covered**: The model has been trained on 23 languages (English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, Simplified Chinese, Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian) and evaluated on 10 languages (English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, Simplified Chinese).

**Context length**: Command R+ 08-2024 supports a context length of 128K.

### Tool use & Agent capabilities:

Command R+ 08-2024 has been specifically trained with conversational tool use capabilities. These have been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template will likely reduce performance, but we encourage experimentation.

Command R+ 08-2024’s tool use functionality takes a conversation as input (with an optional user-system preamble), along with a list of available tools. The model then generates a JSON-formatted list of actions to execute on a subset of those tools. Command R+ 08-2024 may use one of its supplied tools more than once.

The model has been trained to recognise a special `directly_answer` tool, which it uses to indicate that it doesn’t want to use any of its other tools. The ability to abstain from calling a specific tool can be useful in a range of situations, such as greeting a user or asking clarifying questions. We recommend including the `directly_answer` tool, but it can be removed or renamed if required.
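How the parsed actions are executed is up to your application. One possible dispatcher is sketched below, assuming the action format used in this card (a `tool_name` plus a `parameters` dict). It skips `directly_answer`, since that tool signals that no other tool is needed, and it allows the same tool to be requested more than once. `TOOLS` and `fake_internet_search` are hypothetical stand-ins for real tool implementations.

```python
from typing import Callable, Dict, List


def fake_internet_search(query: str) -> List[Dict[str, str]]:
    """Hypothetical stand-in for a real search tool; returns document-like dicts."""
    return [{"title": "Search result", "text": f"Results for: {query}"}]


# Hypothetical registry mapping tool names (as declared in the prompt) to callables.
TOOLS: Dict[str, Callable[..., List[Dict[str, str]]]] = {
    "internet_search": fake_internet_search,
}


def run_actions(actions: List[Dict]) -> List[Dict[str, str]]:
    """Run each requested tool call and collect the returned snippets."""
    results: List[Dict[str, str]] = []
    for action in actions:
        name = action["tool_name"]
        if name == "directly_answer":
            continue  # the model chose to answer without calling a tool
        if name not in TOOLS:
            raise KeyError(f"model requested an unknown tool: {name}")
        results.extend(TOOLS[name](**action["parameters"]))
    return results


# The same tool may appear more than once in a single action list.
actions = [
    {"tool_name": "internet_search", "parameters": {"query": "biggest penguin in the world"}},
    {"tool_name": "internet_search", "parameters": {"query": "emperor penguin height"}},
]
print(run_actions(actions))
print(run_actions([{"tool_name": "directly_answer", "parameters": {}}]))  # []
```

Because each result here is a small dict with `title` and `text` keys, the collected snippets have the same shape as the `documents` list in the grounded generation example in this card, which is one plausible way to feed tool results back to the model.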
 
Comprehensive documentation for working with Command R+ 08-2024's tool use prompt template can be found [here](https://docs.cohere.com/docs/prompting-command-r).

Command R+ 08-2024 also supports Hugging Face's [tool use API](https://huggingface.co/docs/transformers/main/en/chat_templating#advanced-tool-use--function-calling).

The code snippets below show minimal working examples of how to render a prompt.

<details>
<summary><b>Usage: Rendering Tool Use Prompts [CLICK TO EXPAND]</b> </summary>

```python
from transformers import AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-plus-08-2024"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# define conversation input:
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]
# Define tools available for the model to use:
tools = [
    {
        "name": "internet_search",
        "description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
        "parameter_definitions": {
            "query": {
                "description": "Query to search the internet with",
                "type": "str",
                "required": True
            }
        }
    },
    {
        "name": "directly_answer",
        "description": "Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history",
        "parameter_definitions": {}
    }
]

# render the tool use prompt as a string:
tool_use_prompt = tokenizer.apply_tool_use_template(
    conversation,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)
print(tool_use_prompt)
```

</details>
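The dictionary tool spec above and the rendered prompt further down describe the same tools in two forms. As a rough illustration of how the fields line up (`name` becomes the function name, and each `parameter_definitions` entry becomes a typed argument plus an `Args:` line), the hypothetical `render_tool_stub` helper below reproduces the stub layout seen in the rendered prompt. It sketches the layout only; it is not the tokenizer's template code.

```python
# Hypothetical approximation of the "## Available Tools" stubs in the rendered prompt.
# The real rendering is done by the tokenizer's template, not by this function.


def render_tool_stub(tool: dict) -> str:
    """Render a dict-format tool spec as a Python-style function stub."""
    params = tool.get("parameter_definitions", {})
    # Signature: one "name: type" entry per parameter definition.
    signature = ", ".join(f"{name}: {spec['type']}" for name, spec in params.items())
    lines = [
        f"def {tool['name']}({signature}) -> List[Dict]:",
        '    """' + tool["description"],
    ]
    if params:
        # Describe each parameter in an "Args:" section of the docstring.
        lines += ["", "    Args:"]
        for name, spec in params.items():
            lines.append(f"        {name} ({spec['type']}): {spec['description']}")
    lines += ['    """', "    pass"]
    return "\n".join(lines)


internet_search = {
    "name": "internet_search",
    "description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
    "parameter_definitions": {
        "query": {"description": "Query to search the internet with", "type": "str", "required": True}
    },
}
directly_answer = {
    "name": "directly_answer",
    "description": "Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history",
    "parameter_definitions": {},
}

print(render_tool_stub(internet_search))
print(render_tool_stub(directly_answer))
```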
 
 
<details>
<summary><b>Usage: Rendering prompts with the Tool Use API [CLICK TO EXPAND]</b> </summary>

```python
from transformers import AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-plus-08-2024"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# define conversation input:
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]

# Define tools available for the model to use
# Type hints and docstrings from Python functions are automatically extracted
def internet_search(query: str):
    """
    Returns a list of relevant document snippets for a textual query retrieved from the internet

    Args:
        query: Query to search the internet with
    """
    pass

def directly_answer():
    """
    Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history
    """
    pass

tools = [internet_search, directly_answer]

# render the tool use prompt as a string:
tool_use_prompt = tokenizer.apply_chat_template(
    conversation,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)
print(tool_use_prompt)
```

</details>

<details>
<summary><b>Example Rendered Tool Use Prompt [CLICK TO EXPAND]</b></summary>

````
<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># Safety Preamble
The instructions in this section override those in the task description and style guide sections. Don't answer questions that are harmful or immoral.

# System Preamble
## Basic Rules
You are a powerful conversational AI trained by Cohere to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions.

# User Preamble
## Task and Context
You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging.

## Style Guide
Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.

## Available Tools
Here is a list of tools that you have available to you:

```python
def internet_search(query: str) -> List[Dict]:
    """Returns a list of relevant document snippets for a textual query retrieved from the internet

    Args:
        query (str): Query to search the internet with
    """
    pass
```

```python
def directly_answer() -> List[Dict]:
    """Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history
    """
    pass
```<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Whats the biggest penguin in the world?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>Write 'Action:' followed by a json-formatted list of actions that you want to perform in order to produce a good response to the user's last input. You can use any of the supplied tools any number of times, but you should aim to execute the minimum number of necessary actions for the input. You should use the `directly-answer` tool if calling the other tools is unnecessary. The list of actions you want to call should be formatted as a list of json objects, for example:
```json
[
    {
        "tool_name": title of the tool in the specification,
        "parameters": a dict of parameters to input into the tool as they are defined in the specs, or {} if it takes no parameters
    }
]```<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
````

</details>

<details>
<summary><b>Example Rendered Tool Use Completion [CLICK TO EXPAND]</b></summary>

````
Action: ```json
[
    {
        "tool_name": "internet_search",
        "parameters": {
            "query": "biggest penguin in the world"
        }
    }
]
```
````
</details>
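Before any tool can run, the JSON list has to be extracted from the `Action:` block of the completion. Here is a minimal sketch, assuming the completion follows the layout of the example above (an `Action:` marker followed by a `json`-tagged fenced list). `parse_actions` is a hypothetical helper, and production code would need to handle malformed or truncated generations more gracefully. The fence string is assembled at runtime (`FENCE`) only so that the example can sit inside a Markdown code block.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built at runtime so this example can be shown inside one

# Matches 'Action:' followed by a json-tagged fence, capturing the fence contents.
ACTION_BLOCK = re.compile(r"Action:\s*" + FENCE + r"json\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_actions(completion: str) -> list:
    """Extract and validate the JSON list of actions that follows 'Action:'."""
    match = ACTION_BLOCK.search(completion)
    if match is None:
        raise ValueError("no 'Action:' JSON block found in completion")
    actions = json.loads(match.group(1))
    for action in actions:
        # Each action must name a tool and give its parameters as a dict.
        if "tool_name" not in action or not isinstance(action.get("parameters"), dict):
            raise ValueError(f"malformed action: {action!r}")
    return actions


completion = (
    "Action: " + FENCE + "json\n"
    "[\n"
    "    {\n"
    '        "tool_name": "internet_search",\n'
    '        "parameters": {\n'
    '            "query": "biggest penguin in the world"\n'
    "        }\n"
    "    }\n"
    "]\n" + FENCE
)

for action in parse_actions(completion):
    print(action["tool_name"], action["parameters"])
```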

### Grounded Generation and RAG Capabilities:

Command R+ 08-2024 has been specifically trained with grounded generation capabilities. This means that it can generate responses based on a list of supplied document snippets, and it will include grounding spans (citations) in its response indicating the source of the information. This can be used to enable behaviors such as grounded summarization and the final step of Retrieval Augmented Generation (RAG). This behavior has been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template may reduce performance, but we encourage experimentation.

Command R+ 08-2024’s grounded generation behavior takes a conversation as input (with an optional user-supplied system preamble, indicating task, context and desired output style), along with a list of retrieved document snippets. The document snippets should be chunks, rather than long documents, typically around 100-400 words per chunk. Document snippets consist of key-value pairs: the keys should be short descriptive strings, and the values can be text or semi-structured.
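Following the guidance above on snippet size, long source documents can be split before they are passed to the model. A simple sketch, assuming whitespace-separated words and a fixed word window: `chunk_document`, the 300-word default, and the `title`/`text` keys (borrowed from the grounded generation example in this card) are illustrative choices rather than requirements of the model.

```python
from typing import Dict, List


def chunk_document(title: str, text: str, max_words: int = 300) -> List[Dict[str, str]]:
    """Split text into consecutive snippets of at most max_words whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        {"title": title, "text": " ".join(words[start:start + max_words])}
        for start in range(0, len(words), max_words)
    ]


# A synthetic 750-word document, just to show the split sizes.
long_text = " ".join(f"word{i}" for i in range(750))
documents = chunk_document("Penguin report", long_text)
print([len(doc["text"].split()) for doc in documents])  # [300, 300, 150]
```

A fixed window can cut sentences in half; splitting on paragraph or sentence boundaries first and then packing those pieces up to the word limit usually gives cleaner snippets.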
 
By default, Command R+ 08-2024 generates grounded responses by first predicting which documents are relevant, then predicting which ones it will cite, and then generating an answer. Finally, it inserts grounding spans into the answer. See below for an example. This is referred to as `accurate` grounded generation.

The model is trained with a number of other answering modes, which can be selected by prompt changes. A `fast` citation mode is supported in the tokenizer, which directly generates an answer with grounding spans in it, without first writing the answer out in full. This sacrifices some grounding accuracy in favor of generating fewer tokens.

Comprehensive documentation for working with Command R+ 08-2024's grounded generation prompt template can be found [here](https://docs.cohere.com/docs/prompting-command-r).

The code snippet below shows a minimal working example of how to render a prompt.

<details>
<summary> <b>Usage: Rendering Grounded Generation prompts [CLICK TO EXPAND]</b> </summary>

````python
from transformers import AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-plus-08-2024"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# define conversation input:
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]
# define documents to ground on:
documents = [
    { "title": "Tall penguins", "text": "Emperor penguins are the tallest growing up to 122 cm in height." },
    { "title": "Penguin habitats", "text": "Emperor penguins only live in Antarctica."}
]

# render the grounded generation prompt as a string:
grounded_generation_prompt = tokenizer.apply_grounded_generation_template(
    conversation,
    documents=documents,
    citation_mode="accurate", # or "fast"
    tokenize=False,
    add_generation_prompt=True,
)
print(grounded_generation_prompt)
````

</details>

<details>
<summary><b>Example Rendered Grounded Generation Prompt [CLICK TO EXPAND]</b></summary>

````
<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># Safety Preamble
The instructions in this section override those in the task description and style guide sections. Don't answer questions that are harmful or immoral.

# System Preamble
## Basic Rules
You are a powerful conversational AI trained by Cohere to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions.

# User Preamble
## Task and Context
You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging.

## Style Guide
Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Whats the biggest penguin in the world?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|><results>
Document: 0
title: Tall penguins
text: Emperor penguins are the tallest growing up to 122 cm in height.

Document: 1
title: Penguin habitats
text: Emperor penguins only live in Antarctica.
</results><|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>Carefully perform the following instructions, in order, starting each with a new line.
Firstly, Decide which of the retrieved documents are relevant to the user's last input by writing 'Relevant Documents:' followed by comma-separated list of document numbers. If none are relevant, you should instead write 'None'.
Secondly, Decide which of the retrieved documents contain facts that should be cited in a good answer to the user's last input by writing 'Cited Documents:' followed a comma-separated list of document numbers. If you dont want to cite any of them, you should instead write 'None'.
Thirdly, Write 'Answer:' followed by a response to the user's last input in high quality natural english. Use the retrieved documents to help you. Do not insert any citations or grounding markup.
Finally, Write 'Grounded answer:' followed by a response to the user's last input in high quality natural english. Use the symbols <co: doc> and </co: doc> to indicate when a fact comes from a document in the search result, e.g <co: 0>my fact</co: 0> for a fact from document 0.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
````

</details>

<details>
<summary><b>Example Rendered Grounded Generation Completion [CLICK TO EXPAND]</b></summary>

````
Relevant Documents: 0,1
Cited Documents: 0,1
Answer: The Emperor Penguin is the tallest or biggest penguin in the world. It is a bird that lives only in Antarctica and grows to a height of around 122 centimetres.
Grounded answer: The <co: 0>Emperor Penguin</co: 0> is the <co: 0>tallest</co: 0> or biggest penguin in the world. It is a bird that <co: 1>lives only in Antarctica</co: 1> and <co: 0>grows to a height of around 122 centimetres.</co: 0>
````

</details>
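An `accurate`-mode completion is a small line-oriented format (`Relevant Documents:`, `Cited Documents:`, `Answer:`, `Grounded answer:`), with citations wrapped in `<co: N>...</co: N>` tags. The hypothetical parser below assumes each field fits on a single line, exactly as in the example completion above, and turns the completion into document-id lists, the plain answer, and (cited text, document ids) pairs. Treat it as a sketch; real generations may need more tolerant parsing.

```python
import re
from typing import Dict, List, Tuple

# <co: IDS>cited text</co: IDS>; the closing tag must repeat the opening ids.
CITATION = re.compile(r"<co: ([^>]+)>(.*?)</co: \1>", re.DOTALL)


def parse_doc_ids(value: str) -> List[int]:
    """Turn '0,1' into [0, 1]; the prompt asks for 'None' when no documents apply."""
    value = value.strip()
    return [] if value == "None" else [int(part) for part in value.split(",")]


def parse_grounded_completion(completion: str) -> Dict:
    """Split an accurate-mode completion into its labelled fields and citations."""
    fields: Dict[str, str] = {}
    for line in completion.strip().splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    grounded = fields.get("Grounded answer", "")
    citations: List[Tuple[str, List[int]]] = [
        (text, parse_doc_ids(ids)) for ids, text in CITATION.findall(grounded)
    ]
    return {
        "relevant": parse_doc_ids(fields.get("Relevant Documents", "None")),
        "cited": parse_doc_ids(fields.get("Cited Documents", "None")),
        "answer": fields.get("Answer", ""),
        "citations": citations,
        "plain_grounded_answer": CITATION.sub(r"\2", grounded),
    }


completion = """Relevant Documents: 0,1
Cited Documents: 0,1
Answer: The Emperor Penguin is the tallest or biggest penguin in the world. It is a bird that lives only in Antarctica and grows to a height of around 122 centimetres.
Grounded answer: The <co: 0>Emperor Penguin</co: 0> is the <co: 0>tallest</co: 0> or biggest penguin in the world. It is a bird that <co: 1>lives only in Antarctica</co: 1> and <co: 0>grows to a height of around 122 centimetres.</co: 0>"""

result = parse_grounded_completion(completion)
for text, doc_ids in result["citations"]:
    print(doc_ids, text)
```

Stripping the tags from the grounded answer in this example recovers the `Answer:` text exactly, which is a quick sanity check that the citation markup is well-formed.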
 
 
### Code Capabilities:
Command R+ 08-2024 has been optimized to interact with your code: you can ask it for code snippets, code explanations, or code rewrites. It might not perform well out-of-the-box for pure code completion. For better performance, we also recommend using a low temperature (or even greedy decoding) for code-generation related instructions.
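The advice to lower the temperature follows from how temperature reshapes the next-token distribution: logits are divided by the temperature before the softmax, so small values concentrate probability on the top-scoring token, and the limit is greedy (argmax) decoding. The toy logits below are invented purely to show that effect.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature, stabilised by subtracting the maximum."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # invented scores for three candidate tokens
for temperature in (1.0, 0.3, 0.05):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

With `transformers`, greedy decoding corresponds to calling `model.generate(...)` with `do_sample=False`; with sampling enabled, a small `temperature` (the usage example in this card uses `0.3`) gives near-greedy behavior.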
 
### Model Card Contact
For errors or additional questions about details in this model card, contact [info@for.ai](mailto:info@for.ai).

### Terms of Use:
We hope that the release of this model will make community-based research efforts more accessible, by releasing the weights of a highly performant 104 billion parameter model to researchers all over the world. This model is governed by a [CC-BY-NC](https://cohere.com/c4ai-cc-by-nc-license) License with an acceptable use addendum, and also requires adhering to [C4AI's Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).

### Try Chat:
You can try Command R+ 08-2024 chat in the playground [here](https://dashboard.cohere.com/playground/chat). You can also use it in our dedicated Hugging Face Space [here](https://huggingface.co/spaces/CohereForAI/c4ai-command?model=command-r-plus-08-2024).