Upload tokenizer_config.json

#1
by fedyanin - opened
README.md CHANGED
@@ -1,305 +1,3 @@
- ---
- language:
- - en
- - fr
- - es
- - pt
- tags:
- - falcon3
- base_model: tiiuae/Falcon3-7B-Base
- license: other
- license_name: falcon-llm-license
- license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
- library_name: transformers
- ---
-
- <div align="center">
- <img src="https://huggingface.co/datasets/tiiuae/documentation-images/resolve/main/general/falco3-logo.png" alt="drawing" width="500"/>
- </div>
-
- # Falcon3-7B-Instruct
-
- The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
-
- This repository contains **Falcon3-7B-Instruct**. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code, and mathematics tasks.
- Falcon3-7B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.
-
- ## Model Details
- - Architecture
-   - Transformer-based causal decoder-only architecture
-   - 28 decoder blocks
-   - Grouped-query attention (GQA) for faster inference: 12 query heads and 4 key-value heads
-   - Wider head dimension: 256
-   - High RoPE value to support long-context understanding: 1000042
-   - Uses SwiGLU and RMSNorm
-   - 32K context length
-   - 131K vocab size
- - Pretrained on 14 teratokens of web, code, STEM, high-quality, and multilingual data using 1024 H100 GPU chips
- - Post-trained on 1.2 million samples of STEM, conversation, code, safety, and function-call data
- - Supports EN, FR, ES, PT
- - Developed by [Technology Innovation Institute](https://www.tii.ae)
- - License: TII Falcon-LLM License 2.0
- - Model Release Date: December 2024
-
-
- ## Getting started
-
- <details>
- <summary> Click to expand </summary>
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "tiiuae/Falcon3-7B-Instruct"
-
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype="auto",
-     device_map="auto"
- )
- tokenizer = AutoTokenizer.from_pretrained(model_name)
-
- prompt = "How many hours in one day?"
- messages = [
-     {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
-     {"role": "user", "content": prompt}
- ]
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True
- )
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=1024
- )
- generated_ids = [
-     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
- ]
-
- response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
- print(response)
- ```
-
- </details>
-
- <br>
-
- ## Benchmarks
- We report the official, normalized [Open LLM Leaderboard evaluation results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) in the following table.
- <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
- <colgroup>
- <col style="width: 10%;">
- <col style="width: 7%;">
- <col style="width: 7%;">
- <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
- </colgroup>
- <thead>
- <tr>
- <th>Benchmark</th>
- <th>Llama-3.1-8B-Instruct</th>
- <th>Qwen2.5-7B-Instruct</th>
- <th>Falcon3-7B-Instruct</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td>IFEval</td>
- <td><b>78.56</b></td>
- <td>75.85</td>
- <td>76.12</td>
- </tr>
- <tr>
- <td>BBH (3-shot)</td>
- <td>29.89</td>
- <td>34.89</td>
- <td><b>37.92</b></td>
- </tr>
- <tr>
- <td>MATH Lvl-5 (4-shot)</td>
- <td>19.34</td>
- <td>0.00</td>
- <td><b>31.87</b></td>
- </tr>
- <tr>
- <td>GPQA (0-shot)</td>
- <td>2.35</td>
- <td>5.48</td>
- <td><b>8.05</b></td>
- </tr>
- <tr>
- <td>MUSR (0-shot)</td>
- <td>8.41</td>
- <td>8.45</td>
- <td><b>21.17</b></td>
- </tr>
- <tr>
- <td>MMLU-PRO (5-shot)</td>
- <td>30.68</td>
- <td><b>36.52</b></td>
- <td>34.30</td>
- </tr>
- </tbody>
- </table>
-
- We also report our internal pipeline benchmarks in the following table.
- - We use [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
- - We report **raw scores** obtained by applying the chat template and fewshot_as_multiturn.
- - We use the same batch size across all models.
-
- <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
- <colgroup>
- <col style="width: 10%;">
- <col style="width: 10%;">
- <col style="width: 7%;">
- <col style="width: 7%;">
- <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
- </colgroup>
- <thead>
- <tr>
- <th>Category</th>
- <th>Benchmark</th>
- <th>Llama-3.1-8B-Instruct</th>
- <th>Qwen2.5-7B-Instruct</th>
- <th>Falcon3-7B-Instruct</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td rowspan="3">General</td>
- <td>MMLU (5-shot)</td>
- <td>68.2</td>
- <td><b>73.5</b></td>
- <td>70.5</td>
- </tr>
- <tr>
- <td>MMLU-PRO (5-shot)</td>
- <td>36.4</td>
- <td><b>43.1</b></td>
- <td>40.7</td>
- </tr>
- <tr>
- <td>IFEval</td>
- <td><b>78.8</b></td>
- <td>74.7</td>
- <td>76.5</td>
- </tr>
- <tr>
- <td rowspan="3">Math</td>
- <td>GSM8K (5-shot)</td>
- <td><b>82.6</b></td>
- <td>72.0</td>
- <td>81.4</td>
- </tr>
- <tr>
- <td>GSM8K (8-shot, COT)</td>
- <td><b>85.4</b></td>
- <td>76.6</td>
- <td>79.7</td>
- </tr>
- <tr>
- <td>MATH Lvl-5 (4-shot)</td>
- <td>15.4</td>
- <td>-</td>
- <td><b>29.4</b></td>
- </tr>
- <tr>
- <td rowspan="5">Reasoning</td>
- <td>Arc Challenge (25-shot)</td>
- <td>58.6</td>
- <td>57.8</td>
- <td><b>62.6</b></td>
- </tr>
- <tr>
- <td>GPQA (0-shot)</td>
- <td><b>33.5</b></td>
- <td>32</td>
- <td>31.9</td>
- </tr>
- <tr>
- <td>GPQA (0-shot, COT)</td>
- <td>9.6</td>
- <td>13.8</td>
- <td><b>22.3</b></td>
- </tr>
- <tr>
- <td>MUSR (0-shot)</td>
- <td>38.6</td>
- <td>41</td>
- <td><b>46.4</b></td>
- </tr>
- <tr>
- <td>BBH (3-shot)</td>
- <td>48.6</td>
- <td><b>54.1</b></td>
- <td>52.4</td>
- </tr>
- <tr>
- <td rowspan="4">CommonSense Understanding</td>
- <td>PIQA (0-shot)</td>
- <td><b>78.9</b></td>
- <td>73.7</td>
- <td>78.8</td>
- </tr>
- <tr>
- <td>SciQ (0-shot)</td>
- <td>80.2</td>
- <td>50.9</td>
- <td><b>94.7</b></td>
- </tr>
- <tr>
- <td>Winogrande (0-shot)</td>
- <td>-</td>
- <td>-</td>
- <td>70.4</td>
- </tr>
- <tr>
- <td>OpenbookQA (0-shot)</td>
- <td><b>46.2</b></td>
- <td>42.4</td>
- <td>45.8</td>
- </tr>
- <tr>
- <td rowspan="2">Instruction following</td>
- <td>MT-Bench (avg)</td>
- <td>7.9</td>
- <td><b>8.5</b></td>
- <td>8.4</td>
- </tr>
- <tr>
- <td>Alpaca (WC)</td>
- <td>26.6</td>
- <td><b>31.5</b></td>
- <td>26.1</td>
- </tr>
- <tr>
- <td>Tool use</td>
- <td>BFCL AST (avg)</td>
- <td>90.6</td>
- <td><b>91.4</b></td>
- <td>89.5</td>
- </tr>
- </tbody>
- </table>
-
- ## Useful links
- - View our [release blogpost](https://huggingface.co/blog/falcon3).
- - Feel free to join [our Discord server](https://discord.gg/fwXpMyGc) if you have questions or want to interact with our researchers and developers.
-
- ## Technical Report
- Coming soon.
-
- ## Citation
- If the Falcon3 family was helpful to your work, feel free to cite us.
-
- ```
- @misc{Falcon3,
-     title = {The Falcon 3 family of Open Models},
-     author = {TII Team},
-     month = {December},
-     year = {2024}
- }
- ```
+ ---
+ license: apache-2.0
+ ---
config.json CHANGED
@@ -1,4 +1,5 @@
  {
+ "_name_or_path": "Iheb-Chaabane/falcon3-7b-explore-dpo-bs-64",
  "architectures": [
  "LlamaForCausalLM"
  ],
@@ -9,6 +10,7 @@
  "head_dim": 256,
  "hidden_act": "silu",
  "hidden_size": 3072,
+ "initializer_range": 0.02,
  "intermediate_size": 23040,
  "max_position_embeddings": 32768,
  "mlp_bias": false,
@@ -24,5 +26,5 @@
  "torch_dtype": "bfloat16",
  "transformers_version": "4.46.1",
  "use_cache": true,
- "vocab_size": 131072
+ "vocab_size": 131080
  }
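The functional change here is the vocabulary growing from 131072 to 131080 entries (the two added keys are metadata). A minimal sketch for confirming the new size after loading, assuming this repository resolves as `tiiuae/Falcon3-7B-Instruct` (substitute the actual repo id):

```python
# Hypothetical repo id; substitute the id of this repository.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("tiiuae/Falcon3-7B-Instruct")
assert cfg.vocab_size == 131080   # was 131072 before this commit
assert cfg.hidden_size == 3072

# 8 extra vocab entries -> 8 extra rows in embed_tokens and in lm_head.
extra_rows = 131080 - 131072
print(extra_rows * cfg.hidden_size)  # 24576 extra parameters per matrix
```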
model-00001-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:07e279c8c6075600e5dc795364efff8897de0f0c22a1d2d8db79a70adf8edb3f
- size 4938900432
+ oid sha256:ada95243d59a5b8a5d60ea7bec7907ca20b92bb124d5054b51792fe059b72195
+ size 4938949584
model-00002-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c5d6600f34e9972eed3201425ba75c2d58f574655f373ea8b86ddfa37d391f2a
+ oid sha256:c0f167f4dc5fb028251a03f67ce36bef07a163084fbd8f7d63ca043d770ab9ca
  size 4942085160
model-00003-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a96480584a0b5bd09c556e53d952146008bb423e5e12ea9bbd0b60d62f9a2f72
+ oid sha256:11a6edf04d6b4ab1044d88107eb8a4c71d6378c7d232c3c668870ceae1d2a80c
  size 4224838512
model-00004-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0b84ea911989e21ebf4ac05018171f73016d8ae72b7904e89289be0b4672a403
- size 805306496
+ oid sha256:4be476e54f6ce54be4690cb9b7241959fd2096ab9a4b97648679e1fce43c575b
+ size 805355648
model.safetensors.index.json CHANGED
@@ -1,6 +1,6 @@
  {
  "metadata": {
- "total_size": 14911113216
+ "total_size": 14911199232
  },
  "weight_map": {
  "lm_head.weight": "model-00004-of-00004.safetensors",
special_tokens_map.json CHANGED
@@ -32,7 +32,7 @@
  "single_word": false
  },
  "pad_token": {
- "content": "<|pad|>",
+ "content": "<pad>",
  "lstrip": false,
  "normalized": false,
  "rstrip": false,
tokenizer.json CHANGED
@@ -18212,7 +18212,16 @@
  },
  {
  "id": 2023,
- "content": "<|pad|>",
+ "content": ">>UNUSED_1897<<",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 131072,
+ "content": "<pad>",
  "single_word": false,
  "lstrip": false,
  "rstrip": false,
@@ -20280,7 +20289,7 @@
  ">>UNUSED_1894<<": 2020,
  ">>UNUSED_1895<<": 2021,
  ">>UNUSED_1896<<": 2022,
- "<|pad|>": 2023,
+ ">>UNUSED_1897<<": 2023,
  "!": 2024,
  "\"": 2025,
  "#": 2026,
tokenizer_config.json CHANGED
@@ -16186,7 +16186,15 @@
  "special": true
  },
  "2023": {
- "content": "<|pad|>",
+ "content": ">>UNUSED_1897<<",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "131072": {
+ "content": "<pad>",
  "lstrip": false,
  "normalized": false,
  "rstrip": false,
@@ -16219,15 +16227,11 @@
  ">>PASSWORD<<",
  ">>KEY<<"
  ],
- "chat_template": "{%- if tools %}\n{{- '<|system|>\\n' }}\n{%- if messages[0]['role'] == 'system' %}\n{{- messages[0]['content'] }}\n{%- set remaining_messages = messages[1:] %}\n{%- else %}\n{%- set remaining_messages = messages %}\n{%- endif %}\n{{- 'You are a Falcon assistant skilled in function calling. You are helpful, respectful, and concise.\\n\\n# Tools\\n\\nYou have access to the following functions. You MUST use them to answer questions when needed. For each function call, you MUST return a JSON object inside <tool_call></tool_call> tags.\\n\\n<tools>' + tools|tojson(indent=2) + '</tools>\\n\\n# Output Format\\n\\nYour response MUST follow this format when making function calls:\\n<tool_call>\\n[\\n {\"name\": \"function_name\", \"arguments\": {\"arg1\": \"value1\", \"arg2\": \"value2\"}},\\n {\"name\": \"another_function\", \"arguments\": {\"arg\": \"value\"}}\\n]\\n</tool_call>\\nIf no function calls are needed, respond normally without the tool_call tags.\\n' }}\n{%- for message in remaining_messages %}\n{%- if message['role'] == 'user' %}\n{{- '<|user|>\\n' + message['content'] + '\\n' }}\n{%- elif message['role'] == 'assistant' %}\n{%- if message.content %}\n{{- '<|assistant|>\\n' + message['content'] }}\n{%- endif %}\n{%- if message.tool_calls %}\n{{- '\\n<tool_call>\\n' }}\n{{- message.tool_calls|tojson(indent=2) }}\n{{- '\\n</tool_call>' }}\n{%- endif %}\n{{- eos_token + '\\n' }}\n{%- elif message['role'] == 'tool' %}\n{{- '<|assistant|>\\n<tool_response>\\n' + message['content'] + '\\n</tool_response>\\n' }}\n{%- endif %}\n{%- endfor %}\n{{- '<|assistant|>\\n' if add_generation_prompt }}\n{%- else %}\n{%- for message in messages %}\n{%- if message['role'] == 'system' %}\n{{- '<|system|>\\n' + message['content'] + '\\n' }}\n{%- elif message['role'] == 'user' %}\n{{- '<|user|>\\n' + message['content'] + '\\n' }}\n{%- elif message['role'] == 'assistant' %}\n{%- if not loop.last %}\n{{- '<|assistant|>\\n' + message['content'] + eos_token + '\\n' }}\n{%- else %}\n{{- '<|assistant|>\\n' + message['content'] + eos_token }}\n{%- endif %}\n{%- endif %}\n{%- if loop.last and add_generation_prompt %}\n{{- '<|assistant|>\\n' }}\n{%- endif %}\n{%- endfor %}\n{%- endif %}",
+ "chat_template": "{% for message in messages %}{% if message['role'] == 'system' %}{{ '<|system|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'user' %}{{ '<|user|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'assistant' %}{% if not loop.last %}{{ '<|assistant|>\n' + message['content'] + eos_token + '\n' }}{% else %}{{ '<|assistant|>\n' + message['content'] + eos_token }}{% endif %}{% endif %}{% if loop.last and add_generation_prompt %}{{ '<|assistant|>\n' }}{% endif %}{% endfor %}",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
- "extra_special_tokens": {},
- "model_input_names": [
- "input_ids",
- "attention_mask"
- ],
  "model_max_length": 32768,
- "pad_token": "<|pad|>",
+ "pad_token": "<pad>",
  "tokenizer_class": "PreTrainedTokenizerFast"
  }
+
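The replacement chat template drops the tool-calling branch of the old template and keeps only the plain system/user/assistant rendering. A minimal sketch of the resulting prompt, assuming the repo id `tiiuae/Falcon3-7B-Instruct`:

```python
# Hypothetical repo id; substitute the id of this repository.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Instruct")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How many hours in one day?"},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(text)
# Expected rendering per the new template:
# <|system|>
# You are a helpful assistant.
# <|user|>
# How many hours in one day?
# <|assistant|>
```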