jncraton committed on
Commit 81d0b83
1 Parent(s): 9fd21fd

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,271 @@
---
language:
- en
- fr
- es
- pt
tags:
- falcon3
base_model: tiiuae/Falcon3-1B-Base
license: other
license_name: falcon-llm-license
license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
library_name: transformers
---

<div align="center">
<img src="https://huggingface.co/datasets/tiiuae/documentation-images/resolve/main/general/falco3-logo.png" alt="drawing" width="500"/>
</div>

# Falcon3-1B-Instruct

The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.

This repository contains **Falcon3-1B-Instruct**. It achieves strong results on reasoning, language understanding, instruction following, code, and mathematics tasks.
Falcon3-1B-Instruct supports four languages (English, French, Spanish, and Portuguese) and a context length of up to 8K tokens.

## Model Details
- Architecture (see the configuration check below)
  - Transformer-based causal decoder-only architecture
  - 18 decoder blocks
  - Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads
  - Wider head dimension: 256
  - High RoPE value to support long-context understanding: 1000042
  - Uses SwiGLU and RMSNorm
  - 8K context length
  - 131K vocab size
- Pruned and healed from larger Falcon models (3B and 7B respectively) on only 80 gigatokens of web, code, STEM, high-quality, and multilingual data, using 256 H100 GPUs
- Post-trained on 1.2 million samples of STEM, conversational, code, safety, and function-call data
- Supports EN, FR, ES, PT
- Developed by [Technology Innovation Institute](https://www.tii.ae)
- License: TII Falcon-LLM License 2.0
- Model Release Date: December 2024

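As a quick sanity check, most of the architecture details above can be read back from the checkpoint's configuration. The snippet below is a minimal sketch, assuming the repository exposes a standard Llama-style `transformers` config; attribute names such as `num_key_value_heads` and `rope_theta` are the usual ones and may differ for other checkpoints.

```python
from transformers import AutoConfig

# Only config.json is fetched here; no model weights are downloaded.
config = AutoConfig.from_pretrained("tiiuae/Falcon3-1B-Instruct")

# Fall back to hidden_size / num_heads if the config does not expose head_dim directly.
head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads)

print("decoder blocks:  ", config.num_hidden_layers)        # expected: 18
print("query heads:     ", config.num_attention_heads)      # expected: 8
print("key-value heads: ", config.num_key_value_heads)      # expected: 4 (GQA)
print("head dimension:  ", head_dim)                        # expected: 256
print("RoPE theta:      ", config.rope_theta)               # expected: 1000042
print("context length:  ", config.max_position_embeddings)  # expected: 8K (8192)
print("vocab size:      ", config.vocab_size)               # expected: ~131K
```
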
## Getting started

<details>
<summary> Click to expand </summary>

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/Falcon3-1B-Instruct"

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat-formatted prompt
prompt = "How many hours in one day?"
messages = [
    {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate, then strip the prompt tokens from the output
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

</details>

<br>

## Benchmarks
We report our internal pipeline benchmarks in the table below.
- We use the [lm-evaluation harness](https://github.com/EleutherAI/lm-evaluation-harness); an example invocation is sketched below this list.
- We report **raw scores** obtained by applying the chat template **without fewshot_as_multiturn** (unlike Llama3.1).
- We use the same batch size across all models.

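For reference, the snippet below sketches how a single benchmark could be run through the harness's Python API. It is illustrative only: the task choice and batch size are assumptions, and argument names such as `apply_chat_template` and `fewshot_as_multiturn` may vary across lm-eval versions and from our internal pipeline.

```python
import lm_eval  # pip install lm-eval

# Illustrative settings only; the internal pipeline's exact tasks and
# batch size are not reproduced here.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tiiuae/Falcon3-1B-Instruct,dtype=auto",
    tasks=["mmlu"],
    num_fewshot=5,
    batch_size=8,
    apply_chat_template=True,    # scores are reported with the chat template applied
    fewshot_as_multiturn=False,  # unlike the Llama3.1 evaluation setup
)
print(results["results"])
```
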
<table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
    <colgroup>
        <col style="width: 10%;">
        <col style="width: 10%;">
        <col style="width: 7%;">
        <col style="width: 7%;">
        <col style="width: 7%;">
        <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
    </colgroup>
    <thead>
        <tr>
            <th>Category</th>
            <th>Benchmark</th>
            <th>Llama-3.2-1B</th>
            <th>Qwen2.5-1.5B</th>
            <th>SmolLM2-1.7B</th>
            <th>Falcon3-1B-Instruct</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td rowspan="3">General</td>
            <td>MMLU (5-shot)</td>
            <td>23.4</td>
            <td><b>58.4</b></td>
            <td>48.4</td>
            <td>43.9</td>
        </tr>
        <tr>
            <td>MMLU-PRO (5-shot)</td>
            <td>11.3</td>
            <td><b>21.3</b></td>
            <td>17.2</td>
            <td>18.6</td>
        </tr>
        <tr>
            <td>IFEval</td>
            <td><b>55.8</b></td>
            <td>44.4</td>
            <td>53.0</td>
            <td>54.4</td>
        </tr>
        <tr>
            <td rowspan="3">Math</td>
            <td>GSM8K (5-shot)</td>
            <td>37.4</td>
            <td><b>57.2</b></td>
            <td>43.4</td>
            <td>38.6</td>
        </tr>
        <tr>
            <td>GSM8K (8-shot, COT)</td>
            <td>35.6</td>
            <td><b>62.2</b></td>
            <td>47.2</td>
            <td>41.8</td>
        </tr>
        <tr>
            <td>MATH Lvl-5 (4-shot)</td>
            <td><b>3.9</b></td>
            <td>0.2</td>
            <td>0.1</td>
            <td>1.0</td>
        </tr>
        <tr>
            <td rowspan="6">Reasoning</td>
            <td>Arc Challenge (25-shot)</td>
            <td>34.1</td>
            <td>47.0</td>
            <td><b>47.6</b></td>
            <td>45.9</td>
        </tr>
        <tr>
            <td>GPQA (0-shot)</td>
            <td>25.3</td>
            <td><b>29.6</b></td>
            <td>28.7</td>
            <td>26.5</td>
        </tr>
        <tr>
            <td>GPQA (0-shot, COT)</td>
            <td>13.2</td>
            <td>9.2</td>
            <td>16.0</td>
            <td><b>21.3</b></td>
        </tr>
        <tr>
            <td>MUSR (0-shot)</td>
            <td>32.4</td>
            <td>36.8</td>
            <td>33.0</td>
            <td><b>40.7</b></td>
        </tr>
        <tr>
            <td>BBH (3-shot)</td>
            <td>30.3</td>
            <td><b>38.5</b></td>
            <td>33.1</td>
            <td>35.1</td>
        </tr>
        <tr>
            <td>BBH (3-shot, COT)</td>
            <td>0.0</td>
            <td>20.3</td>
            <td>0.8</td>
            <td><b>30.5</b></td>
        </tr>
        <tr>
            <td rowspan="5">CommonSense Understanding</td>
            <td>PIQA (0-shot)</td>
            <td>72.1</td>
            <td>73.2</td>
            <td><b>74.4</b></td>
            <td>72.0</td>
        </tr>
        <tr>
            <td>SciQ (0-shot)</td>
            <td>61.8</td>
            <td>69.5</td>
            <td>71.4</td>
            <td><b>86.8</b></td>
        </tr>
        <tr>
            <td>Winogrande (0-shot)</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
            <td><b>60.2</b></td>
        </tr>
        <tr>
            <td>OpenbookQA (0-shot)</td>
            <td>40.2</td>
            <td>40.4</td>
            <td><b>42.8</b></td>
            <td>40.0</td>
        </tr>
        <tr>
            <td>MT-Bench (avg)</td>
            <td>5.4</td>
            <td><b>7.1</b></td>
            <td>6.1</td>
            <td>5.5</td>
        </tr>
        <tr>
            <td rowspan="1">Instruction following</td>
            <td>Alpaca (WC)</td>
            <td><b>8.6</b></td>
            <td><b>8.6</b></td>
            <td>5.4</td>
            <td>6.1</td>
        </tr>
    </tbody>
</table>

## Useful links
- View our [release blogpost](https://huggingface.co/blog/falcon3).
- Feel free to join [our Discord server](https://discord.gg/fwXpMyGc) if you have any questions or want to interact with our researchers and developers.

## Technical Report
Coming soon.

## Citation
If the Falcon3 family of models was helpful to your work, feel free to cite us.

```
@misc{Falcon3,
    title = {The Falcon 3 Family of Open Models},
    url = {https://huggingface.co/blog/falcon3},
    author = {Falcon-LLM Team},
    month = {December},
    year = {2024}
}
```
config.json ADDED
@@ -0,0 +1,10 @@
{
  "bos_token": null,
  "eos_token": "<|endoftext|>",
  "layer_norm_epsilon": 1e-06,
  "multi_query_attention": true,
  "quantization_bits": null,
  "quantization_group_size": null,
  "quantization_type": 0,
  "unk_token": ""
}
generation_config.json ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 11,
  "eos_token_id": 11,
  "transformers_version": "4.46.1"
}
model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4d67705274b4969b767a4a0573d674c14410594928742baedb59a185c85cb417
size 1672319819
special_tokens_map.json ADDED
@@ -0,0 +1,41 @@
{
  "additional_special_tokens": [
    ">>TITLE<<",
    ">>ABSTRACT<<",
    ">>INTRODUCTION<<",
    ">>SUMMARY<<",
    ">>COMMENT<<",
    ">>ANSWER<<",
    ">>QUESTION<<",
    ">>DOMAIN<<",
    ">>EMAIL_ADDRESS<<",
    ">>IP_ADDRESS<<",
    "<|startoftext|>",
    ">>IP_ADDRESS_0<<",
    ">>IP_ADDRESS_1<<",
    ">>IP_ADDRESS_2<<",
    ">>IP_ADDRESS_3<<",
    ">>IP_ADDRESS_4<<",
    ">>IP_ADDRESS_5<<",
    ">>IP_ADDRESS_6<<",
    ">>IP_ADDRESS_7<<",
    ">>IP_ADDRESS_8<<",
    ">>IP_ADDRESS_9<<",
    ">>PASSWORD<<",
    ">>KEY<<"
  ],
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|pad|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
vocabulary.json ADDED
The diff for this file is too large to render. See raw diff