Files changed (1)
  1. README.md +84 -75
README.md CHANGED
@@ -4,36 +4,26 @@ language:
4
  tags:
5
  - falcon3
6
  - falcon3_mamba
7
- - falcon_mamba
8
  base_model:
9
  - tiiuae/Falcon3-Mamba-7B-Base
10
- license: other
11
- license_name: falcon-llm-license
12
- license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
13
- library_name: transformers
14
  ---
15
 
16
- <div align="center">
17
- <img src="https://huggingface.co/datasets/tiiuae/documentation-images/resolve/main/falcon_mamba/falcon-mamba-logo.png" alt="drawing" width="500"/>
18
- </div>
19
-
20
-
21
  # Falcon3-Mamba-7B-Instruct
22
 
23
  **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
24
 
25
- This repository contains the **Falcon3-Mamba-7B-Instruct**. It achieves, compared to similar SSM-based models of the same size, state of art results (at release's time) on reasoning, language understanding, instruction following, code and mathematics tasks.
26
- Falcon3-Mamba-7B-Instruct supports a context length up to 32K and was mainly trained on an English corpus.
27
 
28
  ## Model Details
29
- - Architecture (same as [Falcon-Mamba-7b](https://huggingface.co/tiiuae/falcon-mamba-7b))
30
  - Mamba1 based causal decoder only architecture trained on a causal language modeling task (i.e., predict the next token).
31
  - 64 decoder blocks
32
  - width: 4096
33
  - state_size: 16
34
  - 32k context length
35
  - 65k vocab size
36
- - Continue Pretrained from [Falcon-Mamba-7b](https://arxiv.org/abs/2410.05355), with another 1500 Gigatokens of data consisting of web, code, STEM and high quality data.
37
  - Post-trained on 1.2 million samples of STEM, conversations, code, and safety.
38
  - Developed by [Technology Innovation Institute](https://www.tii.ae)
39
  - License: TII Falcon-LLM License 2.0
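
As a rough sanity check on the architecture figures listed above, the sketch below estimates the parameter count of a Mamba1 stack with 64 blocks, width 4096, and state size 16. Everything else is an assumption rather than something this model card states: an expansion factor of 2, a convolution kernel of 4, a `dt_rank` of `ceil(width / 16)`, a vocabulary of exactly 65,024 tokens standing in for "65k", and untied input and output embeddings. The function names are hypothetical.

```python
import math


def mamba1_block_params(d_model: int, d_state: int, expand: int = 2, d_conv: int = 4) -> int:
    """Approximate parameter count of one Mamba1 block (assumed layout)."""
    d_inner = expand * d_model            # assumed expansion factor
    dt_rank = math.ceil(d_model / 16)     # assumed default rank for the dt projection

    in_proj = d_model * 2 * d_inner               # projects to x and gate branches, no bias
    conv1d = d_inner * d_conv + d_inner           # depthwise conv weights + bias
    x_proj = d_inner * (dt_rank + 2 * d_state)    # produces dt, B and C, no bias
    dt_proj = dt_rank * d_inner + d_inner         # weights + bias
    a_log = d_inner * d_state                     # state matrix A (stored as log)
    d_skip = d_inner                              # skip connection vector D
    out_proj = d_inner * d_model                  # output projection, no bias
    norm = d_model                                # per-block RMSNorm weight

    return in_proj + conv1d + x_proj + dt_proj + a_log + d_skip + out_proj + norm


def estimate_total_params(
    n_blocks: int = 64,        # from the model card
    d_model: int = 4096,       # "width" in the model card
    d_state: int = 16,         # "state_size" in the model card
    vocab_size: int = 65024,   # assumption: the card only says "65k"
    tied_embeddings: bool = False,  # assumption
) -> int:
    """Sum block, embedding, and final-norm parameters."""
    blocks = n_blocks * mamba1_block_params(d_model, d_state)
    embeddings = vocab_size * d_model * (1 if tied_embeddings else 2)
    final_norm = d_model
    return blocks + embeddings + final_norm


if __name__ == "__main__":
    total = estimate_total_params()
    print(f"Estimated parameters: {total / 1e9:.2f}B")
```

Under these assumptions the estimate lands in the 7B range, which is consistent with the model's name; the exact figure depends on the real configuration.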
@@ -89,7 +79,7 @@ print(response)
89
  <br>
90
 
91
  # Benchmarks
92
- We report in the following table our internal pipeline benchmarks. For the benchmarks marked by star, we normalize the results with HuggingFace score normalization:
93
 
94
  <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
95
  <colgroup>
@@ -98,6 +88,7 @@ We report in the following table our internal pipeline benchmarks. For the bench
98
  <col style="width: 7%;">
99
  <col style="width: 7%;">
100
  <col style="width: 7%;">
 
101
  <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
102
  </colgroup>
103
  <thead>
@@ -106,6 +97,7 @@ We report in the following table our internal pipeline benchmarks. For the bench
106
  <th>Benchmark</th>
107
  <th>Zamba2-7B-instruct</th>
108
  <th>Jamba-1.5-Mini</th>
 
109
  <th>Llama-3.1-8B-Instruct</th>
110
  <th>Falcon3-Mamba-7B-Instruct</th>
111
  </tr>
@@ -114,105 +106,122 @@ We report in the following table our internal pipeline benchmarks. For the bench
114
  <tr>
115
  <td rowspan="3">General</td>
116
  <td>MMLU (5-shot)</td>
117
- <td>30.6</td>
118
- <td>68.7</td>
119
- <td>55.9</td>
120
- <td>65.3</td>
 
121
  </tr>
122
  <tr>
123
- <td>MMLU-PRO (5-shot)*</td>
124
- <td>32.4</td>
125
- <td>31.6</td>
126
- <td>21.8</td>
127
- <td>26.3</td>
 
128
  </tr>
129
  <tr>
130
  <td>IFEval</td>
131
- <td>69.9</td>
132
- <td>65.7</td>
133
- <td>78.8</td>
134
- <td>71.7</td>
 
135
  </tr>
136
  <tr>
137
  <td rowspan="2">Math</td>
138
  <td>GSM8K (5-shot)</td>
139
- <td>0</td>
140
- <td>74.9</td>
141
- <td>19.2</td>
142
- <td>65.2</td>
 
143
  </tr>
144
  <tr>
145
- <td>MATH Lvl-5 (4-shot)</td>
146
- <td>13.6</td>
147
- <td>6.9</td>
148
- <td>10.4</td>
149
- <td>27.3</td>
 
150
  </tr>
151
  <tr>
152
  <td rowspan="4">Reasoning</td>
153
  <td>Arc Challenge (25-shot)</td>
154
- <td>54</td>
155
- <td>54.3</td>
156
- <td>46.6</td>
157
- <td>53.7</td>
 
158
  </tr>
159
  <tr>
160
- <td>GPQA (0-shot)*</td>
161
- <td>10.3</td>
162
- <td>11.1</td>
163
- <td>6.2</td>
164
- <td>7.2</td>
 
165
  </tr>
166
  <tr>
167
- <td>MUSR (0-shot)*</td>
168
- <td>8.2</td>
169
- <td>12.2</td>
170
- <td>38.6</td>
171
- <td>8.3</td>
 
172
  </tr>
173
  <tr>
174
- <td>BBH (3-shot)*</td>
175
- <td>33.3</td>
176
- <td>35.3</td>
177
- <td>43.7</td>
178
- <td>25.2</td>
 
179
  </tr>
180
  <tr>
181
  <td rowspan="4">CommonSense Understanding</td>
182
  <td>PIQA (0-shot)</td>
183
- <td>75.6</td>
184
- <td>82.3</td>
185
- <td>78.9</td>
186
- <td>80.9</td>
 
187
  </tr>
188
  <tr>
189
  <td>SciQ (0-shot)</td>
190
- <td>29.2</td>
191
- <td>94.9</td>
192
- <td>80.2</td>
193
- <td>93.6</td>
 
194
  </tr>
195
  <tr>
196
  <td>OpenbookQA (0-shot)</td>
197
- <td>45.6</td>
198
- <td>45.8</td>
199
- <td>46.2</td>
200
- <td>47.2</td>
 
201
  </tr>
202
  </tbody>
203
  </table>
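
The score normalization mentioned above for the starred benchmarks can be sketched as follows. This is a minimal illustration of the rescaling described in the Open LLM Leaderboard documentation, where the random-guess baseline maps to 0 and a perfect score maps to 100. The function name and the example baseline (0.25 for a four-choice task) are assumptions for illustration, not values taken from this model card.

```python
def normalize_score(raw: float, random_baseline: float, max_score: float = 1.0) -> float:
    """Rescale a raw accuracy so that chance level is 0 and a perfect score is 100.

    Scores below the random baseline are clipped to 0.
    """
    if raw < random_baseline:
        return 0.0
    return (raw - random_baseline) / (max_score - random_baseline) * 100.0


if __name__ == "__main__":
    # Hypothetical example: a four-choice benchmark where chance accuracy is 0.25.
    print(normalize_score(raw=0.625, random_baseline=0.25))
```

This is one reason normalized scores on hard multiple-choice benchmarks can look low: a raw accuracy only slightly above chance maps to a value close to 0.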
204
 
205
- ## Useful links
206
- - View our [release blogpost](https://huggingface.co/blog/falcon3).
207
- - Feel free to join [our discord server](https://discord.gg/fwXpMyGc) if you have any questions or to interact with our researchers and developers.
208
 
209
- ## Citation
210
- If the Falcon3 family of models were helpful to your work, feel free to give us a cite.
211
 
212
  ```
213
  @misc{Falcon3,
214
- title = {The Falcon 3 Family of Open Models},
215
- author = {Falcon-LLM Team},
216
  month = {December},
217
  year = {2024}
218
  }
 
4
  tags:
5
  - falcon3
6
  - falcon3_mamba
 
7
  base_model:
8
  - tiiuae/Falcon3-Mamba-7B-Base
 
9
  ---
10
 
11
  # Falcon3-Mamba-7B-Instruct
12
 
13
  **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
14
 
15
+ This repository contains the **Falcon3-Mamba-7B-Instruct**. It achieves ,compared to similar SSM-based models of the same size, state of art results (at release's time) on reasoning, language understanding, instruction following, code and mathematics tasks.
16
+ Falcon3-Mamba-7B-Instruct supports a context length up to 32K and 1 language (English).
17
 
18
  ## Model Details
19
+ - Architecture(same as Falcon-Mamba-7b)
20
  - Mamba1 based causal decoder only architecture trained on a causal language modeling task (i.e., predict the next token).
21
  - 64 decoder blocks
22
  - width: 4096
23
  - state_size: 16
24
  - 32k context length
25
  - 65k vocab size
26
+ - Pretrained on 7 Teratokens of datasets comprising of web, code, STEM and high quality data using 2048 H100 GPU chips
27
  - Post-trained on 1.2 million samples of STEM, conversations, code, and safety.
28
  - Developed by [Technology Innovation Institute](https://www.tii.ae)
29
  - License: TII Falcon-LLM License 2.0
 
79
  <br>
80
 
81
  # Benchmarks
82
+ We report in the following table our internal pipeline benchmarks:
83
 
84
  <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
85
  <colgroup>
 
88
  <col style="width: 7%;">
89
  <col style="width: 7%;">
90
  <col style="width: 7%;">
91
+ <col style="width: 7%;">
92
  <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
93
  </colgroup>
94
  <thead>
 
97
  <th>Benchmark</th>
98
  <th>Zamba2-7B-instruct</th>
99
  <th>Jamba-1.5-Mini</th>
100
+ <th>Qwen2-7B-Instruct</th>
101
  <th>Llama-3.1-8B-Instruct</th>
102
  <th>Falcon3-Mamba-7B-Instruct</th>
103
  </tr>
 
106
  <tr>
107
  <td rowspan="3">General</td>
108
  <td>MMLU (5-shot)</td>
109
+ <td>-</td>
110
+ <td>68.7%</td>
111
+ <td>-</td>
112
+ <td>68.5%</td>
113
+ <td>-</td>
114
  </tr>
115
  <tr>
116
+ <td>MMLU-PRO (5-shot)</td>
117
+ <td>32.4%</td>
118
+ <td>31.6%</td>
119
+ <td>31.6%</td>
120
+ <td>29.6%</td>
121
+ <td>26.3%</td>
122
  </tr>
123
  <tr>
124
  <td>IFEval</td>
125
+ <td>69.9%</td>
126
+ <td>65.7%</td>
127
+ <td>56.8%</td>
128
+ <td>78.6%</td>
129
+ <td>71.7%</td>
130
  </tr>
131
  <tr>
132
  <td rowspan="2">Math</td>
133
  <td>GSM8K (5-shot)</td>
134
+ <td>-</td>
135
+ <td>74.9%</td>
136
+ <td>-</td>
137
+ <td>-</td>
138
+ <td>-</td>
139
  </tr>
140
  <tr>
141
+ <td>MATH(4-shot)</td>
142
+ <td>-</td>
143
+ <td>6.9%</td>
144
+ <td>9.44%</td>
145
+ <td>-</td>
146
+ <td>27.3%</td>
147
  </tr>
148
  <tr>
149
  <td rowspan="4">Reasoning</td>
150
  <td>Arc Challenge (25-shot)</td>
151
+ <td>-</td>
152
+ <td>54.3%</td>
153
+ <td>-</td>
154
+ <td>-</td>
155
+ <td>-</td>
156
  </tr>
157
  <tr>
158
+ <td>GPQA (0-shot)</td>
159
+ <td>10.3%</td>
160
+ <td>11.1%</td>
161
+ <td>6.4%</td>
162
+ <td>2.4%</td>
163
+ <td>7.2%</td>
164
  </tr>
165
  <tr>
166
+ <td>MUSR (0-shot)</td>
167
+ <td>8.2%</td>
168
+ <td>12.2%</td>
169
+ <td>7.4%</td>
170
+ <td>8.4%</td>
171
+ <td>8.3%</td>
172
  </tr>
173
  <tr>
174
+ <td>BBH (3-shot)</td>
175
+ <td>33.3%</td>
176
+ <td>35.3%</td>
177
+ <td>37.8%</td>
178
+ <td>29.9%</td>
179
+ <td>25.2%</td>
180
  </tr>
181
  <tr>
182
  <td rowspan="4">CommonSense Understanding</td>
183
  <td>PIQA (0-shot)</td>
184
+ <td>-</td>
185
+ <td>82.3%</td>
186
+ <td>-</td>
187
+ <td>-</td>
188
+ <td>-</td>
189
  </tr>
190
  <tr>
191
  <td>SciQ (0-shot)</td>
192
+ <td>-</td>
193
+ <td>94.9%</td>
194
+ <td>-</td>
195
+ <td>-</td>
196
+ <td>-</td>
197
+ </tr>
198
+ <tr>
199
+ <td>Winogrande (0-shot)</td>
200
+ <td>-</td>
201
+ <td>64.5%</td>
202
+ <td>-</td>
203
+ <td>-</td>
204
+ <td>-</td>
205
  </tr>
206
  <tr>
207
  <td>OpenbookQA (0-shot)</td>
208
+ <td>-</td>
209
+ <td>34.6%</td>
210
+ <td>-</td>
211
+ <td>-</td>
212
+ <td>-</td>
213
  </tr>
214
  </tbody>
215
  </table>
216
 
217
 
218
+ # Citation
219
+ If Falcon3 family were helpful to your work, feel free to give us a cite.
220
 
221
  ```
222
  @misc{Falcon3,
223
+ title = {The Falcon 3 family of Open Models},
224
+ author = {TII Team},
225
  month = {December},
226
  year = {2024}
227
  }