slimfrikha-tii committed on
Commit
d9e6889
·
1 Parent(s): e40d742

docs(readme): update template

Files changed (1)
  1. README.md +57 -104
README.md CHANGED
@@ -1,133 +1,62 @@
1
  ---
2
  language:
3
  - en
4
- - es
5
- - pt
6
  tags:
7
  - falcon3
8
  ---
9
 
 
10
 
11
- # Table of Contents
12
 
13
- 0. [TL;DR](#TL;DR)
14
- 1. [Model Details](#model-details)
15
- 2. [Usage](#usage)
16
- 3. [Training Details](#training-details)
17
- 4. [Evaluation](#evaluation)
18
-
19
-
20
- # TL;DR
21
-
22
- # Model Details
23
 
24
  ⚠️ **This is a raw, pretrained model, which should be further finetuned for most usecases.**
25
 
26
- ## Model Description
27
-
28
- - **Developed by:** [https://www.tii.ae](https://www.tii.ae)
29
- - **Model type:** Causal decoder-only
30
- - **Architecture:** Transformer-base
31
- - **Language(s) (NLP):** Mainly English
32
- - **License:** TII Falcon-LLM License 2.0
33
-
34
- <br>
35
-
36
- # Usage
37
-
38
- Find below some example scripts on how to use the model in `transformers` (Make sure to have the latest transformers, or the one built from source):
39
-
40
- ## Using the PyTorch model with 🤗 transformers
41
-
42
- ### Running the model on a CPU
43
-
44
- <details>
45
- <summary> Click to expand </summary>
46
-
47
- ```python
48
- from transformers import AutoTokenizer, AutoModelForCausalLM
49
-
50
- tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
51
- model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")
52
-
53
- input_text = "Question: How many hours in one day? Answer: "
54
- input_ids = tokenizer(input_text, return_tensors="pt").input_ids
55
-
56
- outputs = model.generate(input_ids)
57
- print(tokenizer.decode(outputs[0]))
58
- ```
59
-
60
- </details>
61
-
62
- ### Running the model on a GPU
63
 
64
- <details>
65
- <summary> Click to expand </summary>
66
-
67
- ```python
68
- # pip install accelerate
69
- from transformers import AutoTokenizer, AutoModelForCausalLM
70
-
71
- tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
72
- model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", device_map="auto")
73
 
74
- input_text = "Question: How many hours in one day? Answer: "
75
- input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
76
-
77
- outputs = model.generate(input_ids)
78
- print(tokenizer.decode(outputs[0]))
79
- ```
80
-
81
- </details>
82
-
83
- ### Running the model on a GPU using `torch.compile`
84
 
85
  <details>
86
  <summary> Click to expand </summary>
87
 
88
  ```python
89
  import torch
90
- from transformers import AutoTokenizer, AutoModelForCausalLM
91
-
92
- tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
93
- model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", torch_dtype=torch.bfloat16).to(0)
94
-
95
- model = torch.compile(model)
96
-
97
- input_text = "Question: How many hours in one day? Answer: "
98
- input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
99
-
100
- outputs = model.generate(input_ids)
101
- print(tokenizer.decode(outputs[0]))
102
  ```
103
 
104
  </details>
105
 
 
106
 
107
- # Training Details
108
-
109
- ## Training Data
110
-
111
- Falcon3-7B is trained on 15 Gigatokens of datasets comprising web, code, STEM, high-quality, and multilingual data.
112
-
113
- ## Training Procedure
114
-
115
- Falcon3-7B is trained on 256 H100 nodes (world size 2048).
116
-
117
- ### Training Hyperparameters
118
 
119
- | **Hyperparameter** | **Value** | **Comment** |
120
- |--------------------|------------|---------------------------------------|
121
- | Precision | `bfloat16` | |
122
- | Optimizer | AdamW | |
123
- | Max learning rate | 6e-4 | Following a WSD (warmup-stable-decay) |
124
- | | | learning rate scheduler |
125
- | Weight decay | 1e-1 | |
126
- | z-loss | 1e-4 | |
127
- | Batch size | Variable | Batch size was gradually increased |
128
- | | | during the training |
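The hyperparameter table names a WSD (warmup-stable-decay) learning rate scheduler with a max learning rate of 6e-4 but does not give its phase lengths or decay shape. The sketch below illustrates the general idea only: the function name `wsd_lr`, the step counts, the linear warmup, and the linear decay to zero are all assumptions for illustration, not the settings used to train Falcon3.

```python
def wsd_lr(step, total_steps, max_lr=6e-4, warmup_steps=10, decay_steps=100):
    """Warmup-stable-decay schedule (illustrative sketch).

    max_lr comes from the table above; warmup_steps, decay_steps and the
    linear ramps are hypothetical choices, not documented values.
    """
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Warmup: ramp linearly from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable: hold the learning rate at its maximum.
        return max_lr
    # Decay: ramp linearly from max_lr down towards zero.
    progress = (step - decay_start) / decay_steps
    return max_lr * (1.0 - progress)


for step in (0, 9, 500, 950):
    print(step, wsd_lr(step, total_steps=1000))
```

The long flat phase is what distinguishes this family of schedules from cosine decay: the learning rate only drops in the final stretch of training.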
129
 
130
- # Evaluation
131
 
132
  <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
133
  <colgroup>
@@ -136,6 +65,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
136
  <col style="width: 7%;">
137
  <col style="width: 7%;">
138
  <col style="width: 7%;">
 
139
  <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
140
  </colgroup>
141
  <thead>
@@ -145,6 +75,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
145
  <th>Llama3.1-8B</th>
146
  <th>Qwen2-7B</th>
147
  <th>Qwen2.5-7B</th>
 
148
  <th>Falcon3-7B-Base</th>
149
  </tr>
150
  </thead>
@@ -155,6 +86,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
155
  <td>65.2</td>
156
  <td>70.4</td>
157
  <td>74.2</td>
 
158
  <td>67.5</td>
159
  </tr>
160
  <tr>
@@ -162,6 +94,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
162
  <td>32.7</td>
163
  <td>42.1</td>
164
  <td>43.5</td>
 
165
  <td>39.2</td>
166
  </tr>
167
  <tr>
@@ -169,6 +102,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
169
  <td>12.0</td>
170
  <td>30.6</td>
171
  <td>33.9</td>
 
172
  <td>34.3</td>
173
  </tr>
174
  <tr>
@@ -177,6 +111,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
177
  <td>49.4</td>
178
  <td>77.9</td>
179
  <td>82.9</td>
 
180
  <td>76.2</td>
181
  </tr>
182
  <tr>
@@ -184,6 +119,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
184
  <td>4.1</td>
185
  <td>17.5</td>
186
  <td>15.5</td>
 
187
  <td>18.0</td>
188
  </tr>
189
  <tr>
@@ -192,6 +128,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
192
  <td>53.4</td>
193
  <td>57.4</td>
194
  <td>59.0</td>
 
195
  <td>59.6</td>
196
  </tr>
197
  <tr>
@@ -199,6 +136,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
199
  <td>31.0</td>
200
  <td>31.9</td>
201
  <td>33.0</td>
 
202
  <td>35.5</td>
203
  </tr>
204
  <tr>
@@ -206,6 +144,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
206
  <td>38.0</td>
207
  <td>44.1</td>
208
  <td>44.2</td>
 
209
  <td>47.3</td>
210
  </tr>
211
  <tr>
@@ -213,6 +152,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
213
  <td>46.5</td>
214
  <td>53.3</td>
215
  <td>54.0</td>
 
216
  <td>51.0</td>
217
  </tr>
218
  <tr>
@@ -221,6 +161,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
221
  <td>80.3</td>
222
  <td>79.8</td>
223
  <td>78.7</td>
 
224
  <td>77.7</td>
225
  </tr>
226
  <tr>
@@ -228,6 +169,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
228
  <td>96.3</td>
229
  <td>95.9</td>
230
  <td>96.6</td>
 
231
  <td>95.3</td>
232
  </tr>
233
  <tr>
@@ -235,6 +177,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
235
  <td>74.0</td>
236
  <td>72.1</td>
237
  <td>72.9</td>
 
238
  <td>71.0</td>
239
  </tr>
240
  <tr>
@@ -242,11 +185,21 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
242
  <td>33.4</td>
243
  <td>35.2</td>
244
  <td>33.6</td>
 
245
  <td>31.4</td>
246
  </tr>
247
  </tbody>
248
  </table>
249
 
250
 
 
 
251
 
252
- # Citation
 
 
 
 
 
 
 
 
1
  ---
2
  language:
3
  - en
 
 
4
  tags:
5
  - falcon3
6
  ---
7
 
8
+ # Falcon3-7B-Base
9
 
10
+ The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
11
 
12
+ This repository contains **Falcon3-7B-Base**. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code, and mathematics tasks.
13
+ Falcon3-7B-Base supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.
14
 
15
  ⚠️ **This is a raw, pretrained model, which should be further finetuned for most usecases.**
16
 
17
+ ## Model Details
18
+ - Architecture
19
+ - transformer-based causal decoder-only architecture
20
+ - 28 decoder blocks
21
+ - grouped query attention (GQA) for faster inference: 12 query heads and 4 KV heads
22
+ - wider head dimension: 256
23
+ - high RoPE base (theta) value to support long context understanding: 1000042
24
+ - 32k context length
25
+ - 131k vocab size
26
+ - Pretrained on 14 Gigatokens of datasets comprising web, code, STEM, high-quality, and multilingual data using 2048 H100 GPU chips
27
+ - Supports EN, FR, ES, PT
28
+ - Developed by [Technology Innovation Institute](https://www.tii.ae)
29
+ - License: TII Falcon-LLM License 2.0
30
+ - Model Release Date: December 2024
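
To make the grouped query attention figures above concrete, here is a rough back-of-the-envelope estimate of the key/value cache size at full context. The layer count, head counts, and head dimension come from the list above; reading "32k" as 32,768 tokens and storing the cache in bfloat16 (2 bytes per value) are assumptions, and real memory use depends on the inference stack.

```python
# Figures taken from the architecture list above.
NUM_LAYERS = 28
NUM_QUERY_HEADS = 12
NUM_KV_HEADS = 4
HEAD_DIM = 256

# Assumptions made for this estimate only.
CONTEXT_LENGTH = 32 * 1024  # "32k" read as 32,768 tokens
BYTES_PER_VALUE = 2         # bfloat16 cache entries


def kv_cache_bytes(num_tokens, num_kv_heads=NUM_KV_HEADS):
    """Bytes needed to cache keys and values for num_tokens tokens."""
    # Factor 2 accounts for storing both a key and a value per head.
    per_token = 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


gqa_bytes = kv_cache_bytes(CONTEXT_LENGTH)
# Hypothetical comparison: one KV head per query head (no grouping).
mha_bytes = kv_cache_bytes(CONTEXT_LENGTH, num_kv_heads=NUM_QUERY_HEADS)

print(f"GQA (4 KV heads):  {gqa_bytes / 2**30:.2f} GiB")
print(f"MHA (12 KV heads): {mha_bytes / 2**30:.2f} GiB")
```

Under these assumptions, sharing 4 KV heads across 12 query heads cuts the cache to one third of what full multi-head attention would need for a single sequence.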
31
 
32
 
33
+ ## Getting started
34
 
35
  <details>
36
  <summary> Click to expand </summary>
37
 
38
  ```python
39
  import torch
40
+ from transformers import pipeline
41
+
42
+ pipe = pipeline(
43
+ "text-generation",
44
+ model="tiiuae/Falcon3-7B-Base",
45
+ torch_dtype=torch.bfloat16,
46
+ device_map="auto"
47
+ )
48
+ response = pipe("Question: How many hours in one day? Answer: ")
49
+ print(response[0]['generated_text'])
 
 
50
  ```
51
 
52
  </details>
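
Because this is a raw pretrained model, it continues text rather than following instructions, so the prompt layout matters. The helper below is a minimal sketch of assembling a few-shot prompt in the same `Question: ... Answer: ` layout used in the example above. The function name `build_few_shot_prompt` and the sample question/answer pair are hypothetical and not part of the model card.

```python
def build_few_shot_prompt(examples, question):
    """Join solved (question, answer) pairs, then append an open question.

    The layout mirrors the "Question: ... Answer: " prompt shown above.
    """
    lines = [f"Question: {q} Answer: {a}" for q, a in examples]
    # Leave the final answer empty so the model completes it.
    lines.append(f"Question: {question} Answer: ")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    [("How many days in one week?", "7")],  # hypothetical solved example
    "How many hours in one day?",
)
print(prompt)
```

The resulting string can be passed to the `pipe(...)` call in place of the single-question prompt.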
53
 
54
+ <br>
55
 
56
+ # Benchmarks
57
+ The following table reports results from our internal benchmark pipeline:
58
 
59
 
 
60
 
61
  <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
62
  <colgroup>
 
65
  <col style="width: 7%;">
66
  <col style="width: 7%;">
67
  <col style="width: 7%;">
68
+ <col style="width: 7%;">
69
  <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
70
  </colgroup>
71
  <thead>
 
75
  <th>Llama3.1-8B</th>
76
  <th>Qwen2-7B</th>
77
  <th>Qwen2.5-7B</th>
78
+ <th>gemma-2-9b</th>
79
  <th>Falcon3-7B-Base</th>
80
  </tr>
81
  </thead>
 
86
  <td>65.2</td>
87
  <td>70.4</td>
88
  <td>74.2</td>
89
+ <td>-</td>
90
  <td>67.5</td>
91
  </tr>
92
  <tr>
 
94
  <td>32.7</td>
95
  <td>42.1</td>
96
  <td>43.5</td>
97
+ <td>-</td>
98
  <td>39.2</td>
99
  </tr>
100
  <tr>
 
102
  <td>12.0</td>
103
  <td>30.6</td>
104
  <td>33.9</td>
105
+ <td>-</td>
106
  <td>34.3</td>
107
  </tr>
108
  <tr>
 
111
  <td>49.4</td>
112
  <td>77.9</td>
113
  <td>82.9</td>
114
+ <td>-</td>
115
  <td>76.2</td>
116
  </tr>
117
  <tr>
 
119
  <td>4.1</td>
120
  <td>17.5</td>
121
  <td>15.5</td>
122
+ <td>-</td>
123
  <td>18.0</td>
124
  </tr>
125
  <tr>
 
128
  <td>53.4</td>
129
  <td>57.4</td>
130
  <td>59.0</td>
131
+ <td>-</td>
132
  <td>59.6</td>
133
  </tr>
134
  <tr>
 
136
  <td>31.0</td>
137
  <td>31.9</td>
138
  <td>33.0</td>
139
+ <td>-</td>
140
  <td>35.5</td>
141
  </tr>
142
  <tr>
 
144
  <td>38.0</td>
145
  <td>44.1</td>
146
  <td>44.2</td>
147
+ <td>-</td>
148
  <td>47.3</td>
149
  </tr>
150
  <tr>
 
152
  <td>46.5</td>
153
  <td>53.3</td>
154
  <td>54.0</td>
155
+ <td>-</td>
156
  <td>51.0</td>
157
  </tr>
158
  <tr>
 
161
  <td>80.3</td>
162
  <td>79.8</td>
163
  <td>78.7</td>
164
+ <td>-</td>
165
  <td>77.7</td>
166
  </tr>
167
  <tr>
 
169
  <td>96.3</td>
170
  <td>95.9</td>
171
  <td>96.6</td>
172
+ <td>-</td>
173
  <td>95.3</td>
174
  </tr>
175
  <tr>
 
177
  <td>74.0</td>
178
  <td>72.1</td>
179
  <td>72.9</td>
180
+ <td>-</td>
181
  <td>71.0</td>
182
  </tr>
183
  <tr>
 
185
  <td>33.4</td>
186
  <td>35.2</td>
187
  <td>33.6</td>
188
+ <td>-</td>
189
  <td>31.4</td>
190
  </tr>
191
  </tbody>
192
  </table>
193
 
194
 
195
+ # Citation
196
+ If the Falcon3 family was helpful to your work, feel free to cite us.
197
 
198
+ ```
199
+ @misc{Falcon3,
200
+ title = {Falcon 3 family of Open Foundation Models},
201
+ author = {TII Team},
202
+ month = {December},
203
+ year = {2024}
204
+ }
205
+ ```