puneeshkhanna
committed on
Update README.md
README.md
CHANGED
@@ -19,17 +19,18 @@ license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
 This repository contains the **Falcon3-10B-Base**. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code and mathematics tasks.
 Falcon3-10B-Base supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.
 
-⚠️ **This is a raw, pretrained model, which should be further finetuned for most use cases.**
+⚠️ **This is a raw, pretrained model, which should be further finetuned using SFT, RLHF, continued pretraining, etc. for most use cases.**
 
 ## Model Details
 - Architecture
-    -
+    - Transformer-based causal decoder-only architecture
     - 40 decoder blocks
-    -
+    - Grouped query attention (GQA) for faster inference: 12 query heads and 4 key-value heads
-    -
+    - Wider head dimension: 256
-    -
+    - High RoPE value to support long context understanding: 1000042
-    -
+    - Uses SwiGLU and RMSNorm
-    -
+    - 32K context length
+    - 131K vocab size
 - Depth-up-scaled from **Falcon3-7B-Base** with 2 gigatokens of data comprising web, code, STEM, high-quality and multilingual data, using 2048 H100 GPU chips
 - Supports EN, FR, ES, PT
 - Developed by [Technology Innovation Institute](https://www.tii.ae)
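The architecture bullets added above map onto standard config fields. As a quick cross-check (not part of this diff), a minimal sketch assuming the public repo id `tiiuae/Falcon3-10B-Base` and Llama-style `transformers` config attribute names, which Falcon3 appears to use; exact attribute names may differ:

```python
# Minimal sketch (assumptions: repo id "tiiuae/Falcon3-10B-Base" and Llama-style
# config attribute names; not taken from the model card itself).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiiuae/Falcon3-10B-Base")

print(config.num_hidden_layers)        # 40 decoder blocks
print(config.num_attention_heads)      # 12 query heads
print(config.num_key_value_heads)      # 4 key-value heads (GQA)
# Head dimension: fall back to hidden_size / num_heads if head_dim is absent.
print(getattr(config, "head_dim", config.hidden_size // config.num_attention_heads))  # 256
print(config.rope_theta)               # 1000042 (high RoPE base for long context)
print(config.max_position_embeddings)  # 32K context length
print(config.vocab_size)               # ~131K vocab
```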
@@ -60,7 +61,7 @@ print(response[0]['generated_text'])
 
 <br>
 
-
+## Benchmarks
 We report in the following table our internal pipeline benchmarks:
 
 <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
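The hunk context above references `print(response[0]['generated_text'])`, i.e. the card's usage section builds a `transformers` text-generation pipeline. That snippet sits outside this hunk, so the sketch below is only an approximation; the repo id, prompt, dtype, and generation settings are assumptions:

```python
# Approximate usage sketch; only response[0]['generated_text'] is implied by the
# hunk context above, everything else here is an assumption.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-10B-Base",  # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

response = generator("The capital of France is", max_new_tokens=32)
print(response[0]["generated_text"])
```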
@@ -181,9 +182,11 @@ We report in the following table our internal pipeline benchmarks:
 </tbody>
 </table>
 
+## Technical Report
 
+Coming soon...
 
-
+## Citation
 If the Falcon3 family has been helpful to your work, feel free to cite us.
 
 ```