Kasper Piskorski committed
Commit: fe972ab
Parent: e3bdfe3
Update README.md
README.md CHANGED
@@ -14,24 +14,24 @@ license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
 
 # Falcon3-10B-Base
 
-**Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
+The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
 
-This repository contains the **Falcon3-10B-Base**. It achieves state
-Falcon3-10B-Base supports 4 languages (
+This repository contains the **Falcon3-10B-Base**. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code and mathematics tasks.
+Falcon3-10B-Base supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.
 
-⚠️ **This is a raw, pretrained model, which should be further finetuned using SFT, RLHF, continued pretraining, etc. for most
+⚠️ **This is a raw, pretrained model, which should be further finetuned using SFT, RLHF, continued pretraining, etc. for most use cases.**
 
 ## Model Details
 - Architecture
-    - Transformer
+    - Transformer-based causal decoder-only architecture
     - 40 decoder blocks
-    - Grouped
+    - Grouped Query Attention (GQA) for faster inference: 12 query heads and 4 key-value heads
     - Wider head dimension: 256
     - High RoPE value to support long context understanding: 1000042
     - Uses SwiGLU and RMSNorm
     - 32K context length
     - 131K vocab size
-- Depth
+- Depth up-scaled from **Falcon3-7B-Base** with 2 Teratokens of datasets comprising web, code, STEM, high-quality and multilingual data, using 2048 H100 GPU chips
 - Supports EN, FR, ES, PT
 - Developed by [Technology Innovation Institute](https://www.tii.ae)
 - License: TII Falcon-LLM License 2.0
@@ -187,7 +187,7 @@ We report in the following table our internal pipeline benchmarks:
 Coming soon....
 
 ## Citation
-If Falcon3 family were helpful
+If the Falcon3 family was helpful in your work, feel free to cite it.
 
 ```
 @misc{Falcon3,
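
For quick reference, below is a minimal usage sketch for the base checkpoint described in the Model Details above. It is illustrative only and not part of this commit: the Hub repository id `tiiuae/Falcon3-10B-Base`, the bfloat16 dtype, and the example prompt are assumptions rather than something stated in the diff.

```python
# Illustrative sketch only (not part of this commit): loading a raw, pretrained
# base model like the one described above with Hugging Face transformers for
# plain text completion. The repository id below is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-10B-Base"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumed dtype; choose one your hardware supports
    device_map="auto",            # spread the ~10B parameters across available devices
)

# Per the card, attention uses GQA with 12 query heads and 4 key-value heads at
# head dimension 256, i.e. 12 * 256 = 3072-wide query projections.
prompt = "The Technology Innovation Institute is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Base (non-instruct) model: greedy continuation of the prompt, no chat template.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

As the ⚠️ note in the diff says, this is a raw pretrained model, so the snippet only produces free-form continuations; SFT/RLHF-style finetuning would be needed for instruction-following behaviour.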