puneeshkhanna committed
Commit e3bdfe3 · Parent: c0da908
Update README.md

README.md CHANGED
```diff
@@ -28,7 +28,7 @@ Falcon3-10B-Base supports 4 languages (english, french, spanish, portuguese) and
 - Grouped query attention (GQA) for faster inference: 12 query heads and 4 key value heads
 - Wider head dimension: 256
 - High RoPE value to support long context understanding: 1000042
-
+- Uses SwiGLU and RMSNorm
 - 32K context length
 - 131K vocab size
 - Depth-up-scaled from **Falcon3-7B-Base** with 2 Teratokens of data comprising web, code, STEM, high-quality and multilingual data, using 2048 H100 GPU chips
```
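The GQA layout in the list above (12 query heads sharing 4 key/value heads, head dimension 256) can be sketched in plain NumPy. This is an illustrative shape exercise, not Falcon3's actual implementation; the function and variable names are assumptions.

```python
import numpy as np

# Hedged sketch of grouped-query attention (GQA) with the head layout
# from the README: 12 query heads, 4 key/value heads, head_dim 256.
n_q_heads, n_kv_heads, head_dim = 12, 4, 256
group = n_q_heads // n_kv_heads  # 3 query heads share each KV head

def gqa_scores(q, k):
    """q: (n_q_heads, seq, head_dim); k: (n_kv_heads, seq, head_dim)."""
    # Repeat each KV head so every group of query heads reuses the same
    # keys, shrinking the KV cache by n_q_heads / n_kv_heads (here 3x).
    k_rep = np.repeat(k, group, axis=0)                 # (12, seq, head_dim)
    return q @ k_rep.transpose(0, 2, 1) / np.sqrt(head_dim)

q = np.random.randn(n_q_heads, 8, head_dim)
k = np.random.randn(n_kv_heads, 8, head_dim)
print(gqa_scores(q, k).shape)  # (12, 8, 8)
```

Sharing 4 KV heads across 12 query heads is what makes inference faster: the KV cache stores 4 heads per layer instead of 12, while attention quality stays close to full multi-head attention.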
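The unusually large RoPE base (1000042, versus the common 10000) is what supports the 32K context: with a larger base, the per-channel rotation frequencies decay more slowly, so distant positions remain distinguishable. A minimal sketch of that frequency schedule, with illustrative names:

```python
import numpy as np

# Hedged sketch: rotary position embedding (RoPE) inverse frequencies with
# the large base (theta = 1000042) mentioned in the README.
head_dim = 256
theta = 1000042.0

def rope_inv_freqs(base, dim):
    # One inverse frequency per pair of channels: base ** (-2i / dim).
    return base ** (-np.arange(0, dim, 2) / dim)

inv_freq = rope_inv_freqs(theta, head_dim)
# A larger base shrinks the low frequencies relative to the common 10000,
# stretching the usable positional range for long contexts:
print(inv_freq[-1] < rope_inv_freqs(10000.0, head_dim)[-1])  # True
```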