puneeshkhanna committed · verified · Commit 44788f8 · 1 parent: a53377e

Update README.md

Files changed (1):
  1. README.md +12 -9
README.md CHANGED
@@ -19,17 +19,18 @@ license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
  This repository contains the **Falcon3-10B-Base**. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code and mathematics tasks.
  Falcon3-10B-Base supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.

- ⚠️ **This is a raw, pretrained model, which should be further finetuned for most use cases.**
+ ⚠️ **This is a raw, pretrained model, which should be further finetuned using SFT, RLHF, continued pretraining, etc. for most use cases.**

  ## Model Details
  - Architecture
- - transformer based causal decoder only architecture
+ - Transformer-based causal decoder-only architecture
  - 40 decoder blocks
- - grouped query attention (GQA) for faster inference: 12 query heads and 4 KV heads
- - wider head dimension: 256
- - high RoPE value to support long context understanding: 1000042
- - 32k context length
- - 131k vocab size
+ - Grouped-query attention (GQA) for faster inference: 12 query heads and 4 key-value heads
+ - Wider head dimension: 256
+ - High RoPE value to support long-context understanding: 1000042
+ - Uses SwiGLU and RMSNorm
+ - 32K context length
+ - 131K vocab size
  - Depth-up-scaled from **Falcon3-7B-Base** with 2 gigatokens of data comprising web, code, STEM, high-quality and multilingual sources, using 2048 H100 GPUs
  - Supports EN, FR, ES, PT
  - Developed by [Technology Innovation Institute](https://www.tii.ae)
@@ -60,7 +61,7 @@ print(response[0]['generated_text'])

  <br>

- # Benchmarks
+ ## Benchmarks
  We report in the following table our internal pipeline benchmarks:

  <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
@@ -181,9 +182,11 @@ We report in the following table our internal pipeline benchmarks:
  </tbody>
  </table>

- # Citation
+ ## Technical Report
+
+ Coming soon....
+
+ ## Citation
  If the Falcon3 family of models was helpful to your work, feel free to cite us.

  ```
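As a quick sanity check on the architecture values listed in the updated Model Details section (decoder blocks, GQA head counts, head dimension, RoPE base, context length, vocabulary size), here is a minimal sketch that reads them back from the published configuration. It is not part of this commit: the checkpoint id `tiiuae/Falcon3-10B-Base` and the Llama-style attribute names exposed through `transformers.AutoConfig` are assumptions.

```python
# Minimal sketch (not part of this commit): print the config fields that the
# updated Model Details section describes. The repo id and attribute names
# below are assumptions based on standard Llama-style configs in transformers.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiiuae/Falcon3-10B-Base")  # assumed repo id

print("decoder blocks: ", config.num_hidden_layers)        # expected 40
print("query heads:    ", config.num_attention_heads)      # expected 12
print("key-value heads:", config.num_key_value_heads)      # expected 4 (GQA)
print("head dimension: ", config.hidden_size // config.num_attention_heads)  # expected 256
print("RoPE base:      ", config.rope_theta)                # expected 1000042
print("context length: ", config.max_position_embeddings)  # expected 32K
print("vocab size:     ", config.vocab_size)                # expected ~131K
```

If the checkpoint resolves to a different config class, some attributes may be named differently; treat the values printed from the actual config, not this sketch, as authoritative.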
 