Kasper Piskorski committed
Commit 35cd6af · verified · 1 Parent(s): 7aae4f3

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -14,23 +14,23 @@ license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
 
 # Falcon3-7B-Instruct
 
- **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
+ **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
 
- This repository contains the **Falcon3-7B-Instruct**. It achieves state of art results (at release's time) on reasoning, language understanding, instruction following, code and mathematics tasks.
- Falcon3-7B-Instruct supports 4 languages (english, french, spanish, portuguese) and a context length up to 32K.
+ This repository contains the **Falcon3-7B-Instruct**. It achieves state-of-the-art results (at release's time) on reasoning, language understanding, instruction following, code and mathematics tasks.
+ Falcon3-7B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.
 
 ## Model Details
 - Architecture
- - Transformer based causal decoder only architecture
+ - Transformer-based causal decoder-only architecture
  - 28 decoder blocks
- - Grouped query attention (GQA) for faster inference: 12 query heads and 4 key value heads
+ - Grouped Query Attention (GQA) for faster inference: 12 query heads and 4 key-value heads
  - Wider head dimension: 256
  - High RoPE value to support long context understanding: 1000042
  - Uses SwiGLU and RMSNorm
 - 32K context length
 - 131K vocab size
 - Pretrained on 14 Teratokens of datasets comprising of web, code, STEM, high quality and mutlilingual data using 2048 H100 GPU chips
- - Postrained on 1.2 million samples of STEM, conversations, code, safety and function call data
+ - Posttrained on 1.2 million samples of STEM, conversational, code, safety and function call data
 - Supports EN, FR, ES, PT
 - Developed by [Technology Innovation Institute](https://www.tii.ae)
 - License: TII Falcon-LLM License 2.0
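For readers who want to check the updated card against the checkpoint itself, here is a minimal usage sketch. It assumes the model is published on the Hugging Face Hub under the id `tiiuae/Falcon3-7B-Instruct` (an assumption, not stated in this commit) and that its tokenizer ships a chat template; it is illustrative only and not part of the change. As a side note, the listed 12 query heads with a 256-wide head dimension imply a 3072-dimensional hidden state (12 × 256).

```python
# Minimal sketch: load Falcon3-7B-Instruct and run one chat turn.
# Assumptions (not stated in this commit): the checkpoint id is
# "tiiuae/Falcon3-7B-Instruct" and the tokenizer provides a chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-7B-Instruct"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # convenient single-GPU default for a 7B model
    device_map="auto",
)

# Build a single-turn prompt with the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize grouped query attention in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and print only the newly produced tokens.
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The 32K context window and GQA layout described in the card are properties of the checkpoint itself, so no extra flags are needed here; `bfloat16` with `device_map="auto"` is just one reasonable loading choice.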