melaseddik committed
Commit b6bed06 · verified · 1 Parent(s): 922a098

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -8,19 +8,19 @@ tags:
  - falcon3
  ---

- # Falcon3-7B-Base
+ # Falcon3-10B-Base

  **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.

- This repository contains the **Falcon3-7B-Base**. It achieves state of art results (at release's time) on reasoning, language understanding, instruction following, code and mathematics tasks.
- Falcon3-7B-Base supports 4 languages (english, french, spanish, portuguese) and a context length up to 32K.
+ This repository contains the **Falcon3-10B-Base**. It achieves state of art results (at release's time) on reasoning, language understanding, instruction following, code and mathematics tasks.
+ Falcon3-10B-Base supports 4 languages (english, french, spanish, portuguese) and a context length up to 32K.

  ⚠️ **This is a raw, pretrained model, which should be further finetuned for most usecases.**

  ## Model Details
  - Architecture
  - transformer based causal decoder only architecture
- - 28 decoder blocks
+ - 40 decoder blocks
  - grouped query attention (GQA) for faster inference: 12 query heads and 4 KV heads
  - wider head dimension: 256
  - high RoPE value to support long context understanding: 1000042
@@ -44,7 +44,7 @@ from transformers import pipeline

  pipe = pipeline(
  "text-generation",
- model="tiiuae/Falcon3-7B-Base",
+ model="tiiuae/Falcon3-10B-Base",
  torch_dtype=torch.bfloat16,
  device_map="auto"
  )
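
Reviewer note: the only architectural change in this diff is the decoder block count (28 to 40), while the GQA settings (12 query heads, 4 KV heads, head dimension 256) are unchanged. The sketch below estimates what that means for the key/value cache at full context. It is a back-of-the-envelope calculation, not something taken from the model card: `AttentionConfig`, `kv_cache_gib`, and `CONTEXT_TOKENS` are hypothetical names, "32K" is assumed to mean 32768 tokens, and the cache is assumed to be stored in bfloat16 (2 bytes per value) for a single sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    """Attention settings quoted in the README diff (names are illustrative)."""
    decoder_blocks: int
    query_heads: int = 12
    kv_heads: int = 4
    head_dim: int = 256

    def queries_per_kv_head(self) -> int:
        # With GQA, several query heads share one key/value head.
        return self.query_heads // self.kv_heads

    def kv_cache_bytes_per_token(self, bytes_per_value: int = 2) -> int:
        # One key and one value vector (factor 2) per KV head, per decoder block.
        return 2 * self.decoder_blocks * self.kv_heads * self.head_dim * bytes_per_value


# Assumption: "32K" in the README is read as 32 * 1024 tokens.
CONTEXT_TOKENS = 32 * 1024


def kv_cache_gib(cfg: AttentionConfig, tokens: int = CONTEXT_TOKENS) -> float:
    """KV cache size in GiB for one sequence of `tokens` tokens."""
    return cfg.kv_cache_bytes_per_token() * tokens / 2**30


falcon3_7b = AttentionConfig(decoder_blocks=28)   # old side of the diff
falcon3_10b = AttentionConfig(decoder_blocks=40)  # new side of the diff

for name, cfg in [("7B", falcon3_7b), ("10B", falcon3_10b)]:
    print(f"Falcon3-{name}-Base: {cfg.kv_cache_bytes_per_token()} bytes/token, "
          f"{kv_cache_gib(cfg):.1f} GiB at {CONTEXT_TOKENS} tokens")
```

Under these assumptions the cache grows linearly with the block count, from about 3.5 GiB to 5.0 GiB per full-length sequence. This is in addition to the model weights and ignores any framework overhead.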