Kasper Piskorski committed
Commit: fe972ab
Parent: e3bdfe3
Update README.md
README.md CHANGED
@@ -14,24 +14,24 @@ license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
 
 # Falcon3-10B-Base
 
-**Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
+The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
 
-This repository contains the **Falcon3-10B-Base**. It achieves state
-Falcon3-10B-Base supports 4 languages (
+This repository contains the **Falcon3-10B-Base**. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code and mathematics tasks.
+Falcon3-10B-Base supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.
 
-⚠️ **This is a raw, pretrained model, which should be further finetuned using SFT, RLHF, continued pretraining, etc. for most
+⚠️ **This is a raw, pretrained model, which should be further finetuned using SFT, RLHF, continued pretraining, etc. for most use cases.**
 
 ## Model Details
 - Architecture
-    - Transformer
+    - Transformer-based causal decoder-only architecture
     - 40 decoder blocks
-    - Grouped
+    - Grouped Query Attention (GQA) for faster inference: 12 query heads and 4 key-value heads
     - Wider head dimension: 256
     - High RoPE value to support long context understanding: 1000042
     - Uses SwiGLU and RMSNorm
     - 32K context length
     - 131K vocab size
-- Depth
+- Depth up-scaled from **Falcon3-7B-Base** with 2 Teratokens of datasets comprising web, code, STEM, high-quality and multilingual data, using 2048 H100 GPU chips
 - Supports EN, FR, ES, PT
 - Developed by [Technology Innovation Institute](https://www.tii.ae)
 - License: TII Falcon-LLM License 2.0
@@ -187,7 +187,7 @@ We report in the following table our internal pipeline benchmarks:
 Coming soon....
 
 ## Citation
-If Falcon3 family were helpful
+If the Falcon3 family was helpful in your work, feel free to cite it.
 
 ```
 @misc{Falcon3,
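
For quick reference, below is a minimal usage sketch for the base checkpoint described in the Model Details above. It is illustrative only and not part of this commit: the Hub repository id `tiiuae/Falcon3-10B-Base`, the bfloat16 dtype, and the example prompt are assumptions rather than something stated in the diff.

```python
# Illustrative sketch only (not part of this commit): loading a raw, pretrained
# base model like the one described above with Hugging Face transformers for
# plain text completion. The repository id below is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-10B-Base"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumed dtype; choose one your hardware supports
    device_map="auto",            # spread the ~10B parameters across available devices
)

# Per the card, attention uses GQA with 12 query heads and 4 key-value heads at
# head dimension 256, i.e. 12 * 256 = 3072-wide query projections.
prompt = "The Technology Innovation Institute is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Base (non-instruct) model: greedy continuation of the prompt, no chat template.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

As the ⚠️ note in the diff says, this is a raw pretrained model, so the snippet only produces free-form continuations; SFT/RLHF-style finetuning would be needed for instruction-following behaviour.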