puneeshkhanna
committed on
Update README.md
README.md
CHANGED
@@ -19,17 +19,18 @@ license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
 This repository contains the **Falcon3-10B-Base**. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code and mathematics tasks.
 Falcon3-10B-Base supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.
 
-⚠️ **This is a raw, pretrained model, which should be further finetuned for most use cases.**
+⚠️ **This is a raw, pretrained model, which should be further finetuned using SFT, RLHF, continued pretraining, etc. for most use cases.**
 
 ## Model Details
 - Architecture
-    -
+    - Transformer-based causal decoder-only architecture
     - 40 decoder blocks
-    -
+    - Grouped query attention (GQA) for faster inference: 12 query heads and 4 key-value heads
-    -
+    - Wider head dimension: 256
-    -
+    - High RoPE value to support long context understanding: 1000042
-    -
+    - Uses SwiGLU and RMSNorm
-    -
+    - 32K context length
+    - 131K vocab size
 - Depth-up-scaled from **Falcon3-7B-Base** with 2 gigatokens of data comprising web, code, STEM, high-quality and multilingual data, using 2048 H100 GPU chips
 - Supports EN, FR, ES, PT
 - Developed by [Technology Innovation Institute](https://www.tii.ae)
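The architecture bullets added above map onto standard config fields. As a quick cross-check (not part of this diff), a minimal sketch assuming the public repo id `tiiuae/Falcon3-10B-Base` and Llama-style `transformers` config attribute names, which Falcon3 appears to use; exact attribute names may differ:

```python
# Minimal sketch (assumptions: repo id "tiiuae/Falcon3-10B-Base" and Llama-style
# config attribute names; not taken from the model card itself).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiiuae/Falcon3-10B-Base")

print(config.num_hidden_layers)        # 40 decoder blocks
print(config.num_attention_heads)      # 12 query heads
print(config.num_key_value_heads)      # 4 key-value heads (GQA)
# Head dimension: fall back to hidden_size / num_heads if head_dim is absent.
print(getattr(config, "head_dim", config.hidden_size // config.num_attention_heads))  # 256
print(config.rope_theta)               # 1000042 (high RoPE base for long context)
print(config.max_position_embeddings)  # 32K context length
print(config.vocab_size)               # ~131K vocab
```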
@@ -60,7 +61,7 @@ print(response[0]['generated_text'])
 
 <br>
 
-
+## Benchmarks
 We report in the following table our internal pipeline benchmarks:
 
 <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
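The hunk context above references `print(response[0]['generated_text'])`, i.e. the card's usage section builds a `transformers` text-generation pipeline. That snippet sits outside this hunk, so the sketch below is only an approximation; the repo id, prompt, dtype, and generation settings are assumptions:

```python
# Approximate usage sketch; only response[0]['generated_text'] is implied by the
# hunk context above, everything else here is an assumption.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-10B-Base",  # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

response = generator("The capital of France is", max_new_tokens=32)
print(response[0]["generated_text"])
```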
@@ -181,9 +182,11 @@ We report in the following table our internal pipeline benchmarks:
 </tbody>
 </table>
 
+## Technical Report
 
+Coming soon...
 
-
+## Citation
 If the Falcon3 family has been helpful to your work, feel free to cite us.
 
 ```