---
language:
- en
tags:
- falcon3
---


# Table of Contents

0. [TL;DR](#tldr)
1. [Model Details](#model-details)
2. [Usage](#usage)
3. [Training Details](#training-details)
4. [Evaluation](#evaluation)
5. [Citation](#citation)


# TL;DR

# Model Details

## Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only
- **Architecture:** Transformer-based
- **Language(s) (NLP):** Mainly English
- **License:** TII Falcon-LLM License 2.0

<br>
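
As a quick sanity check, the architecture details listed above can be inspected directly from the model config without downloading the weights. A minimal sketch, assuming the config exposes the usual `transformers` attributes (`model_type`, `num_hidden_layers`, `hidden_size`):

```python
from transformers import AutoConfig

# Loads only the configuration file; no model weights are downloaded.
config = AutoConfig.from_pretrained("tiiuae/Falcon3-10B-Base")

print(config.model_type)         # architecture family registered in transformers
print(config.num_hidden_layers)  # number of decoder blocks
print(config.hidden_size)        # hidden (embedding) dimension
```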

# Usage

Below are example scripts showing how to run the model with 🤗 `transformers` (make sure you have the latest `transformers` release, e.g. via `pip install -U transformers`, or a version built from source):

## Using the PyTorch model with 🤗 transformers

### Running the model on a CPU

<details>
<summary> Click to expand </summary>

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Base")

input_text = "Question: How many hours in one day? Answer: "
# Tokenize the prompt; tensors stay on the CPU by default.
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

### Running the model on a GPU

<details>
<summary> Click to expand </summary>

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Base")
# device_map="auto" (via accelerate) places the weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Base", device_map="auto")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

### Running the model on a GPU using `torch.compile`

<details>
<summary> Click to expand </summary>

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Base")
# Load in bfloat16 and move the model to the GPU.
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Base", torch_dtype=torch.bfloat16).to("cuda")

# Compile the forward pass; the first generation call is slower while kernels are traced.
model = torch.compile(model)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>


# Training Details

## Training Data

## Training Procedure

### Training Hyperparameters

| **Hyperparameter** | **Value**  | **Comment**                               |
|--------------------|------------|-------------------------------------------|
| Precision          | `bfloat16` |                                           |
| Optimizer          | AdamW      |                                           |
| Max learning rate  |      | Following a WSD (warmup-stable-decay) learning rate schedule |
| Weight decay       |        |                                           |
| Batch size         |        |                                           |
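
The WSD schedule noted in the table warms the learning rate up, holds it at its peak for most of training, then decays it at the end. Below is a minimal sketch for illustration only; the phase lengths, peak value, and decay shape are placeholders, not the actual training configuration:

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Warmup-stable-decay (WSD) learning rate schedule."""
    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return peak_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold at the peak learning rate.
        return peak_lr
    # Decay phase: linear ramp down to min_lr (other decay shapes are also common).
    progress = min((step - warmup_steps - stable_steps) / decay_steps, 1.0)
    return peak_lr + (min_lr - peak_lr) * progress
```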


# Evaluation



# Citation