---
language:
- en
tags:
- falcon3
---


# Table of Contents

0. [TL;DR](#tldr)
1. [Model Details](#model-details)
2. [Usage](#usage)
3. [Training Details](#training-details)
4. [Evaluation](#evaluation)
5. [Citation](#citation)


# TL;DR

# Model Details

## Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only
- **Architecture:** Transformer-based
- **Language(s) (NLP):** Mainly English
- **License:** TII Falcon-LLM License 2.0

<br>
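
As a quick sanity check, the architecture details listed above can be inspected directly from the model config without downloading the weights. A minimal sketch, assuming the config exposes the usual `transformers` attributes (`model_type`, `num_hidden_layers`, `hidden_size`):

```python
from transformers import AutoConfig

# Loads only the configuration file; no model weights are downloaded.
config = AutoConfig.from_pretrained("tiiuae/Falcon3-10B-Base")

print(config.model_type)         # architecture family registered in transformers
print(config.num_hidden_layers)  # number of decoder blocks
print(config.hidden_size)        # hidden (embedding) dimension
```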

# Usage

Below are example scripts showing how to run the model with 🤗 `transformers` (make sure you have the latest `transformers` release, e.g. via `pip install -U transformers`, or a version built from source):

## Using the PyTorch model with 🤗 transformers

### Running the model on a CPU

<details>
<summary> Click to expand </summary>

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Base")

input_text = "Question: How many hours in one day? Answer: "
# Tokenize the prompt; tensors stay on the CPU by default.
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

### Running the model on a GPU

<details>
<summary> Click to expand </summary>

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Base")
# device_map="auto" (via accelerate) places the weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Base", device_map="auto")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

### Running the model on a GPU using `torch.compile`

<details>
<summary> Click to expand </summary>

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Base")
# Load in bfloat16 and move the model to the GPU.
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Base", torch_dtype=torch.bfloat16).to("cuda")

# Compile the forward pass; the first generation call is slower while kernels are traced.
model = torch.compile(model)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>


# Training Details

## Training Data

## Training Procedure

### Training Hyperparameters

| **Hyperparameter** | **Value**  | **Comment**                               |
|--------------------|------------|-------------------------------------------|
| Precision          | `bfloat16` |                                           |
| Optimizer          | AdamW      |                                           |
| Max learning rate  |      | Following a WSD (warmup-stable-decay) learning rate schedule |
| Weight decay       |        |                                           |
| Batch size         |        |                                           |
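
The WSD schedule noted in the table warms the learning rate up, holds it at its peak for most of training, then decays it at the end. Below is a minimal sketch for illustration only; the phase lengths, peak value, and decay shape are placeholders, not the actual training configuration:

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Warmup-stable-decay (WSD) learning rate schedule."""
    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return peak_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold at the peak learning rate.
        return peak_lr
    # Decay phase: linear ramp down to min_lr (other decay shapes are also common).
    progress = min((step - warmup_steps - stable_steps) / decay_steps, 1.0)
    return peak_lr + (min_lr - peak_lr) * progress
```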


# Evaluation



# Citation