Falcon3-10B-Base / README.md
language:
  - en
tags:
  - falcon3

Table of Contents

  1. TL;DR
  2. Model Details
  3. Usage
  4. Training Details
  5. Evaluation

TL;DR

Model Details

Model Description

  • Developed by: Technology Innovation Institute (https://www.tii.ae)
  • Model type: Causal decoder-only
  • Architecture: Transformer-based
  • Language(s) (NLP): Mainly English
  • License: TII Falcon-LLM License 2.0

Usage

The example scripts below show how to use the model with transformers (make sure you have the latest transformers release, or a build from source):

Using the PyTorch model with 🤗 transformers

Running the model on a CPU

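from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Base")

# This is a base model, so prompt it with plain text to be completed
input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate a short continuation and decode it back to text
outputs = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))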

Running the model on a GPU

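# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Base")
# device_map="auto" lets accelerate place the weights on the available device(s)
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Base", device_map="auto")

input_text = "Question: How many hours in one day? Answer: "
# Move the inputs to the device that holds the model's first parameters
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)

# Generate a short continuation and decode it back to text
outputs = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))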

Running the model on a GPU using torch.compile

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Base")
# Load the weights in bfloat16 and move the model to the first GPU
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Base", torch_dtype=torch.bfloat16).to("cuda")

# Wrap the model with torch.compile; compilation happens lazily on first use
model = torch.compile(model)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

# Generate a short continuation and decode it back to text
outputs = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
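
When choosing between the CPU and GPU examples above, it can help to estimate how much memory the weights alone need. The sketch below is a rough back-of-the-envelope calculation, not a measurement: it assumes the "10B" in the model name means about 10 billion parameters (the exact count is not stated here), and it ignores activations, the KV cache, and framework overhead. The helper name `weight_memory_gib` is hypothetical.

```python
# Rough estimate of the memory needed to hold the model weights only.
# ASSUMPTION: ~10e9 parameters, taken from the "10B" in the model name.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the weight footprint in GiB for a given parameter dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


ASSUMED_PARAMS = 10e9  # hypothetical round figure, not an official count

for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{weight_memory_gib(ASSUMED_PARAMS, dtype):.1f} GiB")
# prints:
# float32: ~37.3 GiB
# bfloat16: ~18.6 GiB
```

Under these assumptions, loading in bfloat16 (as the torch.compile example does) roughly halves the weight footprint compared with float32.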

Training Details

Training Data

Training Procedure

Training Hyperparameters

| Hyperparameter | Value | Comment |
|---|---|---|
| Precision | bfloat16 | |
| Optimizer | AdamW | |
| Max learning rate | | Following a WSD (warmup-stable-decay) learning rate schedule |
| Weight decay | | |
| Batch size | | |
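
For readers unfamiliar with the WSD (warmup-stable-decay) schedule mentioned above, here is a minimal sketch of its general shape. It is an illustration only, not the training code used for this model: the function name `wsd_lr`, the linear warmup and linear decay shapes, and every numeric value in the demo are assumptions, since the model card does not give the maximum learning rate or the phase lengths.

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` under a warmup-stable-decay schedule.

    ASSUMPTION: linear warmup and linear decay; other decay shapes
    (cosine, exponential, ...) are equally possible.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the learning rate constant at max_lr.
        return max_lr
    # Decay: anneal linearly from max_lr down to min_lr, then stay there.
    decay_step = step - warmup_steps - stable_steps
    progress = min(decay_step / decay_steps, 1.0)
    return max_lr + (min_lr - max_lr) * progress


# Hypothetical values chosen only to show the three phases.
for step in (0, 9, 50, 110, 135, 160):
    print(step, wsd_lr(step, max_lr=1e-3, warmup_steps=10,
                       stable_steps=100, decay_steps=50))
```

The long constant phase is what distinguishes WSD from a cosine schedule: the learning rate only starts to fall in the final decay phase.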

Evaluation

| Category | Benchmark | Llama3.1-8B | Qwen2-7B | Qwen2.5-7B | Falcon3-7B-Base | Gemma2-9B | Yi1.5-9B | Mistral-NeMo-12B | Falcon3-10B-Base |
|---|---|---|---|---|---|---|---|---|---|
| General | MMLU (5-shot) | 65.2 | 70.4 | 74.2 | 67.5 | 0 | 69.6 | 68.8 | 73.1 |
| | MMLU-PRO (5-shot) | 32.7 | 42.1 | 43.5 | 39.2 | 0 | 39.3 | 34.7 | 42.5 |
| | IFEval | 12.0 | 30.6 | 33.9 | 34.3 | 0 | 29.1 | 16.1 | 36.4 |
| Math | GSM8K (5-shot) | 49.4 | 77.9 | 82.9 | 76.2 | 69.1 | 63.8 | 55.3 | 81.4 |
| | MATH (4-shot) | 4.1 | 17.5 | 15.5 | 18.0 | 0 | 9.2 | 4.9 | 22.9 |
| Reasoning | Arc Challenge (25-shot) | 53.4 | 57.4 | 59.0 | 59.6 | 63.7 | 58.2 | 60.6 | 62.6 |
| | GPQA (0-shot) | 31.0 | 31.9 | 33.0 | 35.5 | 0 | 36.6 | 28.8 | 34.1 |
| | MUSR (0-shot) | 38.0 | 44.1 | 44.2 | 47.3 | 0 | 43.3 | 39.2 | 44.2 |
| | BBH (3-shot) | 46.5 | 53.3 | 54.0 | 51.0 | 0 | 51.3 | 50.2 | 59.7 |
| CommonSense Understanding | PIQA (0-shot) | 80.3 | 79.8 | 78.7 | 77.7 | 81.4 | 79.8 | 81.4 | 79.1 |
| | SciQ (0-shot) | 96.3 | 95.9 | 96.6 | 95.3 | 97.2 | 95.8 | 96.4 | 96.0 |
| | Winogrande (0-shot) | 74.0 | 72.1 | 72.9 | 71.0 | 74.2 | 72.7 | 73.2 | 73.6 |
| | OpenbookQA (0-shot) | 33.4 | 35.2 | 33.6 | 31.4 | 34.0 | 35.4 | 36.4 | 34.0 |
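
To compare models on a given benchmark, the scores above can be loaded into a small script. The sketch below copies only the two Math rows from the table and reports the top-scoring model for each; the helper name `best_model` is hypothetical, and entries reported as 0 are kept exactly as they appear in the table.

```python
# Column order follows the evaluation table above.
MODELS = [
    "Llama3.1-8B", "Qwen2-7B", "Qwen2.5-7B", "Falcon3-7B-Base",
    "Gemma2-9B", "Yi1.5-9B", "Mistral-NeMo-12B", "Falcon3-10B-Base",
]

# Math rows copied verbatim from the table.
SCORES = {
    "GSM8K (5-shot)": [49.4, 77.9, 82.9, 76.2, 69.1, 63.8, 55.3, 81.4],
    "MATH (4-shot)": [4.1, 17.5, 15.5, 18.0, 0, 9.2, 4.9, 22.9],
}


def best_model(scores):
    """Return (model name, score) for the highest score in one row."""
    idx = max(range(len(scores)), key=scores.__getitem__)
    return MODELS[idx], scores[idx]


for bench, scores in SCORES.items():
    name, score = best_model(scores)
    print(f"{bench}: {name} ({score})")
# prints:
# GSM8K (5-shot): Qwen2.5-7B (82.9)
# MATH (4-shot): Falcon3-10B-Base (22.9)
```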

Citation