Falcon3-7B-Instruct
metadata
language:
  - en
tags:
  - falcon3

Table of Contents

  1. TL;DR
  2. Model Details
  3. Usage
  4. Benchmarks
  5. Citation

TL;DR

The Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.

The family achieves state-of-the-art results on reasoning, language understanding, instruction following, code, and mathematics tasks.

It supports a context length of up to 32K tokens.

This repository contains Falcon3-7B-Instruct, the best instruct LLM under 8B parameters at the time of release.
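The prompt and the generated tokens share the same context window, so the 32K limit is a budget for both together. The sketch below is a minimal budget check, assuming "32K" means 32,768 tokens; `CONTEXT_LENGTH`, `fits_in_context`, and `max_prompt_tokens` are hypothetical helpers, not part of transformers, and in practice the token counts would come from the model's tokenizer.

```python
# Minimal context-budget check for a 32K-context model.
# Assumption: "32K" means 32,768 tokens (2**15); confirm against the
# model's config (e.g. max_position_embeddings) before relying on it.
CONTEXT_LENGTH = 32 * 1024


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """Return True if the prompt plus the requested completion fit in the window."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= context_length


def max_prompt_tokens(max_new_tokens: int,
                      context_length: int = CONTEXT_LENGTH) -> int:
    """Largest prompt length that still leaves room for max_new_tokens of output."""
    return max(context_length - max_new_tokens, 0)


if __name__ == "__main__":
    print(max_prompt_tokens(1024))        # 31744
    print(fits_in_context(31744, 1024))   # True
    print(fits_in_context(31745, 1024))   # False
```

With `max_new_tokens=1024`, as in the usage example, a prompt can use at most 31,744 tokens under this assumption.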

Model Details

Model Description

  • Developed by: Technology Innovation Institute (https://www.tii.ae)
  • Model type: Causal decoder-only
  • Architecture: Transformer-based
  • Language(s) (NLP): Mainly English
  • License: TII Falcon-LLM License 2.0

Model Architecture

Falcon3 uses grouped-query attention (GQA) for faster inference and a wider head dimension of 256. A high RoPE base value is used to support long-context understanding.
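A minimal sketch of the two ideas above, in plain Python. Under GQA, several query heads share one key/value head, so the KV cache scales with the number of KV heads rather than the number of query heads. RoPE rotates pairs of dimensions at frequencies derived from a base value; a larger base stretches the longest wavelength, which is the usual motivation for raising it in long-context models. The demo settings (12 query heads, 4 KV heads, 28 layers) are assumptions for illustration only, so check the model's `config.json` for the actual values; all function names are hypothetical.

```python
import math


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Map a query head to the key/value head it shares under GQA."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: one cached tensor for keys and one for values, per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def rope_inv_freq(head_dim: int, base: float) -> list[float]:
    """RoPE inverse frequencies base**(-2i/d), one per pair of head dimensions."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


if __name__ == "__main__":
    # Assumed illustrative config: 12 query heads sharing 4 KV heads.
    n_q, n_kv, head_dim, n_layers = 12, 4, 256, 28
    print([kv_head_for_query_head(h, n_q, n_kv) for h in range(n_q)])
    # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

    mha = kv_cache_bytes(n_layers, n_q, head_dim, seq_len=32768)
    gqa = kv_cache_bytes(n_layers, n_kv, head_dim, seq_len=32768)
    print(mha / gqa)       # 3.0
    print(gqa / 2**30)     # 3.5 (GiB, 16-bit cache)

    # The longest RoPE wavelength (in positions) grows with the base value.
    for base in (10_000.0, 1_000_000.0):
        longest = 2 * math.pi / min(rope_inv_freq(head_dim, base))
        print(base, round(longest))
```

Under these assumed settings, a full 32K-token cache in 16-bit precision takes 3.5 GiB with 4 KV heads, versus three times that with one KV head per query head.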

Usage

Below is an example of how to use the model with transformers (make sure you have the latest transformers release, or a build from source):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/Falcon3-7B-Instruct"

# Load the weights in the checkpoint's dtype and spread them across the
# available devices (device_map="auto" requires the accelerate package)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How many hours in one day?"
messages = [
    {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
    {"role": "user", "content": prompt}
]
# Render the conversation with the model's chat template; add_generation_prompt
# appends the tokens that cue the assistant's reply
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
# generate() returns prompt + completion; keep only the newly generated tokens
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
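The list comprehension near the end of the example strips the prompt from each output, because `model.generate` for a decoder-only model returns the input tokens followed by the new ones. The toy sketch below reproduces that step on made-up token ids, so it runs without downloading the model; `strip_prompt` is a hypothetical helper name, not a transformers function.

```python
def strip_prompt(input_ids_batch, output_ids_batch):
    """Drop the echoed prompt tokens from each generated sequence."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


prompt_ids = [[101, 7, 8, 9]]               # hypothetical prompt token ids
generated = [[101, 7, 8, 9, 42, 43, 2]]     # prompt echoed, then 3 new tokens
print(strip_prompt(prompt_ids, generated))  # [[42, 43, 2]]
```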

Benchmarks

The following table reports results from our internal evaluation pipeline:

| Category | Benchmark | Llama-3.1-8B-Instruct | Qwen2-7B-Instruct | Qwen2.5-7B-Instruct | Falcon3-7B-Instruct |
|---|---|---|---|---|---|
| General | MMLU (5-shot) | - | - | - | - |
| | MMLU-PRO (5-shot) | - | - | - | - |
| | IFEval | - | - | - | - |
| Math | GSM8K (5-shot) | - | - | - | - |
| | MATH (4-shot) | - | - | - | - |
| Reasoning | ARC Challenge (25-shot) | - | - | - | - |
| | GPQA (0-shot) | - | - | - | - |
| | MUSR (0-shot) | - | - | - | - |
| | BBH (3-shot) | - | - | - | - |
| Commonsense Understanding | PIQA (0-shot) | - | - | - | - |
| | SciQ (0-shot) | - | - | - | - |
| | Winogrande (0-shot) | - | - | - | - |
| | OpenbookQA (0-shot) | - | - | - | - |

Citation

If the Falcon3 series was helpful in your work, feel free to cite us.

@misc{Falcon3,
    title = {Falcon 3 family of Open Foundation Models},
    author = {TII Team},
    month = {December},
    year = {2024}
}