Model Details
Model Description
- Developed by: https://www.tii.ae
- Model type: Causal decoder-only
- Architecture: Pure transformer, 1.58-bit version
- Language(s) (NLP): Mainly English
- License: TII Falcon License 2.0
Training details
The model was trained following the training strategies from the recent 1-bit LLM HF blogpost and the 1-bit LLM paper. For more details about the training protocol of this model, please refer to the Compression section of the Falcon-3 technical report.
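For intuition, 1.58-bit training constrains the weights of linear layers to the ternary set {-1, 0, +1} with a single per-tensor scale. The sketch below illustrates the absmean quantization step described in the 1-bit LLM paper; it is a minimal, hypothetical example, not the training code used for Falcon3.

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Illustrative absmean quantization (BitNet b1.58 style): scale the weight
    matrix by its mean absolute value, then round and clip to {-1, 0, +1}.
    Not the actual Falcon3 training code."""
    scale = w.abs().mean().clamp(min=eps)          # per-tensor absmean scale
    w_ternary = (w / scale).round().clamp(-1, 1)   # ternary weights in {-1, 0, +1}
    return w_ternary, scale                        # dequantize as w_ternary * scale

# Example: quantize a random weight matrix
w = torch.randn(4, 4)
w_q, s = absmean_ternary_quantize(w)
print(w_q)
```

Because the quantized weights take only three values, matrix multiplications reduce to additions and sign flips, which is what enables the memory and latency savings of 1.58-bit checkpoints.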
Usage
Currently, you can use this model with either the Hugging Face transformers library or the BitNet library. You can also try the model in the falcon-1.58bit playground (only available for the 7B instruct version).
🤗 transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-3B-Base-1.58bit"

# Load the 1.58-bit checkpoint and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Perform text generation
input_text = "How many hours are there in one day?"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
BitNet
# Clone and set up the BitNet inference framework
git clone https://github.com/microsoft/BitNet && cd BitNet
pip install -r requirements.txt
# Download the checkpoint and prepare it in the i2_s quantization format
python setup_env.py --hf-repo tiiuae/Falcon3-3B-Base-1.58bit -q i2_s
# Run inference on a prompt in conversational mode
python run_inference.py -m models/Falcon3-3B-Base-1.58bit/ggml-model-i2_s.gguf -p "Hi how are you doing today?" -cnv
Evaluation
We report our internal pipeline benchmarks in the following table. Note that the evaluation results are normalized scores from the Open LLM Leaderboard v2 tasks, whereas the results reported for the original models in the blogpost are raw scores; a sketch of this normalization is given after the table.
| Benchmark | Llama3-8B-1.58-100B-tokens | Falcon3-7B-Instruct-1.58bit |
|---|---|---|
| IFEval | 17.91 | 27.49 |
| MUSR | 4.87 | 4.64 |
| GPQA | 1.83 | 0.00 |
| BBH | 5.36 | 2.97 |
| MMLU-PRO | 2.78 | 1.47 |
| MATH | 0.26 | 0.43 |
| Average | 5.50 | 6.17 |
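For reference, the v2 leaderboard normalization rescales each task's raw accuracy against its random-guess baseline, so that scores at or below the baseline map to 0 and a perfect score maps to 100. The function below is an illustrative sketch under that assumption, not the exact leaderboard implementation; the per-task baselines differ (e.g. roughly 25% for a 4-way multiple-choice task, 0% for generative tasks).

```python
def normalize_v2(raw_score: float, random_baseline: float) -> float:
    """Rescale a raw accuracy (in %) against the task's random-guess baseline:
    scores at or below the baseline map to 0, a perfect score maps to 100.
    Illustrative sketch only, not the leaderboard's exact code."""
    if raw_score <= random_baseline:
        return 0.0
    return 100.0 * (raw_score - random_baseline) / (100.0 - random_baseline)

# Example: a raw accuracy of 40% on a task with a 25% random baseline
print(normalize_v2(40.0, 25.0))  # 20.0
```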
Citation
Coming soon.