---
license: mit
datasets:
- tiiuae/falcon-refinedweb
language:
- en
library_name: transformers
---
# NeoBERT
[Model on Hugging Face](https://huggingface.co/chandar-lab/NeoBERT)
NeoBERT is a **next-generation encoder** model for English text representation, pre-trained from scratch on the RefinedWeb dataset. NeoBERT integrates state-of-the-art advancements in architecture, modern data, and optimized pre-training methodologies. It is designed for seamless adoption: it serves as a plug-and-play replacement for existing base models, relies on an **optimal depth-to-width ratio**, and leverages an extended context length of **4,096 tokens**. Despite its compact 250M parameter footprint, it is the most efficient model of its kind and achieves **state-of-the-art results** on the massive MTEB benchmark, outperforming BERT large, RoBERTa large, NomicBERT, and ModernBERT under identical fine-tuning conditions.
- Paper: [arXiv:2502.19587](https://arxiv.org/abs/2502.19587)
- Repository: [GitHub](https://github.com/chandar-lab/NeoBERT)
## Get started
Ensure you have the following dependencies installed:
```bash
pip install transformers torch xformers==0.0.28.post3
```
If you would like to use sequence packing (un-padding), you will also need to install flash-attention:
```bash
pip install transformers torch xformers==0.0.28.post3 flash_attn
```
## How to use
Load the model using Hugging Face Transformers:
```python
from transformers import AutoModel, AutoTokenizer
model_name = "chandar-lab/NeoBERT"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
# Tokenize input text
text = "NeoBERT is the most efficient model of its kind!"
inputs = tokenizer(text, return_tensors="pt")
# Generate embeddings
outputs = model(**inputs)
# Use the [CLS] (first) token's hidden state as the sequence embedding
embedding = outputs.last_hidden_state[:, 0, :]
print(embedding.shape)  # torch.Size([1, 768])
```
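The same pipeline extends to multiple inputs. The sketch below is an illustrative example (not part of the official repository): it batches two sentences with padding, extracts their [CLS] embeddings, and compares them with cosine similarity. The sentence texts are placeholders.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "chandar-lab/NeoBERT"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model.eval()

sentences = [
    "NeoBERT is the most efficient model of its kind!",
    "This encoder produces compact sentence representations.",
]

# Batch the sentences together; padding aligns them to the same length
inputs = tokenizer(sentences, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# [CLS] (first token) embeddings, shape (batch_size, hidden_size)
embeddings = outputs.last_hidden_state[:, 0, :]

# Cosine similarity between the two sentence embeddings
similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cosine similarity: {similarity.item():.3f}")
```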
## Features
| **Feature** | **NeoBERT** |
|---------------------------|-----------------------------|
| `Depth-to-width` | 28 × 768 |
| `Parameter count` | 250M |
| `Activation` | SwiGLU |
| `Positional embeddings` | RoPE |
| `Normalization` | Pre-RMSNorm |
| `Data Source` | RefinedWeb |
| `Data Size` | 2.8 TB |
| `Tokenizer` | google/bert |
| `Context length` | 4,096 |
| `MLM Masking Rate` | 20% |
| `Optimizer` | AdamW |
| `Scheduler` | CosineDecay |
| `Training Tokens` | 2.1 T |
| `Efficiency` | FlashAttention |
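
To make use of the 4,096-token context length listed above, long inputs can be truncated to the model's maximum length at tokenization time. The snippet below is a minimal sketch using standard Transformers truncation arguments; the repeated placeholder text stands in for a long document.

```python
from transformers import AutoModel, AutoTokenizer

model_name = "chandar-lab/NeoBERT"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

# Placeholder for a long document (illustrative only)
long_text = "NeoBERT supports an extended context. " * 1000

# Truncate to the 4,096-token context window
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)
print(inputs["input_ids"].shape)  # at most (1, 4096)

outputs = model(**inputs)
document_embedding = outputs.last_hidden_state[:, 0, :]
```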
## License
Both the model weights and the code repository are released under the permissive MIT license.
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{breton2025neobertnextgenerationbert,
title={NeoBERT: A Next-Generation BERT},
author={Lola Le Breton and Quentin Fournier and Mariam El Mezouar and Sarath Chandar},
year={2025},
eprint={2502.19587},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.19587},
}
```
## Contact
For questions, do not hesitate to reach out by opening an issue here or on our **[GitHub](https://github.com/chandar-lab/NeoBERT)**.