---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- ModernBERT
- fineweb
- filtering
- regression
metrics:
- precision
- recall
- accuracy
model-index:
- name: 8e-5_one_label
results: []
datasets:
- HuggingFaceFW/fineweb-edu-llama3-annotations
language:
- en
---
One-off run using a [modified version](https://gist.github.com/bclavie/93d3b161d7fb41131bca41a50b6726c5) of the original FineWeb-Edu quality-filter regression training code, simply swapping the original base model (snowflake-arctic-embed-m, a model fine-tuned from BERT-base) for ModernBERT-base.
Without extensive tuning, the model trains considerably faster than the BERT-base variant and gains **+5 Weighted F1**:
# Results
## ModernBERT-base-fineweb-edu-example
**Weighted F1: 0.76**
**Detailed:**
```
Validation Report:
precision recall f1-score support
0 0.80 0.55 0.65 5694
1 0.82 0.86 0.84 26512
2 0.64 0.71 0.67 10322
3 0.65 0.60 0.63 3407
4 0.80 0.37 0.51 807
5 0.00 0.00 0.00 1
accuracy 0.76 46743
macro avg 0.62 0.51 0.55 46743
weighted avg 0.76 0.76 0.76 46743
```
## [Original Classifier](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier)
**Weighted F1: 0.71**
**Detailed:**
```
precision recall f1-score support
0 0.75 0.49 0.59 5694
1 0.78 0.84 0.81 26512
2 0.57 0.61 0.59 10322
3 0.56 0.50 0.53 3407
4 0.58 0.35 0.44 807
5 0.33 0.01 0.02 125
accuracy 0.71 46867
macro avg 0.60 0.47 0.50 46867
weighted avg 0.71 0.71 0.71 46867
```
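Both reports above are standard scikit-learn classification reports. Since the filter is trained as a regression over the 0-5 quality scores, a plausible way to produce them (an assumption; the exact evaluation code is in the linked script) is to round and clip the raw predictions to integer labels:

```python
import numpy as np
from sklearn.metrics import classification_report, f1_score

# Hypothetical raw outputs from the regression head on a toy batch;
# the real validation set has ~46.7k examples.
raw_scores = np.array([0.2, 1.4, 2.6, 3.1, 4.8, 5.6, -0.3, 1.9])
labels = np.array([0, 1, 2, 3, 5, 5, 0, 2])

# Round and clip the continuous scores to the 0-5 integer label range,
# then score as a 6-way classification problem.
preds = np.clip(np.round(raw_scores), 0, 5).astype(int)

print(f"Weighted F1: {f1_score(labels, preds, average='weighted'):.2f}")
print(classification_report(labels, preds, zero_division=0))
```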
(For some reason, the currently available annotated dataset is otherwise identical, except that it is missing 124 of the 125 examples rated 5. These are so rare that they have no real impact on the weighted metrics.)
# Params
Most parameters are detailed in the script. Key hyperparameters:
- **Learning Rate**: 5e-5
- **Weight Decay**: 0.1 (decoupled)
- **Seed**: 1
- **Warmup**: 10% steps
- **Schedule**: Linear decay
- **Max epochs**: 10
- **Best Epoch**: #3
- **Precision**: bfloat16
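The hyperparameters above can be collected in a plain dict whose key names follow common `transformers.TrainingArguments` conventions (the key names are an assumption; the actual values come from the linked training script):

```python
# Key hyperparameters from the run above; key names mirror
# transformers.TrainingArguments conventions (an assumption).
hparams = {
    "learning_rate": 5e-5,
    "weight_decay": 0.1,            # decoupled, as in AdamW
    "seed": 1,
    "warmup_ratio": 0.10,           # warmup over 10% of steps
    "lr_scheduler_type": "linear",  # linear decay after warmup
    "num_train_epochs": 10,         # best checkpoint at epoch 3
    "bf16": True,                   # bfloat16 precision
}

print(hparams)
```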