---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- ModernBERT
- fineweb
- filtering
- regression
metrics:
- precision
- recall
- accuracy
model-index:
- name: 8e-5_one_label
  results: []
datasets:
- HuggingFaceFW/fineweb-edu-llama3-annotations
language:
- en
---
|
|
|
One-off run using a [modified version](https://gist.github.com/bclavie/93d3b161d7fb41131bca41a50b6726c5) of the original FineWeb-Edu quality filter regression training code, simply swapping the original model (snowflake-arctic-embed-m, a model fine-tuned from BERT-base) for ModernBERT-base.
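
In rough terms, the swap amounts to loading ModernBERT-base with a single-label regression head in place of the original embedding backbone. The snippet below is a minimal sketch of that idea, not an excerpt from the linked gist; the `num_labels=1` / `problem_type="regression"` setup is an assumption based on the regression framing.

```python
# Minimal sketch of the backbone swap; the actual training code is in the linked gist.
# Requires a transformers version with ModernBERT support.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "answerdotai/ModernBERT-base"  # drop-in replacement for the BERT-base-derived backbone

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=1,               # single scalar target: the 0-5 educational score
    problem_type="regression",  # assumed regression head (MSE loss on the score)
)
```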
|
|
|
Without extensive tuning, the model trains considerably faster than BERT-base and gains **+5 Weighted F1** over the original classifier:
|
|
|
# Results

## ModernBERT-base-fineweb-edu-example

**Weighted F1: 0.76**

**Detailed:**
|
|
|
```
Validation Report:
              precision    recall  f1-score   support

           0       0.80      0.55      0.65      5694
           1       0.82      0.86      0.84     26512
           2       0.64      0.71      0.67     10322
           3       0.65      0.60      0.63      3407
           4       0.80      0.37      0.51       807
           5       0.00      0.00      0.00         1

    accuracy                           0.76     46743
   macro avg       0.62      0.51      0.55     46743
weighted avg       0.76      0.76      0.76     46743
```
|
|
|
## Original Classifier ([HuggingFaceFW/fineweb-edu-classifier](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier))

**Weighted F1: 0.71**

**Detailed:**
|
|
|
```
              precision    recall  f1-score   support

           0       0.75      0.49      0.59      5694
           1       0.78      0.84      0.81     26512
           2       0.57      0.61      0.59     10322
           3       0.56      0.50      0.53      3407
           4       0.58      0.35      0.44       807
           5       0.33      0.01      0.02       125

    accuracy                           0.71     46867
   macro avg       0.60      0.47      0.50     46867
weighted avg       0.71      0.71      0.71     46867
```
|
|
|
(For some reason, the currently available annotated dataset is identical except that it's missing 124 of the 125 5-rated examples. These examples are so few that they have no real impact on the weighted metrics.)
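
For reference, inference with the resulting model can follow the same pattern as the original FineWeb-Edu classifier: read the single logit as the score, then clamp and round it to an integer 0-5 bucket. A minimal sketch, assuming the checkpoint is on the Hub (the repo id below is a placeholder, not the actual path):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "your-username/ModernBERT-base-fineweb-edu-example"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

text = "This is a test sentence."
inputs = tokenizer(text, return_tensors="pt", padding="longest", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

score = outputs.logits.squeeze(-1).float().item()  # raw regression output
int_score = int(round(max(0, min(score, 5))))      # clamp/round to a 0-5 integer bucket
print({"score": score, "int_score": int_score})
```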
|
|
|
# Params

Most parameters are detailed in the script; an illustrative mapping to `TrainingArguments` follows the list. Key hparams:
|
|
|
- **Learning Rate**: 5e-5
- **Weight Decay**: 0.1 (decoupled)
- **Seed**: 1
- **Warmup**: 10% of steps
- **Schedule**: Linear decay
- **Max epochs**: 10
- **Best Epoch**: #3
- **Precision**: bfloat16
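
For illustration only, these hyperparameters map roughly onto `TrainingArguments` as sketched below. The exact configuration lives in the linked training script; anything not listed above (output dir, optimizer choice, eval/save strategy) is an assumption.

```python
from transformers import TrainingArguments

# Rough, assumed mapping of the key hparams above; not the actual run configuration.
training_args = TrainingArguments(
    output_dir="modernbert-fineweb-edu",  # placeholder
    learning_rate=5e-5,
    weight_decay=0.1,             # decoupled weight decay (AdamW-style)
    optim="adamw_torch",
    seed=1,
    warmup_ratio=0.1,             # 10% of steps
    lr_scheduler_type="linear",   # linear decay after warmup
    num_train_epochs=10,
    bf16=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,  # best epoch was #3 in this run
)
```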