---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- ModernBERT
- fineweb
- filtering
- regression
metrics:
- precision
- recall
- accuracy
model-index:
- name: 8e-5_one_label
  results: []
datasets:
- HuggingFaceFW/fineweb-edu-llama3-annotations
language:
- en
---
|
|
|
One-off run using a [modified version](https://gist.github.com/bclavie/93d3b161d7fb41131bca41a50b6726c5) of the original FineWeb-Edu quality filter regression training code, simply swapping the original model (snowflake-arctic-embed-m, a model fine-tuned from BERT-base) for ModernBERT-base.
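
In rough terms, the swap amounts to loading ModernBERT-base with a single-label regression head in place of the original embedding backbone. The snippet below is a minimal sketch of that idea, not an excerpt from the linked gist; the `num_labels=1` / `problem_type="regression"` setup is an assumption based on the regression framing.

```python
# Minimal sketch of the backbone swap; the actual training code is in the linked gist.
# Requires a transformers version with ModernBERT support.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "answerdotai/ModernBERT-base"  # drop-in replacement for the BERT-base-derived backbone

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=1,               # single scalar target: the 0-5 educational score
    problem_type="regression",  # assumed regression head (MSE loss on the score)
)
```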
|
|
|
Without extensive tuning, the model trains considerably faster than BERT-base and gains **+5 Weighted F1** over the original classifier:
|
|
|
# Results

## ModernBERT-base-fineweb-edu-example

**Weighted F1: 0.76**

**Detailed:**
|
|
|
```
Validation Report:
              precision    recall  f1-score   support

           0       0.80      0.55      0.65      5694
           1       0.82      0.86      0.84     26512
           2       0.64      0.71      0.67     10322
           3       0.65      0.60      0.63      3407
           4       0.80      0.37      0.51       807
           5       0.00      0.00      0.00         1

    accuracy                           0.76     46743
   macro avg       0.62      0.51      0.55     46743
weighted avg       0.76      0.76      0.76     46743
```
|
|
|
## Original Classifier ([HuggingFaceFW/fineweb-edu-classifier](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier))

**Weighted F1: 0.71**

**Detailed:**
|
|
|
```
              precision    recall  f1-score   support

           0       0.75      0.49      0.59      5694
           1       0.78      0.84      0.81     26512
           2       0.57      0.61      0.59     10322
           3       0.56      0.50      0.53      3407
           4       0.58      0.35      0.44       807
           5       0.33      0.01      0.02       125

    accuracy                           0.71     46867
   macro avg       0.60      0.47      0.50     46867
weighted avg       0.71      0.71      0.71     46867
```
|
|
|
(For some reason, the currently available annotated dataset is identical except that it's missing 124 of the 125 5-rated examples. These examples are so few that they have no real impact on the weighted metrics.)
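
For reference, inference with the resulting model can follow the same pattern as the original FineWeb-Edu classifier: read the single logit as the score, then clamp and round it to an integer 0-5 bucket. A minimal sketch, assuming the checkpoint is on the Hub (the repo id below is a placeholder, not the actual path):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "your-username/ModernBERT-base-fineweb-edu-example"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

text = "This is a test sentence."
inputs = tokenizer(text, return_tensors="pt", padding="longest", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

score = outputs.logits.squeeze(-1).float().item()  # raw regression output
int_score = int(round(max(0, min(score, 5))))      # clamp/round to a 0-5 integer bucket
print({"score": score, "int_score": int_score})
```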
|
|
|
# Params

Most parameters are detailed in the script; an illustrative mapping to `TrainingArguments` follows the list. Key hparams:
|
|
|
- **Learning Rate**: 5e-5
- **Weight Decay**: 0.1 (decoupled)
- **Seed**: 1
- **Warmup**: 10% of steps
- **Schedule**: Linear decay
- **Max epochs**: 10
- **Best Epoch**: #3
- **Precision**: bfloat16
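
For illustration only, these hyperparameters map roughly onto `TrainingArguments` as sketched below. The exact configuration lives in the linked training script; anything not listed above (output dir, optimizer choice, eval/save strategy) is an assumption.

```python
from transformers import TrainingArguments

# Rough, assumed mapping of the key hparams above; not the actual run configuration.
training_args = TrainingArguments(
    output_dir="modernbert-fineweb-edu",  # placeholder
    learning_rate=5e-5,
    weight_decay=0.1,             # decoupled weight decay (AdamW-style)
    optim="adamw_torch",
    seed=1,
    warmup_ratio=0.1,             # 10% of steps
    lr_scheduler_type="linear",   # linear decay after warmup
    num_train_epochs=10,
    bf16=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,  # best epoch was #3 in this run
)
```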