
Scandinavian Education Classifier Snowflake

!!! We recommend using our BERT-based model instead for production.

Trained using code from [Cosmopedia](https://github.com/huggingface/cosmopedia/tree/main/classification), with nb-bert-base as the starting point. The classification data is from GlotCC and has been annotated using Gemini 1.5 Flash.

The following command was used for training:

 python train_edu_bert.py --base_model_name="NbAiLab/nb-bert-base" --dataset_name="north/scandinavian-educational-annotations" --target_column="score" --checkpoint_dir="/home/pere/checkpoints/scandinavian_bert/"
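
The `--target_column="score"` flag suggests that training uses a numeric score rather than categorical labels. As a rough sketch only, assuming the model emits one real-valued score per document and that the classes 0 to 5 reported below come from rounding and clipping that score (an assumption about the training script, not something this card states), the mapping could look like this. The function name `score_to_class` is hypothetical.

```python
def score_to_class(raw_score: float, lowest: int = 0, highest: int = 5) -> int:
    """Map a real-valued score to an integer class.

    Assumed convention: round to the nearest integer, then clip to
    the range [lowest, highest]. Python's round() rounds exact halves
    to the nearest even integer (2.5 becomes 2).
    """
    rounded = int(round(raw_score))
    return max(lowest, min(highest, rounded))


if __name__ == "__main__":
    # Illustrative inputs only; these are not real model outputs.
    for raw in (-0.4, 1.2, 2.7, 7.2):
        print(raw, "->", score_to_class(raw))
```

Under this assumption, scores outside the 0 to 5 range are pulled back to the nearest valid class instead of being discarded.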

Classification Report

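| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| 0 | 0.76 | 0.64 | 0.70 | 18274 |
| 1 | 0.63 | 0.76 | 0.69 | 23348 |
| 2 | 0.48 | 0.40 | 0.43 | 6621 |
| 3 | 0.57 | 0.28 | 0.38 | 1314 |
| 4 | 0.56 | 0.06 | 0.12 | 433 |
| 5 | 0.00 | 0.00 | 0.00 | 10 |

| Metric | Value |
|---|---|
| Accuracy | 0.65 |
| Macro Avg Precision | 0.50 |
| Macro Avg Recall | 0.36 |
| Macro Avg F1-Score | 0.38 |
| Weighted Avg Precision | 0.65 |
| Weighted Avg Recall | 0.65 |
| Weighted Avg F1-Score | 0.64 |
| Total Support | 50000 |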

Confusion Matrix

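Rows are true classes and columns are predicted classes; each row sums to the support of that class in the classification report.

| | Class 0 | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 |
|---|---|---|---|---|---|---|
| Class 0 | 11725 | 6460 | 88 | 1 | 0 | 0 |
| Class 1 | 3598 | 17758 | 1978 | 14 | 0 | 0 |
| Class 2 | 128 | 3733 | 2618 | 142 | 0 | 0 |
| Class 3 | 6 | 272 | 645 | 369 | 22 | 0 |
| Class 4 | 2 | 121 | 161 | 121 | 28 | 0 |
| Class 5 | 0 | 2 | 8 | 0 | 0 | 0 |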
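
The classification report can be recomputed from the confusion matrix. The sketch below is not the evaluation code used for this model; it copies the matrix from above, treats rows as true classes and columns as predicted classes, and derives per-class precision, recall, and F1 along with accuracy and macro averages. The helper names are made up for illustration.

```python
# Confusion matrix copied from this card: rows = true class, columns = predicted class.
CONFUSION = [
    [11725, 6460, 88, 1, 0, 0],
    [3598, 17758, 1978, 14, 0, 0],
    [128, 3733, 2618, 142, 0, 0],
    [6, 272, 645, 369, 22, 0],
    [2, 121, 161, 121, 28, 0],
    [0, 2, 8, 0, 0, 0],
]


def per_class_metrics(cm):
    """Return a list of (precision, recall, f1, support) tuples, one per class."""
    n = len(cm)
    rows = []
    for k in range(n):
        true_positive = cm[k][k]
        support = sum(cm[k])                         # all samples whose true class is k
        predicted = sum(cm[i][k] for i in range(n))  # all samples predicted as k
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((precision, recall, f1, support))
    return rows


def accuracy(cm):
    """Fraction of samples on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


def macro_average(rows):
    """Unweighted mean of precision, recall, and F1 over all classes."""
    n = len(rows)
    return tuple(sum(r[i] for r in rows) / n for i in range(3))


if __name__ == "__main__":
    rows = per_class_metrics(CONFUSION)
    for k, (p, r, f1, support) in enumerate(rows):
        print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}")
    print(f"accuracy={accuracy(CONFUSION):.5f}")
    print("macro precision/recall/f1 = %.4f / %.4f / %.4f" % macro_average(rows))
```

Class 5 is never predicted (its column is all zeros), so its precision is undefined and is set to 0 here. That zero, together with the low recall on classes 3 and 4, is why the macro averages sit well below the weighted averages.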

Evaluation Metrics

| Metric | Value |
|---|---|
| Eval Loss | 0.3311704695224762 |
| Eval Precision | 0.49857140934204414 |
| Eval Recall | 0.35718277242555724 |
| Eval F1 Macro | 0.38442290605864393 |
| Eval Accuracy | 0.64996 |
| Eval Runtime (s) | 86.1773 |
| Eval Samples Per Second | 580.199 |
| Eval Steps Per Second | 4.537 |
| Epoch | 19.91 |

Training Metrics

| Metric | Value |
|---|---|
| Loss | 0.318 |
| Grad Norm | 0.6617229580879211 |
| Learning Rate | 5.119453924914675e-07 |
| Epoch | 19.97 |

Training Runtime

| Metric | Value |
|---|---|
| Train Runtime (s) | 19583.1034 |
| Train Samples Per Second | 459.58 |
| Train Steps Per Second | 1.795 |
| Train Loss | 0.341879387194793 |
| Epoch | 20.0 |
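
The runtime and throughput figures can be cross-checked with simple arithmetic. The sketch below multiplies runtime by throughput to estimate how many samples and steps each phase processed. The results are estimates derived from rounded values reported on this card, not figures stated by the authors, and "samples per step" may include gradient accumulation or multiple devices. The function name `estimate_workload` is hypothetical.

```python
def estimate_workload(runtime_s, samples_per_s, steps_per_s, epochs=1.0):
    """Estimate processed samples and steps from reported throughput."""
    total_samples = runtime_s * samples_per_s
    total_steps = runtime_s * steps_per_s
    return {
        "samples_per_epoch": total_samples / epochs,
        "total_steps": total_steps,
        # Effective batch size per optimizer/eval step (estimate).
        "samples_per_step": samples_per_s / steps_per_s,
    }


# Values copied from the training and evaluation metrics on this card.
train = estimate_workload(19583.1034, 459.58, 1.795, epochs=20.0)
evaluation = estimate_workload(86.1773, 580.199, 4.537)

if __name__ == "__main__":
    print("train:", {k: round(v, 1) for k, v in train.items()})
    print("eval: ", {k: round(v, 1) for k, v in evaluation.items()})
```

The evaluation estimate comes out at roughly 50,000 samples, which agrees with the total support in the classification report. The training figures point to roughly 450,000 samples per epoch.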

Model size: 109M params (Safetensors, tensor type F32)