potion-8m-edu-classifier Model Card

This Model2Vec model is a fine-tuned version of potion-base-8m. It was trained to score how educational a text is, on the 0 to 5 scale shown below, analogous to how the fineweb-edu-classifier was used to filter web text for educational content.

It achieves the following performance on the evaluation split:

              precision    recall  f1-score   support

           0       0.70      0.42      0.52      5694
           1       0.75      0.86      0.80     26512
           2       0.55      0.51      0.53     10322
           3       0.54      0.45      0.49      3407
           4       0.59      0.30      0.40       807
           5       0.00      0.00      0.00         1

    accuracy                           0.69     46743
   macro avg       0.52      0.42      0.46     46743
weighted avg       0.68      0.69      0.68     46743
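
The macro average weights every class equally, so class 5, which has a single example in the evaluation split and is never predicted correctly, pulls it down noticeably. As a rough illustration, the sketch below recomputes the macro F1 from the rounded per-class values in the report above, once with and once without class 5. The variable names are illustrative, and because the inputs are already rounded the result is only approximate.

```python
# Per-class F1 scores, copied from the report above (rounded to two decimals).
f1_per_class = {0: 0.52, 1: 0.80, 2: 0.53, 3: 0.49, 4: 0.40, 5: 0.00}

# Macro average: unweighted mean over all six classes.
macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)

# Same mean, leaving out class 5 (support of 1 in the evaluation split).
without_5 = [score for label, score in f1_per_class.items() if label != 5]
macro_f1_without_5 = sum(without_5) / len(without_5)

print(f"{macro_f1:.2f}")            # 0.46, matching the "macro avg" row
print(f"{macro_f1_without_5:.2f}")  # 0.55
```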

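When the predicted labels are thresholded into a binary classifier, it achieves a macro-averaged F1-score of 0.79. The support counts in the two reports are consistent with treating labels 3 and above as educational. The original fineweb-edu-classifier achieves 0.81 on the same dataset, but this classifier is orders of magnitude faster on CPU.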

              precision    recall  f1-score   support

     not edu       0.96      0.98      0.97     42528
         edu       0.70      0.54      0.61      4215

    accuracy                           0.94     46743
   macro avg       0.83      0.76      0.79     46743
weighted avg       0.93      0.94      0.93     46743
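
The card does not state the threshold used for the binary report. One way to check a candidate is to collapse the six-class support counts and compare them with the binary ones. The sketch below assumes a threshold of 3 (an inference from the numbers, not something the card confirms); `to_binary` and `EDU_THRESHOLD` are hypothetical names introduced here.

```python
# Support per label, copied from the six-class report.
support = {0: 5694, 1: 26512, 2: 10322, 3: 3407, 4: 807, 5: 1}

# Assumption: labels at or above 3 count as educational.
EDU_THRESHOLD = 3


def to_binary(label, threshold=EDU_THRESHOLD):
    """Map a 0-5 label (int or numeric string) to 'edu' or 'not edu'."""
    return "edu" if int(label) >= threshold else "not edu"


# Collapse the six-class support counts into the two binary classes.
binary_support = {"not edu": 0, "edu": 0}
for label, count in support.items():
    binary_support[to_binary(label)] += count

print(binary_support)  # {'not edu': 42528, 'edu': 4215}
```

Both totals match the support column of the binary report, which is why a threshold of 3 looks plausible.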

Installation

Install model2vec with the inference extra using pip:

pip install model2vec[inference]

Usage

Load this model using the from_pretrained method:

from model2vec.inference import StaticModelPipeline

# Load a pretrained Model2Vec model
model = StaticModelPipeline.from_pretrained("minishlab/potion-8m-edu-classifier")

# Predict one label per input text
labels = model.predict(["Example sentence"])
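
Since the classifier is meant for filtering, a small helper that keeps only the texts predicted as educational may be useful. This is a minimal sketch, not part of model2vec: `keep_educational` is a hypothetical helper, the threshold of 3 is an assumption, and it presumes that `predict` returns one 0 to 5 label (as an integer or numeric string) per input text.

```python
def keep_educational(texts, classifier, threshold=3):
    """Return the texts whose predicted label is at or above the threshold.

    `classifier` is any object with a `predict(list_of_str)` method that
    returns one 0-5 label per text, such as the pipeline loaded above.
    The default threshold of 3 is an assumption, not a documented value.
    """
    texts = list(texts)  # materialize so the texts can be iterated twice
    labels = classifier.predict(texts)
    # int() accepts both integer labels and numeric strings like "3".
    return [text for text, label in zip(texts, labels) if int(label) >= threshold]
```

With the pipeline from the usage example, this would be called as keep_educational(documents, model), where documents is a list of strings.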

Library Authors

Model2Vec was developed by Minish.

Citation

Please cite the Model2Vec repository if you use this model in your work.

@software{minishlab2024model2vec,
  author = {Stephan Tulkens and Thomas van Dongen},
  title = {Model2Vec: Turn any Sentence Transformer into a Small Fast Model},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec},
}

Model size: 7.56M params (Safetensors, tensor type F32)