---
base_model: sentence-transformers/paraphrase-mpnet-base-v2
library_name: setfit
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
widget:
- text: >-
    At least 27 people were killed and over 200 injured in a devastating gas
    explosion that ripped through a residential area in central Mexico City,
    officials said on Tuesday. The blast, which occurred at around 8pm local
    time, also left hundreds of people homeless and caused widespread
    destruction. The explosion was so powerful that it shattered windows and
    damaged buildings several blocks away. Rescue teams were working through
    the night to search for anyone who may still be trapped under the rubble.
    The cause of the explosion is still unknown, but authorities have launched
    an investigation into the incident.
- text: >-
    Just got back from the most disappointing concert of my life. The artist
    was late, the sound quality was terrible, and they only played 2 songs
    from their new album. I was expecting so much more. 1/10 would not
    recommend.
- text: >-
    The new smartphone from Samsung has exceeded our expectations in every
    way. The camera is top-notch, the battery life is impressive, and the
    display is vibrant and clear. We were blown away by the seamless
    performance and the sleek design. Overall, this phone is a game-changer in
    the tech industry and a must-have for anyone looking for a high-quality
    device.
- text: >-
    Are you kidding me?! I just got a parking ticket for a spot that was
    clearly marked as free for 1 hour. The city is just trying to rip us off.
    Unbelievable. #Frustrated #ParkingTicket
- text: >-
    Renowned actress Emma Stone took home the coveted Golden Globe award for
    Best Actress in a Motion Picture last night, marking her second
    consecutive win in the category. The 33-year-old actress was visibly
    emotional as she accepted the award, thanking her team and family for
    their unwavering support. Stone's performance in the critically acclaimed
    film 'The Favourite' earned her widespread critical acclaim and a spot in
    the running for the prestigious award. This win solidifies her position as
    one of the most talented and sought-after actresses in Hollywood.
inference: true
model-index:
- name: SetFit with sentence-transformers/paraphrase-mpnet-base-v2
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Unknown
      type: unknown
      split: test
    metrics:
    - type: accuracy
      value: 0.89
      name: Accuracy
---
# SetFit with sentence-transformers/paraphrase-mpnet-base-v2

This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/paraphrase-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

- Fine-tuning a Sentence Transformer with contrastive learning.
- Training a classification head with features from the fine-tuned Sentence Transformer (a minimal sketch of this stage follows below).
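The snippet below is a rough, self-contained illustration of that second stage: embedding texts with the Sentence Transformer body and fitting a scikit-learn `LogisticRegression` head on those embeddings. It is not the library's internal code, and the training texts and labels are invented placeholders.

```python
# Rough illustration of the classification-head stage (not SetFit's internal code):
# embed texts with the Sentence Transformer body, then fit a LogisticRegression head
# on those embeddings. Texts and labels below are invented placeholders.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

body = SentenceTransformer("sentence-transformers/paraphrase-mpnet-base-v2")

train_texts = [
    "Great phone, the battery easily lasts all day.",        # placeholder example
    "Officials reported a gas explosion on Tuesday night.",  # placeholder example
]
train_labels = [0, 1]

embeddings = body.encode(train_texts)   # numpy array of shape (n_examples, 768)
head = LogisticRegression()
head.fit(embeddings, train_labels)      # classification head on top of the embeddings

print(head.predict(body.encode(["The new camera is impressive."])))
```

In the full SetFit pipeline the body is first fine-tuned with contrastive pairs before the head is fitted; the `setfit` Trainer shown later in this card handles both stages.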
## Model Details

### Model Description
- Model Type: SetFit
- Sentence Transformer body: [sentence-transformers/paraphrase-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2)
- Classification head: a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
- Maximum Sequence Length: 512 tokens
- Number of Classes: 2 classes
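These details can be checked after downloading the model. The sketch below uses the public `model_body` and `model_head` attributes of `SetFitModel`; `"setfit_model_id"` is the same placeholder used in the inference example later in this card.

```python
# Sketch: inspect the loaded model's body and head. "setfit_model_id" is a placeholder.
from setfit import SetFitModel

model = SetFitModel.from_pretrained("setfit_model_id")
print(type(model.model_body).__name__)   # SentenceTransformer (the embedding body)
print(model.model_body.max_seq_length)   # 512
print(type(model.model_head).__name__)   # LogisticRegression (the classification head)
print(model.model_head.classes_)         # the 2 class labels
```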
### Model Sources
- Repository: [SetFit on GitHub](https://github.com/huggingface/setfit)
- Paper: [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- Blogpost: [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
### Model Labels
| Label | Examples |
|:------|:---------|
| 1     |          |
| 0     |          |
## Evaluation

### Metrics
| Label | Accuracy |
|:------|:---------|
| all   | 0.89     |
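The evaluation split behind this number is not published with the card, but the metric can be reproduced on your own labeled data along the following lines. This is a sketch: `"setfit_model_id"`, the texts, and the labels are placeholders, not the card's actual test split.

```python
# Sketch: compute accuracy on your own labeled test set.
# "setfit_model_id", the texts, and the labels are placeholders.
from setfit import SetFitModel
from sklearn.metrics import accuracy_score

model = SetFitModel.from_pretrained("setfit_model_id")

test_texts = [
    "The concert was a huge letdown, terrible sound and only two songs.",
    "Rescue teams searched through the night after the explosion, officials said.",
]
test_labels = [0, 1]

preds = model.predict(test_texts)
print("accuracy:", accuracy_score(test_labels, [int(p) for p in preds]))
```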
## Uses

### Direct Use for Inference
First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference.

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("setfit_model_id")
# Run inference
preds = model("Are you kidding me?! I just got a parking ticket for a spot that was clearly marked as free for 1 hour. The city is just trying to rip us off. Unbelievable. #Frustrated #ParkingTicket")
```
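For batches of texts or class probabilities, `SetFitModel` also exposes `predict` and `predict_proba`. The example texts below are placeholders:

```python
# Batch predictions and per-class probabilities (texts are placeholders).
from setfit import SetFitModel

model = SetFitModel.from_pretrained("setfit_model_id")  # same placeholder id as above

texts = [
    "The new smartphone exceeded our expectations in every way.",
    "Officials said the cause of the explosion is still under investigation.",
]
labels = model.predict(texts)         # one predicted label per text
probas = model.predict_proba(texts)   # probabilities from the LogisticRegression head
print(labels)
print(probas)
```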
## Training Details

### Training Set Metrics
| Training set | Min | Median  | Max |
|:-------------|:----|:--------|:----|
| Word count   | 32  | 65.6129 | 112 |

| Label | Training Sample Count |
|:------|:----------------------|
| 1     | 13                    |
| 0     | 18                    |
### Training Hyperparameters
- batch_size: (16, 16)
- num_epochs: (5, 5)
- max_steps: -1
- sampling_strategy: oversampling
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: True
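These hyperparameters map onto setfit's `TrainingArguments` roughly as sketched below. This is not the exact training script: the tiny `Dataset` is a placeholder (the card's training data is not published), and the evaluation-related settings needed for `load_best_model_at_end` are omitted. Values not set explicitly are the library defaults.

```python
# Sketch: a run with the hyperparameters listed above, on placeholder data.
from datasets import Dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, Trainer, TrainingArguments

# Placeholder few-shot training set (two examples per label).
train_dataset = Dataset.from_dict({
    "text": [
        "placeholder positive example one", "placeholder positive example two",
        "placeholder negative example one", "placeholder negative example two",
    ],
    "label": [1, 1, 0, 0],
})

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

args = TrainingArguments(
    batch_size=(16, 16),              # (embedding phase, classifier phase)
    num_epochs=(5, 5),
    max_steps=-1,
    sampling_strategy="oversampling",
    body_learning_rate=(2e-05, 1e-05),
    head_learning_rate=0.01,
    loss=CosineSimilarityLoss,
    warmup_proportion=0.1,
    seed=42,
    # load_best_model_at_end=True additionally requires an eval dataset and a
    # per-epoch evaluation strategy; both are omitted from this sketch.
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```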
### Training Results
| Epoch   | Step    | Training Loss | Validation Loss |
|:--------|:--------|:--------------|:----------------|
| 0.0303  | 1       | 0.3052        | -               |
| 1.0     | 33      | -             | 0.0154          |
| 1.5152  | 50      | 0.0008        | -               |
| 2.0     | 66      | -             | 0.0039          |
| 3.0     | 99      | -             | 0.0019          |
| 3.0303  | 100     | 0.0001        | -               |
| 4.0     | 132     | -             | 0.0017          |
| 4.5455  | 150     | 0.0002        | -               |
| **5.0** | **165** | **-**         | **0.0014**      |

- The bold row denotes the saved checkpoint.
## Framework Versions
- Python: 3.9.19
- SetFit: 1.1.0.dev0
- Sentence Transformers: 3.0.1
- Transformers: 4.39.0
- PyTorch: 2.4.0
- Datasets: 2.20.0
- Tokenizers: 0.15.2
## Citation

### BibTeX
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
  doi = {10.48550/ARXIV.2209.11055},
  url = {https://arxiv.org/abs/2209.11055},
  author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Efficient Few-Shot Learning Without Prompts},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
```