SetFit with BAAI/bge-small-en-v1.5

This is a SetFit model for text classification. It uses BAAI/bge-small-en-v1.5 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
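
The contrastive step depends on turning a handful of labeled texts into many training pairs. The following is a minimal sketch of that idea, assuming pairs with matching labels get a similarity target of 1.0 and mismatched pairs get 0.0. The `make_contrastive_pairs` helper and the toy texts are illustrative and not part of SetFit's API; the library also applies a sampling strategy that is not shown here.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Build (text_a, text_b, target) triples: 1.0 if the labels match, else 0.0."""
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Hypothetical toy data, not taken from the training set.
texts = ["great product", "works well", "would not install", "slowed my computer"]
labels = [0, 0, 1, 1]

pairs = make_contrastive_pairs(texts, labels)
for text_a, text_b, target in pairs:
    print(f"{target:.1f}  {text_a!r} <-> {text_b!r}")
```

Even four labeled texts yield six pairs, which is why this approach can work with few examples per class.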

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: BAAI/bge-small-en-v1.5
Classification head: a LogisticRegression instance
Maximum Sequence Length: 512 tokens
Number of Classes: 2

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
0	'Been using this excellent product for years don t ever try and do income taxes without it ' 'Use kaspersky every year best product around Will use no other product best prosit I have seen on the market' 'I ve used Norton before and various free anti virus and with a professional version you get a more comprehensive set of security options that quietly takes care of business in the back ground There is a peace of mind factor that a professional version gives you and for the less than tech savvy it s a bit more idiot proof than a bare bones free ware I have no problem with free ware as my computing needs are pretty simple but a pro version is very nice and this is pretty cheap for the year long comfort of install it and then pretty much forget about it security I got this current product via the Vine but I have bought the professional Norton for the two years running previously when it has been on sale I have multiple computers so the license is handy and I do tend to use all three For the most part Norton is comfortable and user friendly especially if you aren t overly expert with using software '
1	'I have use Quicken for over years and I can t believe how cumbersome and poorly conceived this version is compared to past versions The main page is useless and you now have to open multiple windows to get the information you need then you have to close all the windows you opened to get to the next account When looking at a performance page of your investment accounts you get a pie chart instead of a bar graph What good is a pie chart when you are looking at performance data over a specific time range I thought the purpose of newer versions was to improve the existing version and not regress If Microsoft still had a financial program I would be forced to migrate to another program Intuit needs to change it s company name because this program is not intuitive It is ill conceived and makes for a frustrating experience ' 'Would not install activation code not accepted Returned it ' 'I installed this over Norton which I have used and had no problems with My computer slowed to a crawl NAV ate all my computer s resources Activation is a problem and so is its updating proceedures I uninstalled it after it just plain was not working There are still remnents of it on my machine that will not go away I bought Zone Alarm Security Suite ZA Suite is great uses very little resources and my computer is now speedy again Norton is totally overgrown and needs to be rewritten from the source code I will never use a Norton Product again '

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("selina09/yt_setfit2")
# Run inference
preds = model("dont trust it")
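
For more than one input, predictions can be requested for a whole batch and mapped to readable names. This is a sketch under stated assumptions: the names "positive" and "negative" are inferred from the example reviews (the card itself only lists labels 0 and 1), and `to_label_names` and `classify` are hypothetical helpers. `SetFitModel.predict` is called with a list of strings.

```python
# Assumed label meanings, inferred from the example reviews; the card only lists 0 and 1.
LABEL_NAMES = {0: "positive", 1: "negative"}


def to_label_names(preds):
    """Map numeric class ids to the assumed human-readable names."""
    return [LABEL_NAMES[int(p)] for p in preds]


def classify(texts, model_id="selina09/yt_setfit2"):
    """Download the model and classify a batch of texts.

    Requires the setfit package and network access.
    """
    # Imported lazily so to_label_names works without setfit installed.
    from setfit import SetFitModel

    model = SetFitModel.from_pretrained(model_id)
    preds = model.predict(texts)  # one class id per input text
    return to_label_names(preds)


# Example call (downloads the model, so it is left commented out):
# print(classify(["dont trust it", "best product around"]))
```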

Training Details

Training Set Metrics

Training set	Min	Mean	Max
Word count	1	93.9133	364

Label	Training Sample Count
0	75
1	75
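
Statistics like those in the word-count table can be computed with a simple split on spaces. This is a minimal sketch, assuming words are delimited by single spaces (the card does not state how words were counted). The `word_count_stats` helper and the sample texts are illustrative.

```python
from statistics import mean, median


def word_count_stats(texts):
    """Return min, mean, median, and max word counts using a naive split on spaces."""
    counts = [len(text.split(" ")) for text in texts]
    return {
        "min": min(counts),
        "mean": mean(counts),
        "median": median(counts),
        "max": max(counts),
    }


# Hypothetical sample, not the actual training set.
sample = [
    "dont trust it",
    "Would not install activation code not accepted Returned it",
    "ok",
]
stats = word_count_stats(sample)
print(stats)
```

Reporting both the mean and the median is useful here because review lengths are skewed: a few very long reviews pull the mean well above the typical length.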

Training Hyperparameters

batch_size: (32, 32)
num_epochs: (10, 10)
max_steps: -1
sampling_strategy: oversampling
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
seed: 42
eval_max_steps: -1
load_best_model_at_end: False
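
To reproduce a comparable run, most of the values above can be passed to SetFit's `TrainingArguments`. This is a sketch rather than the author's training script: `loss`, `distance_metric`, `eval_max_steps`, and `load_best_model_at_end` are omitted and left at the library's defaults, which appear to match the listed values. `build_training_arguments` is a hypothetical helper.

```python
# Values copied from the hyperparameter list. Tuples are
# (embedding fine-tuning phase, classifier training phase).
HYPERPARAMETERS = {
    "batch_size": (32, 32),
    "num_epochs": (10, 10),
    "max_steps": -1,
    "sampling_strategy": "oversampling",
    "body_learning_rate": (2e-05, 1e-05),
    "head_learning_rate": 0.01,
    "margin": 0.25,
    "end_to_end": False,
    "use_amp": False,
    "warmup_proportion": 0.1,
    "seed": 42,
}


def build_training_arguments():
    """Create a setfit.TrainingArguments object from the values above."""
    # Imported lazily so the dictionary can be inspected without setfit installed.
    from setfit import TrainingArguments

    return TrainingArguments(**HYPERPARAMETERS)


# Example call (requires setfit, so it is left commented out):
# args = build_training_arguments()
```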

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0028	1	0.2613	-
0.1401	50	0.239	-
0.2801	100	0.2175	-
0.4202	150	0.2015	-
0.5602	200	0.0628	-
0.7003	250	0.0534	-
0.8403	300	0.0163	-
0.9804	350	0.0105	-
1.1204	400	0.0259	-
1.2605	450	0.0024	-
1.4006	500	0.0013	-
1.5406	550	0.0196	-
1.6807	600	0.0157	-
1.8207	650	0.0184	-
1.9608	700	0.0159	-
2.1008	750	0.0062	-
2.2409	800	0.0179	-
2.3810	850	0.0165	-
2.5210	900	0.0092	-
2.6611	950	0.0299	-
2.8011	1000	0.0071	-
2.9412	1050	0.0115	-
3.0812	1100	0.0007	-
3.2213	1150	0.0248	-
3.3613	1200	0.0007	-
3.5014	1250	0.0096	-
3.6415	1300	0.0091	-
3.7815	1350	0.0007	-
3.9216	1400	0.0255	-
4.0616	1450	0.0065	-
4.2017	1500	0.0178	-
4.3417	1550	0.0168	-
4.4818	1600	0.0161	-
4.6218	1650	0.0093	-
4.7619	1700	0.0337	-
4.9020	1750	0.0148	-
5.0420	1800	0.0082	-
5.1821	1850	0.023	-
5.3221	1900	0.0185	-
5.4622	1950	0.0155	-
5.6022	2000	0.0176	-
5.7423	2050	0.0004	-
5.8824	2100	0.0221	-
6.0224	2150	0.0004	-
6.1625	2200	0.0045	-
6.3025	2250	0.0004	-
6.4426	2300	0.0081	-
6.5826	2350	0.0089	-
6.7227	2400	0.0091	-
6.8627	2450	0.0004	-
7.0028	2500	0.0238	-
7.1429	2550	0.0056	-
7.2829	2600	0.0175	-
7.4230	2650	0.0088	-
7.5630	2700	0.0383	-
7.7031	2750	0.0356	-
7.8431	2800	0.0004	-
7.9832	2850	0.0231	-
8.1232	2900	0.0292	-
8.2633	2950	0.0384	-
8.4034	3000	0.0004	-
8.5434	3050	0.0091	-
8.6835	3100	0.0079	-
8.8235	3150	0.0298	-
8.9636	3200	0.0083	-
9.1036	3250	0.0004	-
9.2437	3300	0.0003	-
9.3838	3350	0.0312	-
9.5238	3400	0.0157	-
9.6639	3450	0.0003	-
9.8039	3500	0.0306	-
9.9440	3550	0.0084	-
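
The Epoch and Step columns together imply how many optimizer steps make up one epoch. This is a small worked example, assuming the Epoch column is the step number divided by the steps per epoch and that every batch holds the full 32 pairs listed in the hyperparameters.

```python
# A few (epoch, step) rows copied from the table above.
rows = [(0.0028, 1), (0.1401, 50), (0.9804, 350), (9.9440, 3550)]


def steps_per_epoch(epoch, step):
    """Estimate the number of steps in one epoch from a single logged row."""
    return round(step / epoch)


estimates = [steps_per_epoch(epoch, step) for epoch, step in rows]

batch_size = 32  # from the hyperparameter list
pairs_per_epoch = estimates[-1] * batch_size

print(estimates)
print(pairs_per_epoch)
```

All four rows give the same estimate of about 357 steps per epoch, so ten epochs correspond to roughly 3,570 steps in total.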

Framework Versions

Python: 3.10.12
SetFit: 1.0.3
Sentence Transformers: 3.0.1
Transformers: 4.40.2
PyTorch: 2.4.0+cu121
Datasets: 2.21.0
Tokenizers: 0.19.1

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
