metadata
base_model: Alibaba-NLP/gte-base-en-v1.5
library_name: setfit
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
widget:
- text: >-
    Tech Start-up Revolutionizes Water Purification SAN FRANCISCO - AquaTech,
    a Silicon Valley start-up, unveiled its groundbreaking water purification
    system today. Using advanced nanotechnology, the device can purify
    contaminated water in seconds, potentially bringing safe drinking water to
    millions. "This could be a game-changer for global health," said WHO
    representative Dr. Amina Osei. Field trials are set to begin next month.
- text: >-
    Whistleblower Exposes Massive Fraud in Medicare Billing WASHINGTON - A
    former employee of MedTech Solutions, a major medical equipment supplier,
    has come forward with explosive allegations of systematic fraud in
    Medicare billing practices. The whistleblower, whose identity remains
    protected, claims the company routinely inflated prices and billed for
    unnecessary equipment, defrauding the government of an estimated $1.2
    billion over five years. Documents obtained by this newspaper appear to
    corroborate these claims, showing discrepancies between actual costs and
    billed amounts for common medical devices such as wheelchairs and oxygen
    tanks. "This isn't just about money," said Senator Lisa Kline, chair of
    the Senate Health Committee. "This kind of fraud directly impacts patient
    care and drives up healthcare costs for everyone." The Department of
    Justice has announced a full investigation into MedTech Solutions and its
    parent company, HealthCorp International. Industry experts suggest this
    could be just the tip of the iceberg, with similar practices potentially
    widespread across the medical supply sector. MedTech Solutions has denied
    all allegations and vowed to cooperate fully with investigators.
- text: >-
    Nursing Home Chain Under Fire for Neglect and Fraud CHICAGO - A damning
    report released today by state health inspectors reveals a pattern of
    severe neglect and fraudulent practices across Sunset Years, one of the
    nation's largest nursing home chains. Investigators found widespread
    understaffing, with some facilities staffed at dangerously low levels
    while still billing Medicare and Medicaid for full care. In several
    instances, residents were found to be malnourished or suffering from
    untreated bedsores, despite records indicating proper care. "It's
    heartbreaking," said Maria Rodriguez, whose mother was a resident at one
    of the chain's Chicago facilities. "We trusted them with our loved ones,
    and they betrayed that trust for profit." Sunset Years CEO Robert Thompson
    issued a statement claiming the issues were isolated incidents and not
    reflective of the company's overall standards. However, multiple state
    attorneys general have announced plans to pursue legal action against the
    chain
- text: >-
    Global Coffee Prices Surge Amid Brazilian Drought Coffee futures hit a
    five-year high today as severe drought continues to ravage Brazil's
    coffee-growing regions. Experts warn consumers may see significant price
    increases in coming months.
- text: >-
    BREAKING: Hospital CEO Arrested in Kickback Scheme Federal agents arrested
    Mercy General Hospital CEO John Smith today on charges of accepting
    kickbacks for preferential treatment of patients. Prosecutors allege Smith
    pocketed over $2 million, compromising patient care. Smith's lawyer denies
    all accusations.
inference: true
model-index:
- name: SetFit with Alibaba-NLP/gte-base-en-v1.5
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Unknown
      type: unknown
      split: test
    metrics:
    - type: accuracy
      value: 0.8181818181818182
      name: Accuracy
SetFit with Alibaba-NLP/gte-base-en-v1.5
This is a SetFit model for text classification. It uses Alibaba-NLP/gte-base-en-v1.5 as the Sentence Transformer embedding model and a SetFitHead instance as the classification head.
The model was trained with an efficient few-shot learning technique that involves two steps:
- Fine-tuning a Sentence Transformer with contrastive learning.
- Training a classification head with features from the fine-tuned Sentence Transformer.
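The first step works because a handful of labeled texts can be expanded into many training pairs. The following is a minimal sketch of that idea, assuming the simplest scheme: every pair of examples is used, with target 1 when the two labels match and 0 otherwise. The function name `make_contrastive_pairs` and the toy texts are hypothetical, and SetFit's own pair sampler may differ in detail.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Pair up every two examples; the target is 1 for same class, 0 otherwise."""
    examples = list(zip(texts, labels))
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        pairs.append((text_a, text_b, 1 if label_a == label_b else 0))
    return pairs


# Toy data (hypothetical): two examples per class.
texts = ["a1", "a2", "b1", "b2"]
labels = [0, 0, 1, 1]

pairs = make_contrastive_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1]
negatives = [p for p in pairs if p[2] == 0]
print(len(pairs), len(positives), len(negatives))
```

Under the same scheme, the 13 examples per class reported under Training Set Metrics would yield 156 same-class pairs and 169 cross-class pairs, which is why so few labeled texts can still drive contrastive fine-tuning.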
Model Details
Model Description
- Model Type: SetFit
- Sentence Transformer body: Alibaba-NLP/gte-base-en-v1.5
- Classification head: a SetFitHead instance
- Maximum Sequence Length: 8192 tokens
- Number of Classes: 2 classes
Model Sources
- Repository: SetFit on GitHub
- Paper: Efficient Few-Shot Learning Without Prompts
- Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts
Model Labels
Label | Examples |
---|---|
1 | |
0 | |
Evaluation
Metrics
Label | Accuracy |
---|---|
all | 0.8182 |
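Accuracy here is the fraction of test examples whose predicted label matches the reference label. The card does not state the size of the test split, so the sketch below uses hypothetical labels for 11 examples with 9 predicted correctly, chosen only because that ratio reproduces the reported value. The function name `accuracy` is illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the reference label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 11 test examples, 2 of them misclassified.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

score = accuracy(y_true, y_pred)
print(round(score, 4))
```

If the test split really is this small, a single additional error would move the score by about nine percentage points, so the figure should be read as a rough indication.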
Uses
Direct Use for Inference
First install the SetFit library:
pip install setfit
Then you can load this model and run inference.
from setfit import SetFitModel
# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("twright8/news_cats")
# Run inference
preds = model("Global Coffee Prices Surge Amid Brazilian Drought Coffee futures hit a five-year high today as severe drought continues to ravage Brazil's coffee-growing regions. Experts warn consumers may see significant price increases in coming months.")
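To classify several texts at once, a thin wrapper can be convenient. This is a sketch, assuming `model.predict` accepts a list of strings and returns one integer-like label per text, as is the case for the integer labels 0 and 1 listed in this card. `classify_texts` is a hypothetical helper, not part of SetFit, and since the card does not say what each label means, the labels are returned unchanged.

```python
def classify_texts(model, texts):
    """Return one plain-int label per input text.

    `model` is any object with a `predict(list_of_str)` method, for example
    the SetFitModel loaded above. Assumes the predictions are integer-like.
    """
    preds = model.predict(list(texts))
    return [int(p) for p in preds]


# Usage with the loaded model (left commented out because it needs the download):
# labels = classify_texts(model, ["First article text", "Second article text"])
# print(labels)
```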
Training Details
Training Set Metrics
Training set | Min | Mean | Max |
---|---|---|---|
Word count | 55 | 153.8462 | 290 |
Label | Training Sample Count |
---|---|
0 | 13 |
1 | 13 |
Training Hyperparameters
- batch_size: (8, 1)
- num_epochs: (3, 17)
- max_steps: -1
- sampling_strategy: oversampling
- body_learning_rate: (9.629116538858926e-05, 2.651259436793277e-05)
- head_learning_rate: 0.02145586669240117
- loss: CoSENTLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: True
- use_amp: True
- warmup_proportion: 0.1
- max_length: 512
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: True
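Several of the values above are pairs. In SetFit's training arguments a pair is read as (embedding fine-tuning phase, classifier training phase), and a single number applies to both phases. The sketch below only illustrates that convention; `PhaseSettings` and `split_phases` are hypothetical names, not SetFit APIs.

```python
from dataclasses import dataclass


@dataclass
class PhaseSettings:
    batch_size: int
    num_epochs: int
    body_learning_rate: float


def split_phases(batch_size, num_epochs, body_learning_rate):
    """Split (embedding, classifier) pairs into one settings object per phase."""

    def pair(value):
        # A scalar applies to both phases; a tuple is (embedding, classifier).
        return value if isinstance(value, tuple) else (value, value)

    b, e, lr = pair(batch_size), pair(num_epochs), pair(body_learning_rate)
    embedding = PhaseSettings(b[0], e[0], lr[0])
    classifier = PhaseSettings(b[1], e[1], lr[1])
    return embedding, classifier


# Values copied from the hyperparameter list above.
embedding, classifier = split_phases(
    batch_size=(8, 1),
    num_epochs=(3, 17),
    body_learning_rate=(9.629116538858926e-05, 2.651259436793277e-05),
)
print(embedding)
print(classifier)
```

Read this way, the embedding body was fine-tuned for 3 epochs at batch size 8, and the classifier phase ran for 17 epochs at batch size 1. The second body learning rate matters here because `end_to_end` is True, so the body keeps updating while the head trains.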
Training Results
Epoch | Step | Training Loss | Validation Loss |
---|---|---|---|
0.0217 | 1 | 1.8133 | - |
0.4348 | 20 | 0.0054 | 1.6363 |
0.8696 | 40 | 0.0 | 4.9011 |
1.3043 | 60 | 0.0 | 7.0885 |
1.7391 | 80 | 0.0 | 6.2756 |
2.1739 | 100 | 0.0 | 6.2417 |
2.6087 | 120 | 0.0 | 6.4769 |
- The bold row denotes the saved checkpoint.
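Two things can be read off the table: the number of steps per epoch and the logged step with the lowest validation loss. The sketch below computes both from the rows as listed. Treating the lowest validation loss as the criterion for the saved checkpoint is an assumption; the card itself only states `load_best_model_at_end: True`, and no row is marked bold in the table as shown here.

```python
# Rows copied from the Training Results table:
# (epoch, step, training loss, validation loss); None where the table shows "-".
rows = [
    (0.0217, 1, 1.8133, None),
    (0.4348, 20, 0.0054, 1.6363),
    (0.8696, 40, 0.0, 4.9011),
    (1.3043, 60, 0.0, 7.0885),
    (1.7391, 80, 0.0, 6.2756),
    (2.1739, 100, 0.0, 6.2417),
    (2.6087, 120, 0.0, 6.4769),
]

# Steps per epoch implied by each row (step divided by epoch fraction), rounded.
steps_per_epoch = {round(step / epoch) for epoch, step, _, _ in rows}

# Logged row with the lowest validation loss, skipping rows without one.
evaluated = [row for row in rows if row[3] is not None]
best = min(evaluated, key=lambda row: row[3])

print(steps_per_epoch)
print(best)
```

Every row implies 46 steps per epoch, and the lowest logged validation loss is at step 20. Training loss reaches 0.0 by step 40 while validation loss climbs above 4, a pattern consistent with overfitting on a training set of this size.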
Framework Versions
- Python: 3.10.13
- SetFit: 1.0.3
- Sentence Transformers: 3.0.1
- Transformers: 4.39.0
- PyTorch: 2.3.0+cu121
- Datasets: 2.20.0
- Tokenizers: 0.15.2
Citation
BibTeX
@article{https://doi.org/10.48550/arxiv.2209.11055,
doi = {10.48550/ARXIV.2209.11055},
url = {https://arxiv.org/abs/2209.11055},
author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Efficient Few-Shot Learning Without Prompts},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}