---
base_model: sentence-transformers/all-MiniLM-L6-v2
library_name: setfit
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
widget:
- text: What are the key situations that require the preparation of a mission order?
- text: How can audio data be used to improve speaker identification using neural networks?
- text: How can organizations balance the need for data privacy with the benefits of involving interns in data-related projects?
- text: What is the purpose of the message posted by the CR?
- text: What are the consequences of adopting a 'if not broken, don't fix' attitude towards data monitoring?
inference: true
model-index:
- name: SetFit with sentence-transformers/all-MiniLM-L6-v2
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Unknown
      type: unknown
      split: test
    metrics:
    - type: accuracy
      value: 0.3076923076923077
      name: Accuracy
---

# SetFit with sentence-transformers/all-MiniLM-L6-v2

This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.

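To make step 1 more concrete, the sketch below shows one way labeled few-shot examples can be turned into sentence pairs for contrastive fine-tuning: pairs with the same label get a target similarity of 1.0, pairs with different labels get 0.0. This is an illustrative sketch only, not SetFit's actual sampler. `make_contrastive_pairs` is a hypothetical helper, and it enumerates every pair, whereas the library samples pairs according to its `sampling_strategy`. The example texts are taken from this card's label examples.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Hypothetical helper: pair up examples for contrastive fine-tuning.

    Returns (text_a, text_b, target) triples, where target is 1.0 when both
    texts share a label and 0.0 otherwise.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


texts = [
    "How can visualizations be used to enhance documentation and collaboration in software development?",
    "What are the key considerations when choosing a distance metric for a vector database?",
    "Who is responsible for reviewing and signing documents related to conference submissions?",
    "How do keys or access credentials get shared or transferred among team members in a workplace?",
]
labels = ["semantic", "semantic", "lexical", "lexical"]

pairs = make_contrastive_pairs(texts, labels)
for text_a, text_b, target in pairs:
    print(f"{target:.1f} | {text_a[:40]}... | {text_b[:40]}...")
```

Pairs like these are the kind of input a similarity-based loss such as `CosineSimilarityLoss` consumes; in step 2, the embeddings produced by the fine-tuned body become the features for the classification head.
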
## Model Details

### Model Description
- **Model Type:** SetFit
- **Sentence Transformer body:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
- **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
- **Maximum Sequence Length:** 256 tokens
- **Number of Classes:** 4 classes

### Model Sources

- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)

### Model Labels

| Label         | Examples |
|:--------------|:---------|
| very_semantic | <ul><li>'What are the key considerations when proposing names for a project or initiative?'</li><li>'What are the key aspects of team life and events in a company?'</li><li>'What is being asked for or sought in this conversation?'</li></ul> |
| lexical       | <ul><li>'Who is responsible for reviewing and signing documents related to conference submissions?'</li><li>'How do data architecture and management systems enable digital transformation and address its associated challenges?'</li><li>'How do keys or access credentials get shared or transferred among team members in a workplace?'</li></ul> |
| very_lexical  | <ul><li>'What are some of the key challenges associated with handling and storing large amounts of genomic data?'</li><li>"What is the focus of Eurobiomed's partnership with Digital113?"</li><li>'What are the key considerations for generating well-formatted JSON instances that conform to a given schema?'</li></ul> |
| semantic      | <ul><li>'How can visualizations be used to enhance documentation and collaboration in software development?'</li><li>'What are the key considerations when choosing a distance metric for a vector database?'</li><li>'How can AI be leveraged to support HR departments in detecting and addressing gender bias?'</li></ul> |

## Evaluation

### Metrics

| Label   | Accuracy |
|:--------|:---------|
| **all** | 0.3077   |

## Uses

### Direct Use for Inference

First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference.

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("yaniseuranova/setfit-rag-hybrid-search-query-router-test")
# Run inference
preds = model("What is the purpose of the message posted by the CR?")
```

## Training Details

### Training Set Metrics

| Training set | Min | Median  | Max |
|:-------------|:----|:--------|:----|
| Word count   | 7   | 14.1913 | 24  |

| Label         | Training Sample Count |
|:--------------|:----------------------|
| lexical       | 41                    |
| semantic      | 24                    |
| very_lexical  | 17                    |
| very_semantic | 33                    |

### Training Hyperparameters

- batch_size: (8, 8)
- num_epochs: (3, 3)
- max_steps: -1
- sampling_strategy: oversampling
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: True

### Training Results

| Epoch   | Step     | Training Loss | Validation Loss |
|:-------:|:--------:|:-------------:|:---------------:|
| 0.0008  | 1        | 0.4237        | -               |
| 0.0417  | 50       | 0.2917        | -               |
| 0.0834  | 100      | 0.1835        | -               |
| 0.1251  | 150      | 0.3215        | -               |
| 0.1668  | 200      | 0.2299        | -               |
| 0.2085  | 250      | 0.2595        | -               |
| 0.2502  | 300      | 0.3193        | -               |
| 0.2919  | 350      | 0.2288        | -               |
| 0.3336  | 400      | 0.2947        | -               |
| 0.3753  | 450      | 0.1171        | -               |
| 0.4170  | 500      | 0.1442        | -               |
| 0.4587  | 550      | 0.1859        | -               |
| 0.5004  | 600      | 0.1959        | -               |
| 0.5421  | 650      | 0.2797        | -               |
| 0.5838  | 700      | 0.2079        | -               |
| 0.6255  | 750      | 0.2706        | -               |
| 0.6672  | 800      | 0.1956        | -               |
| 0.7089  | 850      | 0.0833        | -               |
| 0.7506  | 900      | 0.1421        | -               |
| 0.7923  | 950      | 0.2345        | -               |
| 0.8340  | 1000     | 0.1347        | -               |
| 0.8757  | 1050     | 0.241         | -               |
| 0.9174  | 1100     | 0.133         | -               |
| 0.9591  | 1150     | 0.1041        | -               |
| **1.0** | **1199** | **-**         | **0.3562**      |
| 1.0008  | 1200     | 0.0837        | -               |
| 1.0425  | 1250     | 0.1566        | -               |
| 1.0842  | 1300     | 0.2101        | -               |
| 1.1259  | 1350     | 0.0496        | -               |
| 1.1676  | 1400     | 0.063         | -               |
| 1.2093  | 1450     | 0.149         | -               |
| 1.2510  | 1500     | 0.038         | -               |
| 1.2927  | 1550     | 0.0504        | -               |
| 1.3344  | 1600     | 0.0679        | -               |
| 1.3761  | 1650     | 0.1699        | -               |
| 1.4178  | 1700     | 0.1293        | -               |
| 1.4595  | 1750     | 0.1083        | -               |
| 1.5013  | 1800     | 0.2044        | -               |
| 1.5430  | 1850     | 0.1267        | -               |
| 1.5847  | 1900     | 0.0842        | -               |
| 1.6264  | 1950     | 0.1126        | -               |
| 1.6681  | 2000     | 0.0544        | -               |
| 1.7098  | 2050     | 0.143         | -               |
| 1.7515  | 2100     | 0.08          | -               |
| 1.7932  | 2150     | 0.1103        | -               |
| 1.8349  | 2200     | 0.1768        | -               |
| 1.8766  | 2250     | 0.1639        | -               |
| 1.9183  | 2300     | 0.1637        | -               |
| 1.9600  | 2350     | 0.1637        | -               |
| 2.0     | 2398     | -             | 0.3682          |
| 2.0017  | 2400     | 0.2938        | -               |
| 2.0434  | 2450     | 0.0808        | -               |
| 2.0851  | 2500     | 0.0788        | -               |
| 2.1268  | 2550     | 0.2187        | -               |
| 2.1685  | 2600     | 0.0701        | -               |
| 2.2102  | 2650     | 0.0385        | -               |
| 2.2519  | 2700     | 0.135         | -               |
| 2.2936  | 2750     | 0.2276        | -               |
| 2.3353  | 2800     | 0.2203        | -               |
| 2.3770  | 2850     | 0.0029        | -               |
| 2.4187  | 2900     | 0.1855        | -               |
| 2.4604  | 2950     | 0.1278        | -               |
| 2.5021  | 3000     | 0.0487        | -               |
| 2.5438  | 3050     | 0.0404        | -               |
| 2.5855  | 3100     | 0.1158        | -               |
| 2.6272  | 3150     | 0.1354        | -               |
| 2.6689  | 3200     | 0.1633        | -               |
| 2.7106  | 3250     | 0.1484        | -               |
| 2.7523  | 3300     | 0.1146        | -               |
| 2.7940  | 3350     | 0.1437        | -               |
| 2.8357  | 3400     | 0.0948        | -               |
| 2.8774  | 3450     | 0.0833        | -               |
| 2.9191  | 3500     | 0.0668        | -               |
| 2.9608  | 3550     | 0.1687        | -               |
| 3.0     | 3597     | -             | 0.3651          |

* The bold row denotes the saved checkpoint.

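The bold row is consistent with `load_best_model_at_end: True` if the best checkpoint is taken to be the one with the lowest validation loss. That selection criterion is an assumption here, since the card does not state which metric was used. Under that assumption, a minimal sketch of the selection, using only the three validation losses reported in the table:

```python
# Validation losses logged at the end of each epoch, copied from the
# Training Results table (step -> validation loss).
validation_loss_by_step = {
    1199: 0.3562,  # epoch 1.0
    2398: 0.3682,  # epoch 2.0
    3597: 0.3651,  # epoch 3.0
}

# Assumed criterion: keep the checkpoint with the lowest validation loss.
best_step = min(validation_loss_by_step, key=validation_loss_by_step.get)
print(f"best step: {best_step}, validation loss: {validation_loss_by_step[best_step]}")
```

Validation loss is lowest after the first epoch and does not improve afterwards, even though training loss keeps falling, which is why the step 1199 checkpoint is the one kept.
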
### Framework Versions

- Python: 3.10.12
- SetFit: 1.0.3
- Sentence Transformers: 2.6.1
- Transformers: 4.39.0
- PyTorch: 2.3.1+cu121
- Datasets: 2.18.0
- Tokenizers: 0.15.2

## Citation

### BibTeX

```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
```