Jalajkx committed on
Commit
5788b68
1 Parent(s): c1079c2

Add SetFit model

README.md CHANGED
@@ -1,49 +1,235 @@
1
  ---
2
- license: apache-2.0
3
  tags:
4
  - setfit
5
  - sentence-transformers
6
  - text-classification
7
  pipeline_tag: text-classification
8
  ---
9
 
10
- # Jalajkx/all_mpnetcric-setfit-model
11
 
12
- This is a [SetFit model](https://github.com/huggingface/setfit) that can be used for text classification. The model has been trained using an efficient few-shot learning technique that involves:
13
 
14
  1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
15
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
16
 
17
- ## Usage
18
 
19
- To use this model for inference, first install the SetFit library:
20
 
21
  ```bash
22
- python -m pip install setfit
23
  ```
24
 
25
- You can then run inference as follows:
26
 
27
  ```python
28
  from setfit import SetFitModel
29
 
30
- # Download from Hub and run inference
31
  model = SetFitModel.from_pretrained("Jalajkx/all_mpnetcric-setfit-model")
32
  # Run inference
33
- preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
34
  ```
35
 
36
- ## BibTeX entry and citation info
37
 
38
  ```bibtex
39
  @article{https://doi.org/10.48550/arxiv.2209.11055,
40
- doi = {10.48550/ARXIV.2209.11055},
41
- url = {https://arxiv.org/abs/2209.11055},
42
- author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
43
- keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
44
- title = {Efficient Few-Shot Learning Without Prompts},
45
- publisher = {arXiv},
46
- year = {2022},
47
- copyright = {Creative Commons Attribution 4.0 International}
48
  }
49
  ```
1
  ---
2
+ library_name: setfit
3
  tags:
4
  - setfit
5
  - sentence-transformers
6
  - text-classification
7
+ - generated_from_setfit_trainer
8
+ metrics:
9
+ - accuracy
10
+ widget:
11
+ - text: okay so just inform them to re submit the order okay because that is the information
12
+ that i see right now for the cellphone phone number aah ending again one four
13
+ seven one there is the port protection so provide them again the aah aah aah port
14
+ out number to port ahead to call this one the transfer pin and they will have
15
+ to aah started over and reset make the order and i usually that should really
16
+ work
17
+ - text: hi welcome to cricket nation thanks for your patience this call may i have
18
+ your name
19
+ - text: okay okay okay let me just go ahead and wait for the car all right i have
20
+ now pulled up of the account and as i can see here the account has uh one line
21
+ under the sixty dollar plan and the due date is every the fourth of them i let
22
+ me just review the account as well as the payments on the accounts okay can i
23
+ just uh put the call on hold for just a minute or two
24
+ - text: cleveland but it could be done in the morning or tending that you've name
25
+ but it is within today okay so that's a the status for the other number which
26
+ is the one ending in one four seven one
27
+ - text: ' that yes you did receive two text messages from cricket letting you know
28
+ that your payment is due by the nine which i''m i''m not sure how exactly that
29
+ was sent to you however prize'
30
  pipeline_tag: text-classification
31
+ inference: true
32
+ base_model: sentence-transformers/all-mpnet-base-v2
33
+ model-index:
34
+ - name: SetFit with sentence-transformers/all-mpnet-base-v2
35
+ results:
36
+ - task:
37
+ type: text-classification
38
+ name: Text Classification
39
+ dataset:
40
+ name: Unknown
41
+ type: unknown
42
+ split: test
43
+ metrics:
44
+ - type: accuracy
45
+ value: 0.8571428571428571
46
+ name: Accuracy
47
  ---
48
 
49
+ # SetFit with sentence-transformers/all-mpnet-base-v2
50
 
51
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
52
+
53
+ The model has been trained using an efficient few-shot learning technique that involves:
54
 
55
  1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
56
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
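
The contrastive step in item 1 trains on sentence pairs rather than on single labelled examples. As a rough illustration only, pairs could be built from a small labelled set as sketched here. The `make_pairs` helper, the toy sentences, and the labels are hypothetical and are not taken from this commit or from SetFit's source.

```python
import random


def make_pairs(texts, labels, num_iterations=1, seed=42):
    """Build (text_a, text_b, target) pairs for contrastive fine-tuning.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    Hypothetical helper: a sketch of the idea, not SetFit's sampler.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_iterations):
        for i, (text, label) in enumerate(zip(texts, labels)):
            # Candidates with the same label, excluding the anchor itself.
            same = [t for j, (t, l) in enumerate(zip(texts, labels))
                    if l == label and j != i]
            # Candidates with any other label.
            other = [t for t, l in zip(texts, labels) if l != label]
            if same:
                pairs.append((text, rng.choice(same), 1.0))
            if other:
                pairs.append((text, rng.choice(other), 0.0))
    return pairs


# Toy data, invented for illustration only.
texts = [
    "may i have your name",
    "my name is sam",
    "your bill is due today",
    "the payment went through",
]
labels = [5, 5, 4, 4]

pairs = make_pairs(texts, labels, num_iterations=2)
for text_a, text_b, target in pairs[:4]:
    print(target, "|", text_a, "|", text_b)
```

Each anchor contributes one positive and one negative pair per iteration, so a handful of labelled sentences expands into many training pairs.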
57
 
58
+ ## Model Details
59
+
60
+ ### Model Description
61
+ - **Model Type:** SetFit
62
+ - **Sentence Transformer body:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
63
+ - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
64
+ - **Maximum Sequence Length:** 384 tokens
65
+ - **Number of Classes:** 7 classes
66
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
67
+ <!-- - **Language:** Unknown -->
68
+ <!-- - **License:** Unknown -->
69
+
70
+ ### Model Sources
71
+
72
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
73
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
74
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
75
+
76
+ ### Model Labels
77
+ | Label | Examples |
78
+ |:------|:---------|
79
+ | 0 | <ul><li>"right okay aah i do see here that you got two other numbers two other lines one is ending in zero one eight six and the last one is ending in three two six six i've got here that the bogo is four hundred and thirteen dollars and"</li><li>"gotcha i'm really sure that this is cause a lot of trouble just then let me just pull up your account here and see what's going on so obviously you calling about the number ending in six one two three"</li><li>"okay so i will check the aah status of the aah aah port request okay let me check here okay so for the um the phone number ending in one one zero zero okay i'll check the status for that so it says here that um the new you service provider is all we are waiting for the aah new service provider to activate so uh they will just need to activate that i believe there is a due date of that would be around today okay maybe aah ten in the morning it did not indicate here"</li></ul> |
80
+ | 4 | <ul><li>"okay so i will check the aah status of the aah aah port request okay let me check here okay so for the um the phone number ending in one one zero zero okay i'll check the status for that so it says here that um the new you service provider is all we are waiting for the aah new service provider to activate so uh they will just need to activate that i believe there is a due date of that would be around today okay maybe aah ten in the morning it did not indicate here"</li><li>"alright got your account who you change that you have uh one line only as per checking here i know that you just paid your bill it's it's because today is your due date"</li><li>"okay aah i got a text aah actually right i actually saw it right before the text it says it my payment or my bill was due okay at midnight at i solve it but and then not on aah on the second or the last my son my phones not working right now as a standard but then i got a text right after that aah on the same aah you know text from cricket and it says that my own read it to you this is what they're me off this is why didn't pay it because aah actually it's kind of a for the aah this is another my phone went out that's another thing i need to ask you about get another phone i i do pay insurance a phone in screen went out anyway on that no it says yeah you okay can i get the payment it says the current balance is zero dollars your next bill is due on twelve nine twenty two sir i thought i had to the ninth i m s is clear they actually take me that's why aah know your and so i didn't think i had to pay and so i didn't think i had to pay it cut me off and now aah that's where i'm at and i was trying to pay it with my card i have on file the service something's wrong i don't know what's goin' on"</li></ul> |
81
+ | 6 | <ul><li>"did you have the agency apply to your account oh as far as in your bill monthly aah you know your new sixty dollar plan i'm sorry you wanted be on the thirty dollar plan okay so on the sixty dollar and then you only paid thirty on the the bill so thirty five is matching here the remaining balance on the account and the data turned back on on aah would it be possible for us to put her on the thirty dollar rate plan n"</li><li>"oh what do you need to pay okay right now to upgrade to the sixty dollar is gonna be a prorated charge which is let me try to check um hold on oh it's just seventeen cents i can take care of that for you no need to pay today so i'll i'll take care of seventeen cents that is so a prorated amount it will upgrade your plan to sixty dollar and you will have a uh you will get a free feature of fifteen gigabyte hot spot you can enjoy today and then aah i can change it back future dated"</li><li>"all right i'm still updating your plan just give me one moment all right so you see when i upgrade your plan there is going to be a remaining balance of five dollars here so i just need to change it back to fifty five p peter dated at the end of your cycle which is the my so you will not be charged the five dollars"</li></ul> |
82
+ | 5 | <ul><li>"my name is alexandria and the customer's name mr sandra"</li><li>"thanks for calling cricket my name is jay may have your name there's calling cricket my name is jay i have your name mm hmm"</li><li>"your mother welcome um so if we can't be free go please return your call us you david gonna transfer pending let me do that so do you have any other concerns in other questions thank you for your time once again my name's jay and thank you for"</li></ul> |
83
+ | 1 | <ul><li>'my number to verizon because your services just completely black'</li><li>"forty hour reagan oh my my phone number i'm bye"</li><li>"alright thank you so much richard so yes i can see you're calling in today regarding the same thing i'm so sorry and like your phone number so would it be for out actually right now that was like a port process going on"</li></ul> |
84
+ | 2 | <ul><li>"okay aah i got a text aah actually right i actually saw it right before the text it says it my payment or my bill was due okay at midnight at i solve it but and then not on aah on the second or the last my son my phones not working right now as a standard but then i got a text right after that aah on the same aah you know text from cricket and it says that my own read it to you this is what they're me off this is why didn't pay it because aah actually it's kind of a for the aah this is another my phone went out that's another thing i need to ask you about get another phone i i do pay insurance a phone in screen went out anyway on that no it says yeah you okay can i get the payment it says the current balance is zero dollars your next bill is due on twelve nine twenty two sir i thought i had to the ninth i m s is clear they actually take me that's why aah know your and so i didn't think i had to pay and so i didn't think i had to pay it cut me off and now aah that's where i'm at and i was trying to pay it with my card i have on file the service something's wrong i don't know what's goin' on"</li><li>"just said i i saw that message after so this one when it off but you know i mean like i said is going to be using why it shouldn't be the other one aah saying that aah there was due on the ninth or said my balance is zero dollars and that my next payment is due on the ninth of the of the four two and aah if this year the also said of ninety eight dollars u s usually my david you'll be so i understand why why then they sent me a"</li><li>"you're saying that your bill is not usually ninety eight dollars"</li></ul> |
85
+ | 3 | <ul><li>"you're saying that your bill is not usually ninety eight dollars"</li><li>"oh boy the extra effort of calling cricket each month or going into a store to pay your bill just ask one to help you in row today auto pay another great way to save with cricket have you received the suspicious text don't before by spam helps done spam texts my forwarding the message the code seven seven which belt spam went in don't don't click that link did you know that using the my cricket app can save you time and make life easier the mycricket app is a fast and secure way to pay your bill in manage your account okay wendy"</li><li>"already no invalid series streaming now an h b o max when you're on our sixty dollar and the minutes plan you can get h b o max with ads included on i three new shows and what is your movies any time anywhere you want after i presented about our sixty dollar unlimited plan today in a case in the next after i pretty town oh oh here the highly into submitted h b o a regional series house of the draft and would you tell the story amount storage area and said two hundred years before the event of game of thrones is matt an h b o max my end you know the trend is will this you on our sixty dollar unlimited plan you can get h b o max with ads included on on a stream actually originally blockbuster movies a new shows anytime anywhere you want that's the representative better sixty dollar unlimited plan today limitations an exclusion supply ask the recording tells oh oh oh did you know that you can save as much is sixty dollars a year you when you enrolled in autopay customers are most popular single line plans receive five dollars off your monthly my bill with autopay you can use a debit or credit card or reloadable prepaid card and with autopay it's easy to update your form a payment it anytime avoid the extra effort of calling cricket each month or going into a store to pay your bill just ask one of our agents to help you in row today autopay on other great way to save with cricket oh oh have you received a suspicion text don't before by spam helps done spam texts my forwarding the message the code seven seven two six which spells all right thank you so much for your patience just in"</li></ul> |
86
+
87
+ ## Evaluation
88
+
89
+ ### Metrics
90
+ | Label | Accuracy |
91
+ |:--------|:---------|
92
+ | **all** | 0.8571 |
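
For reference, accuracy is the fraction of test examples whose predicted label equals the true label. The sketch below uses invented labels, because the card gives neither the test set nor its size. The reported `0.8571428571428571` is what 6 correct out of 7 would produce, but any multiple of that ratio gives the same number.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels, NOT the model's real test data:
# seven examples, one of them misclassified.
y_true = [0, 1, 2, 3, 4, 5, 6]
y_pred = [0, 1, 2, 3, 4, 5, 4]

score = accuracy(y_true, y_pred)
print(round(score, 4))
```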
93
+
94
+ ## Uses
95
 
96
+ ### Direct Use for Inference
97
+
98
+ First install the SetFit library:
99
 
100
  ```bash
101
+ pip install setfit
102
  ```
103
 
104
+ Then you can load this model and run inference.
105
 
106
  ```python
107
  from setfit import SetFitModel
108
 
109
+ # Download from the 🤗 Hub
110
  model = SetFitModel.from_pretrained("Jalajkx/all_mpnetcric-setfit-model")
111
  # Run inference
112
+ preds = model("hi welcome to cricket nation thanks for your patience this call may i have your name")
113
  ```
114
 
115
+ <!--
116
+ ### Downstream Use
117
+
118
+ *List how someone could finetune this model on their own dataset.*
119
+ -->
120
+
121
+ <!--
122
+ ### Out-of-Scope Use
123
+
124
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
125
+ -->
126
+
127
+ <!--
128
+ ## Bias, Risks and Limitations
129
+
130
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
131
+ -->
132
+
133
+ <!--
134
+ ### Recommendations
135
 
136
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
137
+ -->
138
+
139
+ ## Training Details
140
+
141
+ ### Training Set Metrics
142
+ | Training set | Min | Median | Max |
143
+ |:-------------|:----|:--------|:----|
144
+ | Word count | 5 | 64.6774 | 312 |
145
+
146
+ | Label | Training Sample Count |
147
+ |:------|:----------------------|
148
+ | 0 | 7 |
149
+ | 1 | 12 |
150
+ | 2 | 3 |
151
+ | 3 | 4 |
152
+ | 4 | 19 |
153
+ | 5 | 13 |
154
+ | 6 | 4 |
155
+
156
+ ### Training Hyperparameters
157
+ - batch_size: (4, 4)
158
+ - num_epochs: (1, 1)
159
+ - max_steps: -1
160
+ - sampling_strategy: oversampling
161
+ - num_iterations: 25
162
+ - body_learning_rate: (2e-05, 2e-05)
163
+ - head_learning_rate: 2e-05
164
+ - loss: CosineSimilarityLoss
165
+ - distance_metric: cosine_distance
166
+ - margin: 0.25
167
+ - end_to_end: False
168
+ - use_amp: False
169
+ - warmup_proportion: 0.1
170
+ - seed: 42
171
+ - eval_max_steps: -1
172
+ - load_best_model_at_end: False
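
The `loss: CosineSimilarityLoss` entry refers to the Sentence Transformers loss that compares the cosine similarity of two sentence embeddings with a target score using mean squared error. The plain-Python sketch below shows that computation on hand-written 2-D vectors. The vectors and helper names are illustrative assumptions, and the real loss operates on batched tensors produced by the embedding model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(batch):
    """Mean squared error between cosine similarity and the pair target.

    batch is a list of (embedding_a, embedding_b, target) triples, with
    target 1.0 for a same-label pair and 0.0 for a different-label pair.
    """
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in batch]
    return sum(errors) / len(errors)


# Toy embeddings, invented for illustration.
batch = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # identical vectors, positive pair: no error
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # orthogonal vectors, negative pair: no error
    ([1.0, 0.0], [1.0, 1.0], 0.0),  # 45 degrees apart but labelled negative
]

loss = cosine_similarity_loss(batch)
print(round(loss, 4))
```

Only the third pair contributes to the loss here, which is the situation fine-tuning tries to reduce by pushing differently labelled sentences apart.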
173
+
174
+ ### Training Results
175
+ | Epoch | Step | Training Loss | Validation Loss |
176
+ |:------:|:----:|:-------------:|:---------------:|
177
+ | 0.0013 | 1 | 0.2177 | - |
178
+ | 0.0645 | 50 | 0.092 | - |
179
+ | 0.1290 | 100 | 0.0167 | - |
180
+ | 0.1935 | 150 | 0.1883 | - |
181
+ | 0.2581 | 200 | 0.3168 | - |
182
+ | 0.3226 | 250 | 0.0372 | - |
183
+ | 0.3871 | 300 | 0.0253 | - |
184
+ | 0.4516 | 350 | 0.2565 | - |
185
+ | 0.5161 | 400 | 0.0096 | - |
186
+ | 0.5806 | 450 | 0.0957 | - |
187
+ | 0.6452 | 500 | 0.001 | - |
188
+ | 0.7097 | 550 | 0.0021 | - |
189
+ | 0.7742 | 600 | 0.2043 | - |
190
+ | 0.8387 | 650 | 0.0042 | - |
191
+ | 0.9032 | 700 | 0.001 | - |
192
+ | 0.9677 | 750 | 0.0788 | - |
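
The Epoch column can be cross-checked against the sample counts and hyperparameters listed above. The arithmetic below assumes that each training sample yields one positive and one negative pair per iteration. That is an assumption about the pair sampler rather than something the card states, but it reproduces the epoch fractions in the table.

```python
import math

# Values copied from the "Training Sample Count" and hyperparameter lists.
label_counts = {0: 7, 1: 12, 2: 3, 3: 4, 4: 19, 5: 13, 6: 4}
num_iterations = 25
batch_size = 4

# Assumption: 2 pairs (one positive, one negative) per sample per iteration.
pairs_per_sample_per_iteration = 2

num_samples = sum(label_counts.values())                                    # 62
num_pairs = num_samples * num_iterations * pairs_per_sample_per_iteration  # 3100
steps_per_epoch = math.ceil(num_pairs / batch_size)                        # 775


def epoch_at_step(step):
    """Fraction of one epoch completed after the given optimizer step."""
    return round(step / steps_per_epoch, 4)


for step in (1, 50, 100, 750):
    print(step, epoch_at_step(step))
```

Under this assumption one epoch is 775 steps, which would explain why the last logged row (step 750) falls just short of epoch 1.0.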
193
+
194
+ ### Framework Versions
195
+ - Python: 3.10.13
196
+ - SetFit: 1.0.1
197
+ - Sentence Transformers: 2.2.2
198
+ - Transformers: 4.36.1
199
+ - PyTorch: 2.0.1
200
+ - Datasets: 2.15.0
201
+ - Tokenizers: 0.15.0
202
+
203
+ ## Citation
204
+
205
+ ### BibTeX
206
  ```bibtex
207
  @article{https://doi.org/10.48550/arxiv.2209.11055,
208
+ doi = {10.48550/ARXIV.2209.11055},
209
+ url = {https://arxiv.org/abs/2209.11055},
210
+ author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
211
+ keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
212
+ title = {Efficient Few-Shot Learning Without Prompts},
213
+ publisher = {arXiv},
214
+ year = {2022},
215
+ copyright = {Creative Commons Attribution 4.0 International}
216
  }
217
  ```
218
+
219
+ <!--
220
+ ## Glossary
221
+
222
+ *Clearly define terms in order to be accessible across audiences.*
223
+ -->
224
+
225
+ <!--
226
+ ## Model Card Authors
227
+
228
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
229
+ -->
230
+
231
+ <!--
232
+ ## Model Card Contact
233
+
234
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
235
+ -->
config.json CHANGED
@@ -19,6 +19,6 @@
19
  "pad_token_id": 1,
20
  "relative_attention_num_buckets": 32,
21
  "torch_dtype": "float32",
22
- "transformers_version": "4.35.2",
23
  "vocab_size": 30527
24
  }
 
19
  "pad_token_id": 1,
20
  "relative_attention_num_buckets": 32,
21
  "torch_dtype": "float32",
22
+ "transformers_version": "4.36.1",
23
  "vocab_size": 30527
24
  }
config_setfit.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "normalize_embeddings": false,
3
+ "labels": null
4
+ }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:c83a865741ed2861069a65f1850d5155c222ba2bef7e6457a7fbbeea9cfdcf13
3
  size 437967672
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:08ac66624c66fff695712bee8d44ac4979be67d57bfcabbddf9dcb2cfb9fb9b1
3
  size 437967672
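
The three lines in each side of this hunk are a Git LFS pointer, not the weights themselves: a spec version, a SHA-256 object id, and the file size in bytes. As a hedged sketch, a downloaded file could be checked against such a pointer as follows. The helper names are hypothetical and the code is not part of any Hugging Face or Git LFS tooling.

```python
import hashlib

# New-side pointer text copied from the hunk above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:08ac66624c66fff695712bee8d44ac4979be67d57bfcabbddf9dcb2cfb9fb9b1
size 437967672
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest and size."""
    fields = dict(line.strip().split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def read_chunks(path, chunk_size=1 << 20):
    """Yield a file's bytes in chunks so large weights need not fit in memory."""
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk


def matches_pointer(chunks, pointer):
    """Return True if the byte chunks have the pointer's size and digest."""
    hasher = hashlib.new(pointer["algo"])
    total = 0
    for chunk in chunks:
        hasher.update(chunk)
        total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

With a local copy of the file, `matches_pointer(read_chunks("model.safetensors"), pointer)` would perform the check. The path is only an example.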
model_head.pkl CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:02f4ef2796fb174f468d9e77c295cf814451297ef3a27f756cd52e9e20d7dddf
3
  size 43967
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1bc93e161951d7e18999402fec055a057864ad00dbe224562b4d1273f4bbef6c
3
  size 43967
special_tokens_map.json CHANGED
@@ -9,7 +9,7 @@
9
  "cls_token": {
10
  "content": "<s>",
11
  "lstrip": false,
12
- "normalized": true,
13
  "rstrip": false,
14
  "single_word": false
15
  },
@@ -37,7 +37,7 @@
37
  "sep_token": {
38
  "content": "</s>",
39
  "lstrip": false,
40
- "normalized": true,
41
  "rstrip": false,
42
  "single_word": false
43
  },
 
9
  "cls_token": {
10
  "content": "<s>",
11
  "lstrip": false,
12
+ "normalized": false,
13
  "rstrip": false,
14
  "single_word": false
15
  },
 
37
  "sep_token": {
38
  "content": "</s>",
39
  "lstrip": false,
40
+ "normalized": false,
41
  "rstrip": false,
42
  "single_word": false
43
  },