tomaarsen committed
Commit f34c434
1 Parent(s): 6a62804

Add new SentenceTransformer model.
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
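This configuration enables mean pooling only: the token embeddings are averaged into one 768-dimensional sentence embedding. A minimal sketch of that computation (an illustration, not the library's exact code):

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Mask-aware mean pooling over token embeddings.

    token_embeddings: (batch, seq_len, 768); attention_mask: (batch, seq_len).
    Padding positions are excluded from the average.
    """
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)                   # (batch, 768)
    counts = mask.sum(dim=1).clamp(min=1e-9)                        # avoid division by zero
    return summed / counts                                          # (batch, 768)
```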
README.md ADDED
@@ -0,0 +1,817 @@
---
language:
- en
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/stsb-distilbert-base
metrics:
- cosine_accuracy
- cosine_accuracy_threshold
- cosine_f1
- cosine_f1_threshold
- cosine_precision
- cosine_recall
- cosine_ap
- dot_accuracy
- dot_accuracy_threshold
- dot_f1
- dot_f1_threshold
- dot_precision
- dot_recall
- dot_ap
- manhattan_accuracy
- manhattan_accuracy_threshold
- manhattan_f1
- manhattan_f1_threshold
- manhattan_precision
- manhattan_recall
- manhattan_ap
- euclidean_accuracy
- euclidean_accuracy_threshold
- euclidean_f1
- euclidean_f1_threshold
- euclidean_precision
- euclidean_recall
- euclidean_ap
- max_accuracy
- max_accuracy_threshold
- max_f1
- max_f1_threshold
- max_precision
- max_recall
- max_ap
- average_precision
- f1
- precision
- recall
- threshold
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
- dot_accuracy@1
- dot_accuracy@3
- dot_accuracy@5
- dot_accuracy@10
- dot_precision@1
- dot_precision@3
- dot_precision@5
- dot_precision@10
- dot_recall@1
- dot_recall@3
- dot_recall@5
- dot_recall@10
- dot_ndcg@10
- dot_mrr@10
- dot_map@100
widget:
- source_sentence: How metro works?
  sentences:
  - How can Turing machine works?
  - What are the best C++ books?
  - What should I learn first in PHP?
- source_sentence: How fast is fast?
  sentences:
  - How does light travel so fast?
  - How could I become an actor?
  - Was Muhammad a pedophile?
- source_sentence: What is a kernel?
  sentences:
  - What is a tensor?
  - What does copyright protect?
  - Can we increase height after 23?
- source_sentence: What is a tensor?
  sentences:
  - What is reliance jio?
  - What are the reasons of war?
  - Does speed reading really work?
- source_sentence: Is Cicret a scam?
  sentences:
  - Is the Cicret Bracelet a scam?
  - Can you eat only once a day?
  - What books should every man read?
pipeline_tag: sentence-similarity
co2_eq_emissions:
  emissions: 15.153912802318576
  energy_consumed: 0.038985939877640395
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.169
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: SentenceTransformer based on sentence-transformers/stsb-distilbert-base
  results:
  - task:
      type: binary-classification
      name: Binary Classification
    dataset:
      name: quora duplicates
      type: quora-duplicates
    metrics:
    - type: cosine_accuracy
      value: 0.816
      name: Cosine Accuracy
    - type: cosine_accuracy_threshold
      value: 0.7866689562797546
      name: Cosine Accuracy Threshold
    - type: cosine_f1
      value: 0.7285714285714286
      name: Cosine F1
    - type: cosine_f1_threshold
      value: 0.735264778137207
      name: Cosine F1 Threshold
    - type: cosine_precision
      value: 0.6746031746031746
      name: Cosine Precision
    - type: cosine_recall
      value: 0.7919254658385093
      name: Cosine Recall
    - type: cosine_ap
      value: 0.7731120768804719
      name: Cosine Ap
    - type: dot_accuracy
      value: 0.807
      name: Dot Accuracy
    - type: dot_accuracy_threshold
      value: 150.97946166992188
      name: Dot Accuracy Threshold
    - type: dot_f1
      value: 0.7223796033994335
      name: Dot F1
    - type: dot_f1_threshold
      value: 137.3444366455078
      name: Dot F1 Threshold
    - type: dot_precision
      value: 0.6640625
      name: Dot Precision
    - type: dot_recall
      value: 0.7919254658385093
      name: Dot Recall
    - type: dot_ap
      value: 0.749212069604305
      name: Dot Ap
    - type: manhattan_accuracy
      value: 0.81
      name: Manhattan Accuracy
    - type: manhattan_accuracy_threshold
      value: 195.88662719726562
      name: Manhattan Accuracy Threshold
    - type: manhattan_f1
      value: 0.7246376811594203
      name: Manhattan F1
    - type: manhattan_f1_threshold
      value: 237.68594360351562
      name: Manhattan F1 Threshold
    - type: manhattan_precision
      value: 0.6292906178489702
      name: Manhattan Precision
    - type: manhattan_recall
      value: 0.8540372670807453
      name: Manhattan Recall
    - type: manhattan_ap
      value: 0.7610544151599187
      name: Manhattan Ap
    - type: euclidean_accuracy
      value: 0.81
      name: Euclidean Accuracy
    - type: euclidean_accuracy_threshold
      value: 8.773942947387695
      name: Euclidean Accuracy Threshold
    - type: euclidean_f1
      value: 0.7260812581913498
      name: Euclidean F1
    - type: euclidean_f1_threshold
      value: 10.843769073486328
      name: Euclidean F1 Threshold
    - type: euclidean_precision
      value: 0.6281179138321995
      name: Euclidean Precision
    - type: euclidean_recall
      value: 0.860248447204969
      name: Euclidean Recall
    - type: euclidean_ap
      value: 0.7611533877712096
      name: Euclidean Ap
    - type: max_accuracy
      value: 0.816
      name: Max Accuracy
    - type: max_accuracy_threshold
      value: 195.88662719726562
      name: Max Accuracy Threshold
    - type: max_f1
      value: 0.7285714285714286
      name: Max F1
    - type: max_f1_threshold
      value: 237.68594360351562
      name: Max F1 Threshold
    - type: max_precision
      value: 0.6746031746031746
      name: Max Precision
    - type: max_recall
      value: 0.860248447204969
      name: Max Recall
    - type: max_ap
      value: 0.7731120768804719
      name: Max Ap
  - task:
      type: paraphrase-mining
      name: Paraphrase Mining
    dataset:
      name: quora duplicates dev
      type: quora-duplicates-dev
    metrics:
    - type: average_precision
      value: 0.5348666252858723
      name: Average Precision
    - type: f1
      value: 0.5395064090300363
      name: F1
    - type: precision
      value: 0.5174549291251892
      name: Precision
    - type: recall
      value: 0.5635210071439276
      name: Recall
    - type: threshold
      value: 0.762035459280014
      name: Threshold
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: Unknown
      type: unknown
    metrics:
    - type: cosine_accuracy@1
      value: 0.9646
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.9926
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.9956
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9986
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.9646
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.4293333333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.2754
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.14515999999999998
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.830104138622815
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.9609072390452685
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.9808022997296821
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9934541226453286
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9795490191788223
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9789640476190478
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.971751123151301
      name: Cosine Map@100
    - type: dot_accuracy@1
      value: 0.9574
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.9876
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.9924
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.9978
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.9574
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.4257333333333334
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.27368000000000003
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.14468000000000003
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.8237692901379665
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.9538191510221804
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.9764249670623496
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.9918117957075603
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.9740754474178193
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.9731360317460321
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.9646398037726347
      name: Dot Map@100
---

# SentenceTransformer based on sentence-transformers/stsb-distilbert-base

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/stsb-distilbert-base](https://huggingface.co/sentence-transformers/stsb-distilbert-base) on the [sentence-transformers/quora-duplicates](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/stsb-distilbert-base](https://huggingface.co/sentence-transformers/stsb-distilbert-base) <!-- at revision 82ad392c08f81be9be9bf065339670b23f2e1493 -->
- **Maximum Sequence Length:** 128 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - [sentence-transformers/quora-duplicates](https://huggingface.co/datasets/sentence-transformers/quora-duplicates)
- **Language:** en
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/stsb-distilbert-base-mnrl")
# Run inference
sentences = [
    'Is Cicret a scam?',
    'Is the Cicret Bracelet a scam?',
    'Can you eat only once a day?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

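Since this model is also evaluated on paraphrase mining (see the Evaluation section below), here is a further usage sketch with `sentence_transformers.util.paraphrase_mining`, which encodes a list of sentences and returns scored candidate pairs. The sentence list below is illustrative.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("tomaarsen/stsb-distilbert-base-mnrl")
sentences = [
    "Is Cicret a scam?",
    "Is the Cicret Bracelet a scam?",
    "Can you eat only once a day?",
    "What is a tensor?",
]
# Returns [score, i, j] triples, sorted by decreasing cosine similarity
pairs = util.paraphrase_mining(model, sentences)
for score, i, j in pairs[:3]:
    print(f"{score:.4f}  {sentences[i]!r}  <->  {sentences[j]!r}")
```
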
<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Binary Classification
* Dataset: `quora-duplicates`
* Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)

| Metric                       | Value      |
|:-----------------------------|:-----------|
| cosine_accuracy              | 0.816      |
| cosine_accuracy_threshold    | 0.7867     |
| cosine_f1                    | 0.7286     |
| cosine_f1_threshold          | 0.7353     |
| cosine_precision             | 0.6746     |
| cosine_recall                | 0.7919     |
| cosine_ap                    | 0.7731     |
| dot_accuracy                 | 0.807      |
| dot_accuracy_threshold       | 150.9795   |
| dot_f1                       | 0.7224     |
| dot_f1_threshold             | 137.3444   |
| dot_precision                | 0.6641     |
| dot_recall                   | 0.7919     |
| dot_ap                       | 0.7492     |
| manhattan_accuracy           | 0.81       |
| manhattan_accuracy_threshold | 195.8866   |
| manhattan_f1                 | 0.7246     |
| manhattan_f1_threshold       | 237.6859   |
| manhattan_precision          | 0.6293     |
| manhattan_recall             | 0.854      |
| manhattan_ap                 | 0.7611     |
| euclidean_accuracy           | 0.81       |
| euclidean_accuracy_threshold | 8.7739     |
| euclidean_f1                 | 0.7261     |
| euclidean_f1_threshold       | 10.8438    |
| euclidean_precision          | 0.6281     |
| euclidean_recall             | 0.8602     |
| euclidean_ap                 | 0.7612     |
| max_accuracy                 | 0.816      |
| max_accuracy_threshold       | 195.8866   |
| max_f1                       | 0.7286     |
| max_f1_threshold             | 237.6859   |
| max_precision                | 0.6746     |
| max_recall                   | 0.8602     |
| **max_ap**                   | **0.7731** |

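To apply these metrics for duplicate detection, compare the similarity of a question pair against the reported decision threshold. A minimal sketch, assuming the `cosine_f1_threshold` of 0.7353 from the table above; the question pair is illustrative.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("tomaarsen/stsb-distilbert-base-mnrl")
threshold = 0.7353  # cosine_f1_threshold from the table above

q1, q2 = "How do I learn Python?", "What is the best way to learn Python?"
emb = model.encode([q1, q2])
score = model.similarity(emb[0], emb[1]).item()  # cosine similarity by default
print("duplicate" if score >= threshold else "not duplicate", f"(score={score:.4f})")
```
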
#### Paraphrase Mining
* Dataset: `quora-duplicates-dev`
* Evaluated with [<code>ParaphraseMiningEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.ParaphraseMiningEvaluator)

| Metric                | Value      |
|:----------------------|:-----------|
| **average_precision** | **0.5349** |
| f1                    | 0.5395     |
| precision             | 0.5175     |
| recall                | 0.5635     |
| threshold             | 0.762      |

#### Information Retrieval

* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.9646     |
| cosine_accuracy@3   | 0.9926     |
| cosine_accuracy@5   | 0.9956     |
| cosine_accuracy@10  | 0.9986     |
| cosine_precision@1  | 0.9646     |
| cosine_precision@3  | 0.4293     |
| cosine_precision@5  | 0.2754     |
| cosine_precision@10 | 0.1452     |
| cosine_recall@1     | 0.8301     |
| cosine_recall@3     | 0.9609     |
| cosine_recall@5     | 0.9808     |
| cosine_recall@10    | 0.9935     |
| cosine_ndcg@10      | 0.9795     |
| cosine_mrr@10       | 0.979      |
| **cosine_map@100**  | **0.9718** |
| dot_accuracy@1      | 0.9574     |
| dot_accuracy@3      | 0.9876     |
| dot_accuracy@5      | 0.9924     |
| dot_accuracy@10     | 0.9978     |
| dot_precision@1     | 0.9574     |
| dot_precision@3     | 0.4257     |
| dot_precision@5     | 0.2737     |
| dot_precision@10    | 0.1447     |
| dot_recall@1        | 0.8238     |
| dot_recall@3        | 0.9538     |
| dot_recall@5        | 0.9764     |
| dot_recall@10       | 0.9918     |
| dot_ndcg@10         | 0.9741     |
| dot_mrr@10          | 0.9731     |
| dot_map@100         | 0.9646     |

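These retrieval numbers come from ranking a query embedding against an encoded corpus. A minimal sketch with `sentence_transformers.util.semantic_search`; the corpus below is illustrative.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("tomaarsen/stsb-distilbert-base-mnrl")
corpus = [
    "How does light travel so fast?",
    "What are the best C++ books?",
    "What does copyright protect?",
]
corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode("How fast is fast?", convert_to_tensor=True)

# Ranks the corpus by cosine similarity and keeps the top_k hits
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")
```
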
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### sentence-transformers/quora-duplicates

* Dataset: [sentence-transformers/quora-duplicates](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) at [451a485](https://huggingface.co/datasets/sentence-transformers/quora-duplicates/tree/451a4850bd141edb44ade1b5828c259abd762cdb)
* Size: 100,000 training samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor                                                                            | positive                                                                          | negative                                                                          |
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
  | type    | string                                                                            | string                                                                            | string                                                                            |
  | details | <ul><li>min: 6 tokens</li><li>mean: 13.85 tokens</li><li>max: 42 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 13.65 tokens</li><li>max: 44 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.76 tokens</li><li>max: 64 tokens</li></ul> |
* Samples:
  | anchor                                                                           | positive                                                                                        | negative                                                                                                          |
  |:---------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------|
  | <code>Why in India do we not have one on one political debate as in USA?</code> | <code>Why cant we have a public debate between politicians in India like the one in US?</code> | <code>Can people on Quora stop India Pakistan debate? We are sick and tired seeing this everyday in bulk?</code>  |
  | <code>What is OnePlus One?</code>                                               | <code>How is oneplus one?</code>                                                               | <code>Why is OnePlus One so good?</code>                                                                          |
  | <code>Does our mind control our emotions?</code>                                | <code>How do smart and successful people control their emotions?</code>                        | <code>How can I control my positive emotions for the people whom I love but they don't care about me?</code>      |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

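For intuition: with these parameters, MultipleNegativesRankingLoss scores each anchor against every positive (and negative) in the batch with scaled cosine similarity, then applies cross-entropy with the matching positive as the target, so all other in-batch candidates act as negatives. A minimal sketch of that core computation, not the library's exact implementation:

```python
import torch
import torch.nn.functional as F

def mnrl(anchors: torch.Tensor, candidates: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    """anchors: (batch, dim); candidates: (batch, dim) positives, optionally with
    hard negatives concatenated along dim 0. For anchor i, candidate i is the
    positive; every other candidate serves as an in-batch negative."""
    scores = scale * F.cosine_similarity(
        anchors.unsqueeze(1), candidates.unsqueeze(0), dim=-1
    )  # (batch, num_candidates)
    labels = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(scores, labels)
```
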
### Evaluation Dataset

#### sentence-transformers/quora-duplicates

* Dataset: [sentence-transformers/quora-duplicates](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) at [451a485](https://huggingface.co/datasets/sentence-transformers/quora-duplicates/tree/451a4850bd141edb44ade1b5828c259abd762cdb)
* Size: 1,000 evaluation samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor                                                                            | positive                                                                         | negative                                                                          |
  |:--------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
  | type    | string                                                                            | string                                                                           | string                                                                            |
  | details | <ul><li>min: 7 tokens</li><li>mean: 13.84 tokens</li><li>max: 43 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 13.8 tokens</li><li>max: 38 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 14.71 tokens</li><li>max: 56 tokens</li></ul> |
* Samples:
  | anchor                                                                                                   | positive                                                                          | negative                                                                                                                                                                                                                                                           |
  |:-----------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>Which programming language is best for developing low-end games?</code>                           | <code>What coding language should I learn first for making games?</code>         | <code>I am entering the world of video game programming and want to know what language I should learn? Because there are so many languages I do not know which one to start with. Can you recommend a language that's easy to learn and can be used with many platforms?</code> |
  | <code>Was it appropriate for Meryl Streep to use her Golden Globes speech to attack Donald Trump?</code> | <code>Should Meryl Streep be using her position to attack the president?</code>  | <code>Why did Kelly Ann Conway say that Meryl Streep incited peoples worst feelings?</code>                                                                                                                                                                       |
  | <code>Where can I found excellent commercial fridges in Sydney?</code>                                  | <code>Where can I found impressive range of commercial fridges in Sydney?</code> | <code>What is the best grocery delivery service in Sydney?</code>                                                                                                                                                                                                 |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: False
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: None
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

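As a hedged reconstruction of a training script matching the non-default hyperparameters above (the dataset subset name, split handling, and output directory are assumptions, not the author's exact script):

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/stsb-distilbert-base")

# Assumed: the "triplet" subset with anchor/positive/negative columns,
# with 1,000 samples held out for evaluation as in the card
dataset = load_dataset("sentence-transformers/quora-duplicates", "triplet", split="train")
dataset = dataset.train_test_split(test_size=1_000, seed=42)

loss = MultipleNegativesRankingLoss(model)  # scale=20.0, cos_sim by default

args = SentenceTransformerTrainingArguments(
    output_dir="stsb-distilbert-base-mnrl",  # assumed output directory
    num_train_epochs=1,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate texts per batch with MNRL
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    loss=loss,
)
trainer.train()
```
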
### Training Logs
| Epoch  | Step | Training Loss | loss   | cosine_map@100 | quora-duplicates-dev_average_precision | quora-duplicates_max_ap |
|:------:|:----:|:-------------:|:------:|:--------------:|:---------------------------------------:|:-----------------------:|
| 0      | 0    | -             | -      | 0.9245         | 0.4200                                  | 0.6890                  |
| 0.0640 | 100  | 0.2535        | -      | -              | -                                       | -                       |
| 0.1280 | 200  | 0.1732        | -      | -              | -                                       | -                       |
| 0.1599 | 250  | -             | 0.1021 | 0.9601         | 0.5033                                  | 0.7342                  |
| 0.1919 | 300  | 0.1465        | -      | -              | -                                       | -                       |
| 0.2559 | 400  | 0.1186        | -      | -              | -                                       | -                       |
| 0.3199 | 500  | 0.1159        | 0.0773 | 0.9653         | 0.5247                                  | 0.7453                  |
| 0.3839 | 600  | 0.1088        | -      | -              | -                                       | -                       |
| 0.4479 | 700  | 0.0993        | -      | -              | -                                       | -                       |
| 0.4798 | 750  | -             | 0.0665 | 0.9666         | 0.5264                                  | 0.7655                  |
| 0.5118 | 800  | 0.0952        | -      | -              | -                                       | -                       |
| 0.5758 | 900  | 0.0799        | -      | -              | -                                       | -                       |
| 0.6398 | 1000 | 0.0855        | 0.0570 | 0.9709         | 0.5391                                  | 0.7717                  |
| 0.7038 | 1100 | 0.0804        | -      | -              | -                                       | -                       |
| 0.7678 | 1200 | 0.073         | -      | -              | -                                       | -                       |
| 0.7997 | 1250 | -             | 0.0513 | 0.9719         | 0.5329                                  | 0.7662                  |
| 0.8317 | 1300 | 0.0741        | -      | -              | -                                       | -                       |
| 0.8957 | 1400 | 0.0699        | -      | -              | -                                       | -                       |
| 0.9597 | 1500 | 0.0755        | 0.0476 | 0.9718         | 0.5349                                  | 0.7731                  |


### Environmental Impact
Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
- **Energy Consumed**: 0.039 kWh
- **Carbon Emitted**: 0.015 kg of CO2
- **Hours Used**: 0.169 hours

### Training Hardware
- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB

### Framework Versions
- Python: 3.11.6
- Sentence Transformers: 3.0.0.dev0
- Transformers: 4.41.0.dev0
- PyTorch: 2.3.0+cu121
- Accelerate: 0.26.1
- Datasets: 2.18.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,24 @@
{
  "_name_or_path": "sentence-transformers/stsb-distilbert-base",
  "activation": "gelu",
  "architectures": [
    "DistilBertModel"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "torch_dtype": "float32",
  "transformers_version": "4.41.0.dev0",
  "vocab_size": 30522
}
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
{
  "__version__": {
    "sentence_transformers": "2.0.0",
    "transformers": "4.7.0",
    "pytorch": "1.9.0+cu102"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": null
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2b076910accfb5da8b27f844d88655b7ce78dc0429622dc19f1aae801e18346e
size 265462608
modules.json ADDED
@@ -0,0 +1,14 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]
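This file tells Sentence Transformers to chain a Transformer module (at the repository root) with the mean Pooling module in `1_Pooling/`. Building the equivalent composition by hand would look roughly like this (a sketch; normally you would just load the checkpoint directly):

```python
from sentence_transformers import SentenceTransformer, models

# Module 0: the DistilBERT encoder, truncating inputs at 128 tokens
word = models.Transformer("sentence-transformers/stsb-distilbert-base", max_seq_length=128)
# Module 1: mean pooling over the 768-dim token embeddings
pooling = models.Pooling(word.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[word, pooling])
```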
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 128,
  "do_lower_case": false
}
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "full_tokenizer_file": null,
  "mask_token": "[MASK]",
  "model_max_length": 128,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "DistilBertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff