bobox committed
Commit f2432b0 · verified · 1 Parent(s): f1121cf

Training in progress, step 80, checkpoint

checkpoint-80/1_AdvancedWeightedPooling/config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "embed_dim": 768,
+   "num_heads": 8,
+   "dropout": 0.0,
+   "bias": true,
+   "gate_min": 0.2,
+   "gate_max": 0.8
+ }
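The values in this config can be sanity-checked on their own. The following is a minimal sketch, not the module's actual loading code: it assumes that `num_heads` must divide `embed_dim` evenly (as in standard multi-head attention) and that `gate_min`/`gate_max` bound some gating value inside [0, 1]. The commit does not show how the gate is used, so the `clamp_gate` helper is purely hypothetical.

```python
import json

# The config exactly as added in this commit.
CONFIG_TEXT = """
{
  "embed_dim": 768,
  "num_heads": 8,
  "dropout": 0.0,
  "bias": true,
  "gate_min": 0.2,
  "gate_max": 0.8
}
"""


def check_pooling_config(cfg: dict) -> int:
    """Validate the config and return the per-head dimension."""
    if cfg["embed_dim"] % cfg["num_heads"] != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    # Assumption: the gate bounds describe an interval inside [0, 1].
    if not 0.0 <= cfg["gate_min"] < cfg["gate_max"] <= 1.0:
        raise ValueError("expected 0 <= gate_min < gate_max <= 1")
    return cfg["embed_dim"] // cfg["num_heads"]


def clamp_gate(value: float, cfg: dict) -> float:
    """Hypothetical helper: restrict a gate value to [gate_min, gate_max]."""
    return min(max(value, cfg["gate_min"]), cfg["gate_max"])


config = json.loads(CONFIG_TEXT)
head_dim = check_pooling_config(config)
print(head_dim)
```

With 768 dimensions split across 8 heads, each attention head works on a 96-dimensional slice.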
checkpoint-80/1_AdvancedWeightedPooling/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0b3bf7f68b54022b657064f0d11be90d3cad99199570681a5d8da186525370b2
+ size 11828367
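The three lines above are a Git LFS pointer, not the weights themselves: the repository stores only the hash and size of the real file. As a rough sketch, a downloaded `pytorch_model.bin` could be checked against the pointer as follows. The function names are illustrative, and only the Python standard library is used.

```python
import hashlib

# Pointer file contents exactly as added in this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:0b3bf7f68b54022b657064f0d11be90d3cad99199570681a5d8da186525370b2
size 11828367
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines into a dict and unpack the oid field."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and hash the pointer names."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large checkpoints need not fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("pytorch_model.bin", pointer)` on a local copy would then confirm that the download is complete and unmodified.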
checkpoint-80/README.md ADDED
@@ -0,0 +1,899 @@
1
+ ---
2
+ base_model: microsoft/deberta-v3-small
3
+ datasets:
4
+ - tals/vitaminc
5
+ language:
6
+ - en
7
+ library_name: sentence-transformers
8
+ metrics:
9
+ - pearson_cosine
10
+ - spearman_cosine
11
+ - pearson_manhattan
12
+ - spearman_manhattan
13
+ - pearson_euclidean
14
+ - spearman_euclidean
15
+ - pearson_dot
16
+ - spearman_dot
17
+ - pearson_max
18
+ - spearman_max
19
+ - cosine_accuracy
20
+ - cosine_accuracy_threshold
21
+ - cosine_f1
22
+ - cosine_f1_threshold
23
+ - cosine_precision
24
+ - cosine_recall
25
+ - cosine_ap
26
+ - dot_accuracy
27
+ - dot_accuracy_threshold
28
+ - dot_f1
29
+ - dot_f1_threshold
30
+ - dot_precision
31
+ - dot_recall
32
+ - dot_ap
33
+ - manhattan_accuracy
34
+ - manhattan_accuracy_threshold
35
+ - manhattan_f1
36
+ - manhattan_f1_threshold
37
+ - manhattan_precision
38
+ - manhattan_recall
39
+ - manhattan_ap
40
+ - euclidean_accuracy
41
+ - euclidean_accuracy_threshold
42
+ - euclidean_f1
43
+ - euclidean_f1_threshold
44
+ - euclidean_precision
45
+ - euclidean_recall
46
+ - euclidean_ap
47
+ - max_accuracy
48
+ - max_accuracy_threshold
49
+ - max_f1
50
+ - max_f1_threshold
51
+ - max_precision
52
+ - max_recall
53
+ - max_ap
54
+ pipeline_tag: sentence-similarity
55
+ tags:
56
+ - sentence-transformers
57
+ - sentence-similarity
58
+ - feature-extraction
59
+ - generated_from_trainer
60
+ - dataset_size:225247
61
+ - loss:CachedGISTEmbedLoss
62
+ widget:
63
+ - source_sentence: how long to grill boneless skinless chicken breasts in oven
64
+ sentences:
65
+ - "[ syll. a-ka-hi, ak-ahi ] The baby boy name Akahi is also used as a girl name.\
66
+ \ Its pronunciation is AA K AA HHiy â\x80 . Akahi's origin, as well as its use,\
67
+ \ is in the Hawaiian language. The name's meaning is never before. Akahi is infrequently\
68
+ \ used as a baby name for boys."
69
+ - October consists of 31 days. November has 30 days. When you add both together
70
+ they have 61 days.
71
+ - Heat a grill or grill pan. When the grill is hot, place the chicken on the grill
72
+ and cook for about 4 minutes per side, or until cooked through. You can also bake
73
+ the thawed chicken in a 375 degree F oven for 15 minutes, or until cooked through.
74
+ - source_sentence: More than 273 people have died from the 2019-20 coronavirus outside
75
+ mainland China .
76
+ sentences:
77
+ - 'More than 3,700 people have died : around 3,100 in mainland China and around
78
+ 550 in all other countries combined .'
79
+ - 'More than 3,200 people have died : almost 3,000 in mainland China and around
80
+ 275 in other countries .'
81
+ - more than 4,900 deaths have been attributed to COVID-19 .
82
+ - source_sentence: Most red algae species live in oceans.
83
+ sentences:
84
+ - Where do most red algae species live?
85
+ - Which layer of the earth is molten?
86
+ - As a diver descends, the increase in pressure causes the body’s air pockets in
87
+ the ears and lungs to do what?
88
+ - source_sentence: Binary compounds of carbon with less electronegative elements are
89
+ called carbides.
90
+ sentences:
91
+ - What are four children born at one birth called?
92
+ - Binary compounds of carbon with less electronegative elements are called what?
93
+ - The water cycle involves movement of water between air and what?
94
+ - source_sentence: What is the basic monetary unit of Iceland?
95
+ sentences:
96
+ - 'Ao dai - Vietnamese traditional dress - YouTube Ao dai - Vietnamese traditional
97
+ dress Want to watch this again later? Sign in to add this video to a playlist.
98
+ Need to report the video? Sign in to report inappropriate content. Rating is available
99
+ when the video has been rented. This feature is not available right now. Please
100
+ try again later. Uploaded on Jul 8, 2009 Simple, yet charming, graceful and elegant,
101
+ áo dài was designed to praise the slender beauty of Vietnamese women. The dress
102
+ is a genius combination of ancient and modern. It shows every curve on the girl''s
103
+ body, creating sexiness for the wearer, yet it still preserves the traditional
104
+ feminine grace of Vietnamese women with its charming flowing flaps. The simplicity
105
+ of áo dài makes it convenient and practical, something that other Asian traditional
106
+ clothes lack. The waist-length slits of the flaps allow every movement of the
107
+ legs: walking, running, riding a bicycle, climbing a tree, doing high kicks. The
108
+ looseness of the pants allows comfortability. As a girl walks in áo dài, the movements
109
+ of the flaps make it seem like she''s not walking but floating in the air. This
110
+ breath-taking beautiful image of a Vietnamese girl walking in áo dài has been
111
+ an inspiration for generations of Vietnamese poets, novelists, artists and has
112
+ left a deep impression for every foreigner who has visited the country. Category'
113
+ - 'Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary
114
+ Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary
115
+ http://www.thefreedictionary.com/Icelandic+monetary+unit Related to Icelandic
116
+ monetary unit: Icelandic Old Krona ThesaurusAntonymsRelated WordsSynonymsLegend:
117
+ monetary unit - a unit of money Icelandic krona , krona - the basic unit of money
118
+ in Iceland eyrir - 100 aurar equal 1 krona in Iceland Want to thank TFD for its
119
+ existence? Tell a friend about us , add a link to this page, or visit the webmaster''s
120
+ page for free fun content . Link to this page: Copyright © 2003-2017 Farlex, Inc
121
+ Disclaimer All content on this website, including dictionary, thesaurus, literature,
122
+ geography, and other reference data is for informational purposes only. This information
123
+ should not be considered complete, up to date, and is not intended to be used
124
+ in place of a visit, consultation, or advice of a legal, medical, or any other
125
+ professional.'
126
+ - 'Food-Info.net : E-numbers : E140: Chlorophyll CI 75810, Natural Green 3, Chlorophyll
127
+ A, Magnesium chlorophyll Origin: Natural green colour, present in all plants and
128
+ algae. Commercially extracted from nettles, grass and alfalfa. Function & characteristics:'
129
+ model-index:
130
+ - name: SentenceTransformer based on microsoft/deberta-v3-small
131
+ results:
132
+ - task:
133
+ type: semantic-similarity
134
+ name: Semantic Similarity
135
+ dataset:
136
+ name: sts test
137
+ type: sts-test
138
+ metrics:
139
+ - type: pearson_cosine
140
+ value: 0.22248205020578934
141
+ name: Pearson Cosine
142
+ - type: spearman_cosine
143
+ value: 0.24802235964390085
144
+ name: Spearman Cosine
145
+ - type: pearson_manhattan
146
+ value: 0.26632593273308647
147
+ name: Pearson Manhattan
148
+ - type: spearman_manhattan
149
+ value: 0.2843623073856928
150
+ name: Spearman Manhattan
151
+ - type: pearson_euclidean
152
+ value: 0.2323160413842197
153
+ name: Pearson Euclidean
154
+ - type: spearman_euclidean
155
+ value: 0.24799036249272113
156
+ name: Spearman Euclidean
157
+ - type: pearson_dot
158
+ value: 0.22239084967931927
159
+ name: Pearson Dot
160
+ - type: spearman_dot
161
+ value: 0.24791612015173234
162
+ name: Spearman Dot
163
+ - type: pearson_max
164
+ value: 0.26632593273308647
165
+ name: Pearson Max
166
+ - type: spearman_max
167
+ value: 0.2843623073856928
168
+ name: Spearman Max
169
+ - task:
170
+ type: binary-classification
171
+ name: Binary Classification
172
+ dataset:
173
+ name: allNLI dev
174
+ type: allNLI-dev
175
+ metrics:
176
+ - type: cosine_accuracy
177
+ value: 0.666015625
178
+ name: Cosine Accuracy
179
+ - type: cosine_accuracy_threshold
180
+ value: 0.983686089515686
181
+ name: Cosine Accuracy Threshold
182
+ - type: cosine_f1
183
+ value: 0.5065885797950219
184
+ name: Cosine F1
185
+ - type: cosine_f1_threshold
186
+ value: 0.7642872333526611
187
+ name: Cosine F1 Threshold
188
+ - type: cosine_precision
189
+ value: 0.3392156862745098
190
+ name: Cosine Precision
191
+ - type: cosine_recall
192
+ value: 1.0
193
+ name: Cosine Recall
194
+ - type: cosine_ap
195
+ value: 0.34411819659341086
196
+ name: Cosine Ap
197
+ - type: dot_accuracy
198
+ value: 0.666015625
199
+ name: Dot Accuracy
200
+ - type: dot_accuracy_threshold
201
+ value: 755.60302734375
202
+ name: Dot Accuracy Threshold
203
+ - type: dot_f1
204
+ value: 0.5065885797950219
205
+ name: Dot F1
206
+ - type: dot_f1_threshold
207
+ value: 587.0625
208
+ name: Dot F1 Threshold
209
+ - type: dot_precision
210
+ value: 0.3392156862745098
211
+ name: Dot Precision
212
+ - type: dot_recall
213
+ value: 1.0
214
+ name: Dot Recall
215
+ - type: dot_ap
216
+ value: 0.344109544232086
217
+ name: Dot Ap
218
+ - type: manhattan_accuracy
219
+ value: 0.6640625
220
+ name: Manhattan Accuracy
221
+ - type: manhattan_accuracy_threshold
222
+ value: 62.69102096557617
223
+ name: Manhattan Accuracy Threshold
224
+ - type: manhattan_f1
225
+ value: 0.5058479532163743
226
+ name: Manhattan F1
227
+ - type: manhattan_f1_threshold
228
+ value: 337.6861877441406
229
+ name: Manhattan F1 Threshold
230
+ - type: manhattan_precision
231
+ value: 0.3385518590998043
232
+ name: Manhattan Precision
233
+ - type: manhattan_recall
234
+ value: 1.0
235
+ name: Manhattan Recall
236
+ - type: manhattan_ap
237
+ value: 0.35131239981425566
238
+ name: Manhattan Ap
239
+ - type: euclidean_accuracy
240
+ value: 0.666015625
241
+ name: Euclidean Accuracy
242
+ - type: euclidean_accuracy_threshold
243
+ value: 5.00581693649292
244
+ name: Euclidean Accuracy Threshold
245
+ - type: euclidean_f1
246
+ value: 0.5065885797950219
247
+ name: Euclidean F1
248
+ - type: euclidean_f1_threshold
249
+ value: 19.022436141967773
250
+ name: Euclidean F1 Threshold
251
+ - type: euclidean_precision
252
+ value: 0.3392156862745098
253
+ name: Euclidean Precision
254
+ - type: euclidean_recall
255
+ value: 1.0
256
+ name: Euclidean Recall
257
+ - type: euclidean_ap
258
+ value: 0.3441246898925644
259
+ name: Euclidean Ap
260
+ - type: max_accuracy
261
+ value: 0.666015625
262
+ name: Max Accuracy
263
+ - type: max_accuracy_threshold
264
+ value: 755.60302734375
265
+ name: Max Accuracy Threshold
266
+ - type: max_f1
267
+ value: 0.5065885797950219
268
+ name: Max F1
269
+ - type: max_f1_threshold
270
+ value: 587.0625
271
+ name: Max F1 Threshold
272
+ - type: max_precision
273
+ value: 0.3392156862745098
274
+ name: Max Precision
275
+ - type: max_recall
276
+ value: 1.0
277
+ name: Max Recall
278
+ - type: max_ap
279
+ value: 0.35131239981425566
280
+ name: Max Ap
281
+ - task:
282
+ type: binary-classification
283
+ name: Binary Classification
284
+ dataset:
285
+ name: Qnli dev
286
+ type: Qnli-dev
287
+ metrics:
288
+ - type: cosine_accuracy
289
+ value: 0.591796875
290
+ name: Cosine Accuracy
291
+ - type: cosine_accuracy_threshold
292
+ value: 0.9258557558059692
293
+ name: Cosine Accuracy Threshold
294
+ - type: cosine_f1
295
+ value: 0.6291834002677376
296
+ name: Cosine F1
297
+ - type: cosine_f1_threshold
298
+ value: 0.750666618347168
299
+ name: Cosine F1 Threshold
300
+ - type: cosine_precision
301
+ value: 0.4598825831702544
302
+ name: Cosine Precision
303
+ - type: cosine_recall
304
+ value: 0.9957627118644068
305
+ name: Cosine Recall
306
+ - type: cosine_ap
307
+ value: 0.5585355274462735
308
+ name: Cosine Ap
309
+ - type: dot_accuracy
310
+ value: 0.591796875
311
+ name: Dot Accuracy
312
+ - type: dot_accuracy_threshold
313
+ value: 711.18359375
314
+ name: Dot Accuracy Threshold
315
+ - type: dot_f1
316
+ value: 0.6291834002677376
317
+ name: Dot F1
318
+ - type: dot_f1_threshold
319
+ value: 576.5970458984375
320
+ name: Dot F1 Threshold
321
+ - type: dot_precision
322
+ value: 0.4598825831702544
323
+ name: Dot Precision
324
+ - type: dot_recall
325
+ value: 0.9957627118644068
326
+ name: Dot Recall
327
+ - type: dot_ap
328
+ value: 0.5585297234749824
329
+ name: Dot Ap
330
+ - type: manhattan_accuracy
331
+ value: 0.619140625
332
+ name: Manhattan Accuracy
333
+ - type: manhattan_accuracy_threshold
334
+ value: 188.09068298339844
335
+ name: Manhattan Accuracy Threshold
336
+ - type: manhattan_f1
337
+ value: 0.6301775147928994
338
+ name: Manhattan F1
339
+ - type: manhattan_f1_threshold
340
+ value: 237.80462646484375
341
+ name: Manhattan F1 Threshold
342
+ - type: manhattan_precision
343
+ value: 0.48409090909090907
344
+ name: Manhattan Precision
345
+ - type: manhattan_recall
346
+ value: 0.902542372881356
347
+ name: Manhattan Recall
348
+ - type: manhattan_ap
349
+ value: 0.5898283705050701
350
+ name: Manhattan Ap
351
+ - type: euclidean_accuracy
352
+ value: 0.591796875
353
+ name: Euclidean Accuracy
354
+ - type: euclidean_accuracy_threshold
355
+ value: 10.672666549682617
356
+ name: Euclidean Accuracy Threshold
357
+ - type: euclidean_f1
358
+ value: 0.6291834002677376
359
+ name: Euclidean F1
360
+ - type: euclidean_f1_threshold
361
+ value: 19.553747177124023
362
+ name: Euclidean F1 Threshold
363
+ - type: euclidean_precision
364
+ value: 0.4598825831702544
365
+ name: Euclidean Precision
366
+ - type: euclidean_recall
367
+ value: 0.9957627118644068
368
+ name: Euclidean Recall
369
+ - type: euclidean_ap
370
+ value: 0.5585355274462735
371
+ name: Euclidean Ap
372
+ - type: max_accuracy
373
+ value: 0.619140625
374
+ name: Max Accuracy
375
+ - type: max_accuracy_threshold
376
+ value: 711.18359375
377
+ name: Max Accuracy Threshold
378
+ - type: max_f1
379
+ value: 0.6301775147928994
380
+ name: Max F1
381
+ - type: max_f1_threshold
382
+ value: 576.5970458984375
383
+ name: Max F1 Threshold
384
+ - type: max_precision
385
+ value: 0.48409090909090907
386
+ name: Max Precision
387
+ - type: max_recall
388
+ value: 0.9957627118644068
389
+ name: Max Recall
390
+ - type: max_ap
391
+ value: 0.5898283705050701
392
+ name: Max Ap
393
+ ---
394
+
395
+ # SentenceTransformer based on microsoft/deberta-v3-small
396
+
397
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
398
+
399
+ ## Model Details
400
+
401
+ ### Model Description
402
+ - **Model Type:** Sentence Transformer
403
+ - **Base model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) <!-- at revision a36c739020e01763fe789b4b85e2df55d6180012 -->
404
+ - **Maximum Sequence Length:** 512 tokens
405
+ - **Output Dimensionality:** 768 dimensions
406
+ - **Similarity Function:** Cosine Similarity
407
+ <!-- - **Training Dataset:** Unknown -->
408
+ - **Language:** en
409
+ <!-- - **License:** Unknown -->
410
+
411
+ ### Model Sources
412
+
413
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
414
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
415
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
416
+
417
+ ### Full Model Architecture
418
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
+   (1): AdvancedWeightedPooling(
+     (linear_cls): Linear(in_features=768, out_features=768, bias=True)
+     (mha): MultiheadAttention(
+       (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+     )
+     (layernorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+     (layernorm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+   )
+ )
+ ```
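The printed module tree can be cross-checked against the 11,828,367-byte `pytorch_model.bin` recorded in the LFS pointer earlier in this commit. The sketch below rests on two assumptions that the commit does not confirm: that the pooling head holds only the parameters of the modules listed (one linear layer, one multi-head attention block with a packed query/key/value projection plus biases, and two LayerNorms), and that the weights are stored as 32-bit floats.

```python
EMBED_DIM = 768
POINTER_SIZE_BYTES = 11828367  # "size" field of the LFS pointer in this commit


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Weight matrix plus optional bias vector."""
    return in_features * out_features + (out_features if bias else 0)


def mha_params(embed_dim: int) -> int:
    """Packed q/k/v projection (weights and biases) plus the output projection."""
    in_proj = 3 * embed_dim * embed_dim + 3 * embed_dim
    return in_proj + linear_params(embed_dim, embed_dim)


def layernorm_params(embed_dim: int) -> int:
    """Elementwise scale and shift."""
    return 2 * embed_dim


total_params = (
    linear_params(EMBED_DIM, EMBED_DIM)   # linear_cls
    + mha_params(EMBED_DIM)               # mha
    + 2 * layernorm_params(EMBED_DIM)     # layernorm, layernorm2
)
fp32_bytes = total_params * 4
overhead_bytes = POINTER_SIZE_BYTES - fp32_bytes
print(total_params, fp32_bytes, overhead_bytes)
```

Under these assumptions the head has 2,956,032 parameters, or 11,824,128 bytes in float32. That is about 4 KB below the pointer size, a gap that would be consistent with serialization metadata.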
432
+
433
+ ## Usage
434
+
435
+ ### Direct Usage (Sentence Transformers)
436
+
437
+ First install the Sentence Transformers library:
438
+
439
+ ```bash
440
+ pip install -U sentence-transformers
441
+ ```
442
+
443
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp")
+ # Run inference
+ sentences = [
+     'What is the basic monetary unit of Iceland?',
+     "Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary http://www.thefreedictionary.com/Icelandic+monetary+unit Related to Icelandic monetary unit: Icelandic Old Krona ThesaurusAntonymsRelated WordsSynonymsLegend: monetary unit - a unit of money Icelandic krona , krona - the basic unit of money in Iceland eyrir - 100 aurar equal 1 krona in Iceland Want to thank TFD for its existence? Tell a friend about us , add a link to this page, or visit the webmaster's page for free fun content . Link to this page: Copyright © 2003-2017 Farlex, Inc Disclaimer All content on this website, including dictionary, thesaurus, literature, geography, and other reference data is for informational purposes only. This information should not be considered complete, up to date, and is not intended to be used in place of a visit, consultation, or advice of a legal, medical, or any other professional.",
+     'Food-Info.net : E-numbers : E140: Chlorophyll CI 75810, Natural Green 3, Chlorophyll A, Magnesium chlorophyll Origin: Natural green colour, present in all plants and algae. Commercially extracted from nettles, grass and alfalfa. Function & characteristics:',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
464
+
465
+ <!--
466
+ ### Direct Usage (Transformers)
467
+
468
+ <details><summary>Click to see the direct usage in Transformers</summary>
469
+
470
+ </details>
471
+ -->
472
+
473
+ <!--
474
+ ### Downstream Usage (Sentence Transformers)
475
+
476
+ You can finetune this model on your own dataset.
477
+
478
+ <details><summary>Click to expand</summary>
479
+
480
+ </details>
481
+ -->
482
+
483
+ <!--
484
+ ### Out-of-Scope Use
485
+
486
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
487
+ -->
488
+
489
+ ## Evaluation
490
+
491
+ ### Metrics
492
+
493
+ #### Semantic Similarity
494
+ * Dataset: `sts-test`
495
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
496
+
+ | Metric              | Value     |
+ |:--------------------|:----------|
+ | pearson_cosine      | 0.2225    |
+ | **spearman_cosine** | **0.248** |
+ | pearson_manhattan   | 0.2663    |
+ | spearman_manhattan  | 0.2844    |
+ | pearson_euclidean   | 0.2323    |
+ | spearman_euclidean  | 0.248     |
+ | pearson_dot         | 0.2224    |
+ | spearman_dot        | 0.2479    |
+ | pearson_max         | 0.2663    |
+ | spearman_max        | 0.2844    |
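The table reports both Pearson and Spearman correlations between the model's similarity scores and the gold labels. The sketch below illustrates how the two differ, using made-up toy numbers (not `sts-test` data) and hand-written helpers rather than the evaluator's own code. Pearson measures linear agreement, while Spearman is Pearson applied to ranks, so it only rewards getting the ordering right.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: list) -> list:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs: list, ys: list) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy data: the scores order the pairs perfectly, but not linearly.
gold = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [1.0, 4.0, 9.0, 16.0, 25.0]
toy_pearson = pearson(gold, scores)
toy_spearman = spearman(gold, scores)
print(round(toy_pearson, 4), round(toy_spearman, 4))
```

For a ranking-oriented use such as semantic search, the Spearman value is usually the more relevant of the two, which may be why `spearman_cosine` is the highlighted row.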
509
+
510
+ #### Binary Classification
511
+ * Dataset: `allNLI-dev`
512
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
513
+
+ | Metric                       | Value      |
+ |:-----------------------------|:-----------|
+ | cosine_accuracy              | 0.666      |
+ | cosine_accuracy_threshold    | 0.9837     |
+ | cosine_f1                    | 0.5066     |
+ | cosine_f1_threshold          | 0.7643     |
+ | cosine_precision             | 0.3392     |
+ | cosine_recall                | 1.0        |
+ | cosine_ap                    | 0.3441     |
+ | dot_accuracy                 | 0.666      |
+ | dot_accuracy_threshold       | 755.603    |
+ | dot_f1                       | 0.5066     |
+ | dot_f1_threshold             | 587.0625   |
+ | dot_precision                | 0.3392     |
+ | dot_recall                   | 1.0        |
+ | dot_ap                       | 0.3441     |
+ | manhattan_accuracy           | 0.6641     |
+ | manhattan_accuracy_threshold | 62.691     |
+ | manhattan_f1                 | 0.5058     |
+ | manhattan_f1_threshold       | 337.6862   |
+ | manhattan_precision          | 0.3386     |
+ | manhattan_recall             | 1.0        |
+ | manhattan_ap                 | 0.3513     |
+ | euclidean_accuracy           | 0.666      |
+ | euclidean_accuracy_threshold | 5.0058     |
+ | euclidean_f1                 | 0.5066     |
+ | euclidean_f1_threshold       | 19.0224    |
+ | euclidean_precision          | 0.3392     |
+ | euclidean_recall             | 1.0        |
+ | euclidean_ap                 | 0.3441     |
+ | max_accuracy                 | 0.666      |
+ | max_accuracy_threshold       | 755.603    |
+ | max_f1                       | 0.5066     |
+ | max_f1_threshold             | 587.0625   |
+ | max_precision                | 0.3392     |
+ | max_recall                   | 1.0        |
+ | **max_ap**                   | **0.3513** |
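The F1, precision, and recall rows are tied together by F1 = 2PR / (P + R), which gives a quick consistency check. The sketch below uses the full-precision `allNLI-dev` cosine values from the model-index front matter. The interpretation in terms of raw counts is an inference from those numbers (it assumes the dev split has 512 pairs), not something the README states.

```python
from fractions import Fraction


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Full-precision allNLI-dev cosine values from the README front matter.
cosine_precision = 0.3392156862745098
cosine_recall = 1.0
cosine_accuracy = 0.666015625

cosine_f1 = f1_score(cosine_precision, cosine_recall)

# Recover small-integer ratios behind the decimals (exploratory, see assumption above).
precision_fraction = Fraction(cosine_precision).limit_denominator(1000)
accuracy_fraction = Fraction(cosine_accuracy).limit_denominator(1000)
print(cosine_f1, precision_fraction, accuracy_fraction)
```

The recomputed F1 agrees with the reported 0.5066. The ratios 173/510 and 341/512 suggest that, at the F1-optimal threshold, nearly every pair is labeled positive (which is how recall reaches 1.0), so at step 80 the embeddings barely separate the two classes.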
551
+
552
+ #### Binary Classification
553
+ * Dataset: `Qnli-dev`
554
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
555
+
556
+ | Metric | Value |
557
+ |:-----------------------------|:-----------|
558
+ | cosine_accuracy | 0.5918 |
559
+ | cosine_accuracy_threshold | 0.9259 |
560
+ | cosine_f1 | 0.6292 |
561
+ | cosine_f1_threshold | 0.7507 |
562
+ | cosine_precision | 0.4599 |
563
+ | cosine_recall | 0.9958 |
564
+ | cosine_ap | 0.5585 |
565
+ | dot_accuracy | 0.5918 |
566
+ | dot_accuracy_threshold | 711.1836 |
567
+ | dot_f1 | 0.6292 |
568
+ | dot_f1_threshold | 576.597 |
569
+ | dot_precision | 0.4599 |
570
+ | dot_recall | 0.9958 |
571
+ | dot_ap | 0.5585 |
572
+ | manhattan_accuracy | 0.6191 |
573
+ | manhattan_accuracy_threshold | 188.0907 |
574
+ | manhattan_f1 | 0.6302 |
575
+ | manhattan_f1_threshold | 237.8046 |
576
+ | manhattan_precision | 0.4841 |
577
+ | manhattan_recall | 0.9025 |
578
+ | manhattan_ap | 0.5898 |
579
+ | euclidean_accuracy | 0.5918 |
580
+ | euclidean_accuracy_threshold | 10.6727 |
581
+ | euclidean_f1 | 0.6292 |
582
+ | euclidean_f1_threshold | 19.5537 |
583
+ | euclidean_precision | 0.4599 |
584
+ | euclidean_recall | 0.9958 |
585
+ | euclidean_ap | 0.5585 |
586
+ | max_accuracy | 0.6191 |
587
+ | max_accuracy_threshold | 711.1836 |
588
+ | max_f1 | 0.6302 |
589
+ | max_f1_threshold | 576.597 |
590
+ | max_precision | 0.4841 |
591
+ | max_recall | 0.9958 |
592
+ | **max_ap** | **0.5898** |
593
+
594
+ <!--
595
+ ## Bias, Risks and Limitations
596
+
597
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
598
+ -->
599
+
600
+ <!--
601
+ ### Recommendations
602
+
603
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
604
+ -->
605
+
606
+ ## Training Details
607
+
608
+ ### Evaluation Dataset
609
+
610
+ #### vitaminc-pairs
611
+
612
+ * Dataset: [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc) at [be6febb](https://huggingface.co/datasets/tals/vitaminc/tree/be6febb761b0b2807687e61e0b5282e459df2fa0)
613
+ * Size: 128 evaluation samples
614
+ * Columns: <code>claim</code> and <code>evidence</code>
615
+ * Approximate statistics based on the first 128 samples:
616
+ | | claim | evidence |
617
+ |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
618
+ | type | string | string |
619
+ | details | <ul><li>min: 9 tokens</li><li>mean: 21.42 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 11 tokens</li><li>mean: 35.55 tokens</li><li>max: 79 tokens</li></ul> |
620
+ * Samples:
621
+ | claim | evidence |
622
+ |:------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
623
+ | <code>Dragon Con had over 5000 guests .</code> | <code>Among the more than 6000 guests and musical performers at the 2009 convention were such notables as Patrick Stewart , William Shatner , Leonard Nimoy , Terry Gilliam , Bruce Boxleitner , James Marsters , and Mary McDonnell .</code> |
624
+ | <code>COVID-19 has reached more than 185 countries .</code> | <code>As of , more than cases of COVID-19 have been reported in more than 190 countries and 200 territories , resulting in more than deaths .</code> |
625
+ | <code>In March , Italy had 3.6x times more cases of coronavirus than China .</code> | <code>As of 12 March , among nations with at least one million citizens , Italy has the world 's highest per capita rate of positive coronavirus cases at 206.1 cases per million people ( 3.6x times the rate of China ) and is the country with the second-highest number of positive cases as well as of deaths in the world , after China .</code> |
626
+ * Loss: [<code>CachedGISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
627
+ ```json
628
+ {'guide': SentenceTransformer(
629
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
630
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
631
+ (2): Normalize()
632
+ ), 'temperature': 0.025}
633
+ ```
634
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 42
+ - `per_device_eval_batch_size`: 128
+ - `gradient_accumulation_steps`: 2
+ - `learning_rate`: 3e-05
+ - `weight_decay`: 0.001
+ - `lr_scheduler_type`: cosine_with_min_lr
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 1e-05}
+ - `warmup_ratio`: 0.25
+ - `save_safetensors`: False
+ - `fp16`: True
+ - `push_to_hub`: True
+ - `hub_model_id`: bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp
+ - `hub_strategy`: all_checkpoints
+ - `batch_sampler`: no_duplicates
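Taken together, `learning_rate`, `warmup_ratio`, and the `cosine_with_min_lr` settings describe a schedule that warms up linearly and then decays along a cosine curve to a floor. The sketch below shows that general shape using the values from the list above. It is not the Trainer's exact implementation, and `TOTAL_STEPS` is a hypothetical figure because the run's total step count is not given here.

```python
import math

TOTAL_STEPS = 1000  # hypothetical; the real total is not stated in this commit


def lr_at(
    step: int,
    total_steps: int = TOTAL_STEPS,
    base_lr: float = 3e-05,
    min_lr: float = 1e-05,
    warmup_ratio: float = 0.25,
    num_cycles: float = 0.5,
) -> float:
    """Linear warmup to base_lr, then cosine decay that bottoms out at min_lr."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress))
    floor = min_lr / base_lr
    # Rescale the cosine factor from [0, 1] to [floor, 1].
    return base_lr * (cosine * (1.0 - floor) + floor)


for step in (0, 125, 250, 625, 1000):
    print(step, lr_at(step))
```

With `per_device_train_batch_size` 42 and `gradient_accumulation_steps` 2, each optimizer step covers 84 examples per device, so the schedule advances once per 84 examples on a single device.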
653
+
654
+ #### All Hyperparameters
655
+ <details><summary>Click to expand</summary>
656
+
657
+ - `overwrite_output_dir`: False
658
+ - `do_predict`: False
659
+ - `eval_strategy`: steps
660
+ - `prediction_loss_only`: True
661
+ - `per_device_train_batch_size`: 42
662
+ - `per_device_eval_batch_size`: 128
663
+ - `per_gpu_train_batch_size`: None
664
+ - `per_gpu_eval_batch_size`: None
665
+ - `gradient_accumulation_steps`: 2
666
+ - `eval_accumulation_steps`: None
667
+ - `torch_empty_cache_steps`: None
668
+ - `learning_rate`: 3e-05
669
+ - `weight_decay`: 0.001
670
+ - `adam_beta1`: 0.9
671
+ - `adam_beta2`: 0.999
672
+ - `adam_epsilon`: 1e-08
673
+ - `max_grad_norm`: 1.0
674
+ - `num_train_epochs`: 3
675
+ - `max_steps`: -1
676
+ - `lr_scheduler_type`: cosine_with_min_lr
677
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 1e-05}
678
+ - `warmup_ratio`: 0.25
679
+ - `warmup_steps`: 0
680
+ - `log_level`: passive
681
+ - `log_level_replica`: warning
682
+ - `log_on_each_node`: True
683
+ - `logging_nan_inf_filter`: True
684
+ - `save_safetensors`: False
685
+ - `save_on_each_node`: False
686
+ - `save_only_model`: False
687
+ - `restore_callback_states_from_checkpoint`: False
688
+ - `no_cuda`: False
689
+ - `use_cpu`: False
690
+ - `use_mps_device`: False
691
+ - `seed`: 42
692
+ - `data_seed`: None
693
+ - `jit_mode_eval`: False
694
+ - `use_ipex`: False
695
+ - `bf16`: False
696
+ - `fp16`: True
697
+ - `fp16_opt_level`: O1
698
+ - `half_precision_backend`: auto
699
+ - `bf16_full_eval`: False
700
+ - `fp16_full_eval`: False
701
+ - `tf32`: None
702
+ - `local_rank`: 0
703
+ - `ddp_backend`: None
704
+ - `tpu_num_cores`: None
705
+ - `tpu_metrics_debug`: False
706
+ - `debug`: []
707
+ - `dataloader_drop_last`: False
708
+ - `dataloader_num_workers`: 0
709
+ - `dataloader_prefetch_factor`: None
710
+ - `past_index`: -1
711
+ - `disable_tqdm`: False
712
+ - `remove_unused_columns`: True
713
+ - `label_names`: None
714
+ - `load_best_model_at_end`: False
715
+ - `ignore_data_skip`: False
716
+ - `fsdp`: []
717
+ - `fsdp_min_num_params`: 0
718
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
719
+ - `fsdp_transformer_layer_cls_to_wrap`: None
720
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
721
+ - `deepspeed`: None
722
+ - `label_smoothing_factor`: 0.0
723
+ - `optim`: adamw_torch
724
+ - `optim_args`: None
725
+ - `adafactor`: False
726
+ - `group_by_length`: False
727
+ - `length_column_name`: length
728
+ - `ddp_find_unused_parameters`: None
729
+ - `ddp_bucket_cap_mb`: None
730
+ - `ddp_broadcast_buffers`: False
731
+ - `dataloader_pin_memory`: True
732
+ - `dataloader_persistent_workers`: False
733
+ - `skip_memory_metrics`: True
734
+ - `use_legacy_prediction_loop`: False
735
+ - `push_to_hub`: True
736
+ - `resume_from_checkpoint`: None
737
+ - `hub_model_id`: bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp
738
+ - `hub_strategy`: all_checkpoints
739
+ - `hub_private_repo`: False
740
+ - `hub_always_push`: False
741
+ - `gradient_checkpointing`: False
742
+ - `gradient_checkpointing_kwargs`: None
743
+ - `include_inputs_for_metrics`: False
744
+ - `eval_do_concat_batches`: True
745
+ - `fp16_backend`: auto
746
+ - `push_to_hub_model_id`: None
747
+ - `push_to_hub_organization`: None
748
+ - `mp_parameters`:
749
+ - `auto_find_batch_size`: False
750
+ - `full_determinism`: False
751
+ - `torchdynamo`: None
752
+ - `ray_scope`: last
753
+ - `ddp_timeout`: 1800
754
+ - `torch_compile`: False
755
+ - `torch_compile_backend`: None
756
+ - `torch_compile_mode`: None
757
+ - `dispatch_batches`: None
758
+ - `split_batches`: None
759
+ - `include_tokens_per_second`: False
760
+ - `include_num_input_tokens_seen`: False
761
+ - `neftune_noise_alpha`: None
762
+ - `optim_target_modules`: None
763
+ - `batch_eval_metrics`: False
764
+ - `eval_on_start`: False
765
+ - `use_liger_kernel`: False
766
+ - `eval_use_gather_object`: False
767
+ - `batch_sampler`: no_duplicates
768
+ - `multi_dataset_batch_sampler`: proportional
769
+
770
+ </details>
771
+
772
+ ### Training Logs
773
+ | Epoch | Step | Training Loss | vitaminc-pairs loss | negation-triplets loss | scitail-pairs-pos loss | scitail-pairs-qa loss | xsum-pairs loss | sciq pairs loss | qasc pairs loss | openbookqa pairs loss | msmarco pairs loss | nq pairs loss | trivia pairs loss | gooaq pairs loss | paws-pos loss | global dataset loss | sts-test_spearman_cosine | allNLI-dev_max_ap | Qnli-dev_max_ap |
774
+ |:------:|:----:|:-------------:|:-------------------:|:----------------------:|:----------------------:|:---------------------:|:---------------:|:---------------:|:---------------:|:---------------------:|:------------------:|:-------------:|:-----------------:|:----------------:|:-------------:|:-------------------:|:------------------------:|:-----------------:|:---------------:|
775
+ | 0.0009 | 1 | 5.8564 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
776
+ | 0.0018 | 2 | 7.1716 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
777
+ | 0.0027 | 3 | 5.9095 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
778
+ | 0.0035 | 4 | 5.0841 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
779
+ | 0.0044 | 5 | 4.0184 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
780
+ | 0.0053 | 6 | 6.2191 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
781
+ | 0.0062 | 7 | 5.6124 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
782
+ | 0.0071 | 8 | 3.9544 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
783
+ | 0.0080 | 9 | 4.7149 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
784
+ | 0.0088 | 10 | 4.9616 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
785
+ | 0.0097 | 11 | 5.2794 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
786
+ | 0.0106 | 12 | 8.8704 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
787
+ | 0.0115 | 13 | 6.0707 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
788
+ | 0.0124 | 14 | 5.4071 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
789
+ | 0.0133 | 15 | 6.9104 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
790
+ | 0.0142 | 16 | 6.0276 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
791
+ | 0.0150 | 17 | 6.737 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
792
+ | 0.0159 | 18 | 6.5354 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
793
+ | 0.0168 | 19 | 5.206 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
794
+ | 0.0177 | 20 | 5.2469 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
795
+ | 0.0186 | 21 | 5.3771 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
796
+ | 0.0195 | 22 | 4.979 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
797
+ | 0.0204 | 23 | 4.7909 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
798
+ | 0.0212 | 24 | 4.9086 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
799
+ | 0.0221 | 25 | 4.8826 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
800
+ | 0.0230 | 26 | 8.2266 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
801
+ | 0.0239 | 27 | 8.3024 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
802
+ | 0.0248 | 28 | 5.8745 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
803
+ | 0.0257 | 29 | 4.7298 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
804
+ | 0.0265 | 30 | 5.4614 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
805
+ | 0.0274 | 31 | 5.8594 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
806
+ | 0.0283 | 32 | 5.2401 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
807
+ | 0.0292 | 33 | 5.1579 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
808
+ | 0.0301 | 34 | 5.2181 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
809
+ | 0.0310 | 35 | 4.6328 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
810
+ | 0.0319 | 36 | 2.121 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
811
+ | 0.0327 | 37 | 5.9026 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
812
+ | 0.0336 | 38 | 7.3796 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
813
+ | 0.0345 | 39 | 5.5361 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
814
+ | 0.0354 | 40 | 4.0243 | 2.9018 | 5.6903 | 2.1136 | 2.8052 | 6.5831 | 0.8882 | 4.1148 | 5.0966 | 10.3911 | 10.9032 | 7.1904 | 8.1935 | 1.3943 | 5.6716 | 0.1879 | 0.3385 | 0.5781 |
815
+ | 0.0363 | 41 | 4.9072 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
816
+ | 0.0372 | 42 | 3.4439 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
817
+ | 0.0381 | 43 | 4.9787 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
818
+ | 0.0389 | 44 | 5.8318 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
819
+ | 0.0398 | 45 | 5.3226 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
820
+ | 0.0407 | 46 | 5.1181 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
821
+ | 0.0416 | 47 | 4.7834 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
822
+ | 0.0425 | 48 | 6.6303 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
823
+ | 0.0434 | 49 | 5.8171 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
824
+ | 0.0442 | 50 | 5.1962 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
825
+ | 0.0451 | 51 | 5.2096 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
826
+ | 0.0460 | 52 | 5.0943 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
827
+ | 0.0469 | 53 | 4.9038 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
828
+ | 0.0478 | 54 | 4.6479 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
829
+ | 0.0487 | 55 | 5.5098 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
830
+ | 0.0496 | 56 | 4.6979 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
831
+ | 0.0504 | 57 | 3.1969 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
832
+ | 0.0513 | 58 | 4.4127 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
833
+ | 0.0522 | 59 | 3.7746 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
834
+ | 0.0531 | 60 | 4.5378 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
835
+ | 0.0540 | 61 | 5.0209 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
836
+ | 0.0549 | 62 | 6.5936 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
837
+ | 0.0558 | 63 | 4.2315 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
838
+ | 0.0566 | 64 | 6.4269 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
839
+ | 0.0575 | 65 | 4.2644 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
840
+ | 0.0584 | 66 | 5.1388 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
841
+ | 0.0593 | 67 | 5.1852 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
842
+ | 0.0602 | 68 | 4.8057 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
843
+ | 0.0611 | 69 | 3.1725 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
844
+ | 0.0619 | 70 | 3.3322 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
845
+ | 0.0628 | 71 | 5.139 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
846
+ | 0.0637 | 72 | 4.307 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
847
+ | 0.0646 | 73 | 5.0133 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
848
+ | 0.0655 | 74 | 4.0507 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
849
+ | 0.0664 | 75 | 3.3895 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
850
+ | 0.0673 | 76 | 5.6736 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
851
+ | 0.0681 | 77 | 4.2572 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
852
+ | 0.0690 | 78 | 3.0796 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
853
+ | 0.0699 | 79 | 5.0199 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
854
+ | 0.0708 | 80 | 4.1414 | 2.7794 | 4.8890 | 1.8997 | 2.6761 | 6.2096 | 0.7622 | 3.3129 | 4.5498 | 7.2056 | 7.6809 | 6.3792 | 6.6567 | 1.3848 | 5.0030 | 0.2480 | 0.3513 | 0.5898 |
855
+
856
+
857
+ ### Framework Versions
858
+ - Python: 3.10.14
859
+ - Sentence Transformers: 3.2.0
860
+ - Transformers: 4.45.1
861
+ - PyTorch: 2.4.0
862
+ - Accelerate: 0.34.2
863
+ - Datasets: 3.0.1
864
+ - Tokenizers: 0.20.0
865
+
866
+ ## Citation
867
+
868
+ ### BibTeX
869
+
870
+ #### Sentence Transformers
871
+ ```bibtex
872
+ @inproceedings{reimers-2019-sentence-bert,
873
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
874
+ author = "Reimers, Nils and Gurevych, Iryna",
875
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
876
+ month = "11",
877
+ year = "2019",
878
+ publisher = "Association for Computational Linguistics",
879
+ url = "https://arxiv.org/abs/1908.10084",
880
+ }
881
+ ```
882
+
883
+ <!--
884
+ ## Glossary
885
+
886
+ *Clearly define terms in order to be accessible across audiences.*
887
+ -->
888
+
889
+ <!--
890
+ ## Model Card Authors
891
+
892
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
893
+ -->
894
+
895
+ <!--
896
+ ## Model Card Contact
897
+
898
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
899
+ -->
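A side note on the logs recorded in this checkpoint: the `log_history` entries in `trainer_state.json` report `grad_norm` as `NaN` for steps 1 to 3 and as `Infinity` for step 12, and the logged `learning_rate` does not advance on those steps. With `fp16: True` this is consistent with mixed-precision loss scaling skipping optimizer updates whose gradients are non-finite, although the diff itself does not confirm the cause. The Python sketch below shows one way to flag such steps. `SAMPLE` is a hand-copied excerpt of five entries and `non_finite_steps` is a hypothetical helper name; neither is part of the checkpoint.

```python
import json
import math

# Excerpt of `log_history` copied by hand from trainer_state.json
# (steps 1, 4, 11, 12 and 13 only; the real file holds many more entries).
SAMPLE = """
[
  {"epoch": 0.0008849557522123894, "grad_norm": NaN,
   "learning_rate": 0.0, "loss": 5.8564, "step": 1},
  {"epoch": 0.0035398230088495575, "grad_norm": 21.95326805114746,
   "learning_rate": 3.5377358490566036e-09, "loss": 5.0841, "step": 4},
  {"epoch": 0.009734513274336283, "grad_norm": 24.233421325683594,
   "learning_rate": 2.830188679245283e-08, "loss": 5.2794, "step": 11},
  {"epoch": 0.010619469026548672, "grad_norm": Infinity,
   "learning_rate": 2.830188679245283e-08, "loss": 8.8704, "step": 12},
  {"epoch": 0.011504424778761062, "grad_norm": 34.37785720825195,
   "learning_rate": 3.183962264150943e-08, "loss": 6.0707, "step": 13}
]
"""


def non_finite_steps(log_history):
    """Return the steps whose logged grad_norm is NaN or +/-Infinity.

    Hypothetical helper for illustration; entries without a grad_norm key
    (for example the evaluation records) are ignored.
    """
    return [
        entry["step"]
        for entry in log_history
        if "grad_norm" in entry and not math.isfinite(entry["grad_norm"])
    ]


# Python's json module accepts the non-standard NaN/Infinity literals by default.
history = json.loads(SAMPLE)
bad_steps = non_finite_steps(history)
by_step = {entry["step"]: entry for entry in history}

print("steps with non-finite grad_norm:", bad_steps)
# In this excerpt the logged learning rate is identical at steps 11 and 12,
# i.e. it did not advance on the step whose grad_norm was Infinity.
print("lr unchanged at step 12:",
      by_step[12]["learning_rate"] == by_step[11]["learning_rate"])
```

For the full file, the same helper could be applied to `json.load(f)["log_history"]`, assuming the layout shown in this diff.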
checkpoint-80/added_tokens.json ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "[MASK]": 128000
3
+ }
checkpoint-80/config.json ADDED
@@ -0,0 +1,35 @@
1
+ {
2
+ "_name_or_path": "microsoft/deberta-v3-small",
3
+ "architectures": [
4
+ "DebertaV2Model"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "hidden_act": "gelu",
8
+ "hidden_dropout_prob": 0.1,
9
+ "hidden_size": 768,
10
+ "initializer_range": 0.02,
11
+ "intermediate_size": 3072,
12
+ "layer_norm_eps": 1e-07,
13
+ "max_position_embeddings": 512,
14
+ "max_relative_positions": -1,
15
+ "model_type": "deberta-v2",
16
+ "norm_rel_ebd": "layer_norm",
17
+ "num_attention_heads": 12,
18
+ "num_hidden_layers": 6,
19
+ "pad_token_id": 0,
20
+ "pooler_dropout": 0,
21
+ "pooler_hidden_act": "gelu",
22
+ "pooler_hidden_size": 768,
23
+ "pos_att_type": [
24
+ "p2c",
25
+ "c2p"
26
+ ],
27
+ "position_biased_input": false,
28
+ "position_buckets": 256,
29
+ "relative_attention": true,
30
+ "share_att_key": true,
31
+ "torch_dtype": "float32",
32
+ "transformers_version": "4.45.1",
33
+ "type_vocab_size": 0,
34
+ "vocab_size": 128100
35
+ }
checkpoint-80/config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "3.2.0",
4
+ "transformers": "4.45.1",
5
+ "pytorch": "2.4.0"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": null
10
+ }
checkpoint-80/modules.json ADDED
@@ -0,0 +1,14 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_AdvancedWeightedPooling",
12
+ "type": "__main__.AdvancedWeightedPooling"
13
+ }
14
+ ]
checkpoint-80/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:54cb91d89a9ceb4c5c238647ad187b3837a529d099e477303ac25dab6fcc341c
3
+ size 245742074
checkpoint-80/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:02cd4ccb01d28af28e563810d015e0bfcbf2f3d9c978732487e0120e35cdb1f7
3
+ size 565251810
checkpoint-80/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:18f9fac3c175cfc1292deb2d04e1fe210321ba3ff20498b0e9c71192ddfeb9c8
3
+ size 14244
checkpoint-80/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f2cbfe905841f1188488869af5b7dc208420cd27f8d81a2ad1951627737978f4
3
+ size 1192
checkpoint-80/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": false
4
+ }
checkpoint-80/special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
1
+ {
2
+ "bos_token": "[CLS]",
3
+ "cls_token": "[CLS]",
4
+ "eos_token": "[SEP]",
5
+ "mask_token": "[MASK]",
6
+ "pad_token": "[PAD]",
7
+ "sep_token": "[SEP]",
8
+ "unk_token": {
9
+ "content": "[UNK]",
10
+ "lstrip": false,
11
+ "normalized": true,
12
+ "rstrip": false,
13
+ "single_word": false
14
+ }
15
+ }
checkpoint-80/spm.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
3
+ size 2464616
checkpoint-80/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-80/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "1": {
12
+ "content": "[CLS]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "2": {
20
+ "content": "[SEP]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "3": {
28
+ "content": "[UNK]",
29
+ "lstrip": false,
30
+ "normalized": true,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "128000": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "bos_token": "[CLS]",
45
+ "clean_up_tokenization_spaces": false,
46
+ "cls_token": "[CLS]",
47
+ "do_lower_case": false,
48
+ "eos_token": "[SEP]",
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 1000000000000000019884624838656,
51
+ "pad_token": "[PAD]",
52
+ "sep_token": "[SEP]",
53
+ "sp_model_kwargs": {},
54
+ "split_by_punct": false,
55
+ "tokenizer_class": "DebertaV2Tokenizer",
56
+ "unk_token": "[UNK]",
57
+ "vocab_type": "spm"
58
+ }
checkpoint-80/trainer_state.json ADDED
@@ -0,0 +1,979 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 0.07079646017699115,
5
+ "eval_steps": 40,
6
+ "global_step": 80,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.0008849557522123894,
13
+ "grad_norm": NaN,
14
+ "learning_rate": 0.0,
15
+ "loss": 5.8564,
16
+ "step": 1
17
+ },
18
+ {
19
+ "epoch": 0.0017699115044247787,
20
+ "grad_norm": NaN,
21
+ "learning_rate": 0.0,
22
+ "loss": 7.1716,
23
+ "step": 2
24
+ },
25
+ {
26
+ "epoch": 0.002654867256637168,
27
+ "grad_norm": NaN,
28
+ "learning_rate": 0.0,
29
+ "loss": 5.9095,
30
+ "step": 3
31
+ },
32
+ {
33
+ "epoch": 0.0035398230088495575,
34
+ "grad_norm": 21.95326805114746,
35
+ "learning_rate": 3.5377358490566036e-09,
36
+ "loss": 5.0841,
37
+ "step": 4
38
+ },
39
+ {
40
+ "epoch": 0.004424778761061947,
41
+ "grad_norm": 16.607179641723633,
42
+ "learning_rate": 7.075471698113207e-09,
43
+ "loss": 4.0184,
44
+ "step": 5
45
+ },
46
+ {
47
+ "epoch": 0.005309734513274336,
48
+ "grad_norm": 33.789615631103516,
49
+ "learning_rate": 1.0613207547169811e-08,
50
+ "loss": 6.2191,
51
+ "step": 6
52
+ },
53
+ {
54
+ "epoch": 0.006194690265486726,
55
+ "grad_norm": 28.073551177978516,
56
+ "learning_rate": 1.4150943396226414e-08,
57
+ "loss": 5.6124,
58
+ "step": 7
59
+ },
60
+ {
61
+ "epoch": 0.007079646017699115,
62
+ "grad_norm": 17.365602493286133,
63
+ "learning_rate": 1.768867924528302e-08,
64
+ "loss": 3.9544,
65
+ "step": 8
66
+ },
67
+ {
68
+ "epoch": 0.007964601769911504,
69
+ "grad_norm": 19.384475708007812,
70
+ "learning_rate": 2.1226415094339622e-08,
71
+ "loss": 4.7149,
72
+ "step": 9
73
+ },
74
+ {
75
+ "epoch": 0.008849557522123894,
76
+ "grad_norm": 19.67770004272461,
77
+ "learning_rate": 2.4764150943396227e-08,
78
+ "loss": 4.9616,
79
+ "step": 10
80
+ },
81
+ {
82
+ "epoch": 0.009734513274336283,
83
+ "grad_norm": 24.233421325683594,
84
+ "learning_rate": 2.830188679245283e-08,
85
+ "loss": 5.2794,
86
+ "step": 11
87
+ },
88
+ {
89
+ "epoch": 0.010619469026548672,
90
+ "grad_norm": Infinity,
91
+ "learning_rate": 2.830188679245283e-08,
92
+ "loss": 8.8704,
93
+ "step": 12
94
+ },
95
+ {
96
+ "epoch": 0.011504424778761062,
97
+ "grad_norm": 34.37785720825195,
98
+ "learning_rate": 3.183962264150943e-08,
99
+ "loss": 6.0707,
100
+ "step": 13
101
+ },
102
+ {
103
+ "epoch": 0.012389380530973451,
104
+ "grad_norm": 25.11741065979004,
105
+ "learning_rate": 3.537735849056604e-08,
106
+ "loss": 5.4071,
107
+ "step": 14
108
+ },
109
+ {
110
+ "epoch": 0.01327433628318584,
111
+ "grad_norm": 53.84364700317383,
112
+ "learning_rate": 3.891509433962264e-08,
113
+ "loss": 6.9104,
114
+ "step": 15
115
+ },
116
+ {
117
+ "epoch": 0.01415929203539823,
118
+ "grad_norm": 32.0903434753418,
119
+ "learning_rate": 4.2452830188679244e-08,
120
+ "loss": 6.0276,
121
+ "step": 16
122
+ },
123
+ {
124
+ "epoch": 0.01504424778761062,
125
+ "grad_norm": 39.742130279541016,
126
+ "learning_rate": 4.599056603773585e-08,
127
+ "loss": 6.737,
128
+ "step": 17
129
+ },
130
+ {
131
+ "epoch": 0.01592920353982301,
132
+ "grad_norm": 45.267417907714844,
133
+ "learning_rate": 4.9528301886792454e-08,
134
+ "loss": 6.5354,
135
+ "step": 18
136
+ },
137
+ {
138
+ "epoch": 0.016814159292035398,
139
+ "grad_norm": 22.39731788635254,
140
+ "learning_rate": 5.3066037735849055e-08,
141
+ "loss": 5.206,
142
+ "step": 19
143
+ },
144
+ {
145
+ "epoch": 0.017699115044247787,
146
+ "grad_norm": 20.858232498168945,
147
+ "learning_rate": 5.660377358490566e-08,
148
+ "loss": 5.2469,
149
+ "step": 20
150
+ },
151
+ {
152
+ "epoch": 0.018584070796460177,
153
+ "grad_norm": 23.96446990966797,
154
+ "learning_rate": 6.014150943396226e-08,
155
+ "loss": 5.3771,
156
+ "step": 21
157
+ },
158
+ {
159
+ "epoch": 0.019469026548672566,
160
+ "grad_norm": 22.945741653442383,
161
+ "learning_rate": 6.367924528301887e-08,
162
+ "loss": 4.979,
163
+ "step": 22
164
+ },
165
+ {
166
+ "epoch": 0.020353982300884955,
167
+ "grad_norm": 15.497300148010254,
168
+ "learning_rate": 6.721698113207547e-08,
169
+ "loss": 4.7909,
170
+ "step": 23
171
+ },
172
+ {
173
+ "epoch": 0.021238938053097345,
174
+ "grad_norm": 20.039024353027344,
175
+ "learning_rate": 7.075471698113208e-08,
176
+ "loss": 4.9086,
177
+ "step": 24
178
+ },
179
+ {
180
+ "epoch": 0.022123893805309734,
181
+ "grad_norm": 21.30576515197754,
182
+ "learning_rate": 7.429245283018869e-08,
183
+ "loss": 4.8826,
184
+ "step": 25
185
+ },
186
+ {
187
+ "epoch": 0.023008849557522124,
188
+ "grad_norm": 64.5285873413086,
189
+ "learning_rate": 7.783018867924529e-08,
190
+ "loss": 8.2266,
191
+ "step": 26
192
+ },
193
+ {
194
+ "epoch": 0.023893805309734513,
195
+ "grad_norm": 59.894893646240234,
196
+ "learning_rate": 8.13679245283019e-08,
197
+ "loss": 8.3024,
198
+ "step": 27
199
+ },
200
+ {
201
+ "epoch": 0.024778761061946902,
202
+ "grad_norm": 25.504356384277344,
203
+ "learning_rate": 8.490566037735849e-08,
204
+ "loss": 5.8745,
205
+ "step": 28
206
+ },
207
+ {
208
+ "epoch": 0.02566371681415929,
209
+ "grad_norm": 15.169568061828613,
210
+ "learning_rate": 8.84433962264151e-08,
211
+ "loss": 4.7298,
212
+ "step": 29
213
+ },
214
+ {
215
+ "epoch": 0.02654867256637168,
216
+ "grad_norm": 24.09995460510254,
217
+ "learning_rate": 9.19811320754717e-08,
218
+ "loss": 5.4614,
219
+ "step": 30
220
+ },
221
+ {
222
+ "epoch": 0.02743362831858407,
223
+ "grad_norm": 28.669275283813477,
224
+ "learning_rate": 9.55188679245283e-08,
225
+ "loss": 5.8594,
226
+ "step": 31
227
+ },
228
+ {
229
+ "epoch": 0.02831858407079646,
230
+ "grad_norm": 23.37987518310547,
231
+ "learning_rate": 9.905660377358491e-08,
232
+ "loss": 5.2401,
233
+ "step": 32
234
+ },
235
+ {
236
+ "epoch": 0.02920353982300885,
237
+ "grad_norm": 22.815292358398438,
238
+ "learning_rate": 1.0259433962264152e-07,
239
+ "loss": 5.1579,
240
+ "step": 33
241
+ },
242
+ {
243
+ "epoch": 0.03008849557522124,
244
+ "grad_norm": 13.775344848632812,
245
+ "learning_rate": 1.0613207547169811e-07,
246
+ "loss": 5.2181,
247
+ "step": 34
248
+ },
249
+ {
250
+ "epoch": 0.030973451327433628,
251
+ "grad_norm": 18.642087936401367,
252
+ "learning_rate": 1.0966981132075472e-07,
253
+ "loss": 4.6328,
254
+ "step": 35
255
+ },
256
+ {
257
+ "epoch": 0.03185840707964602,
258
+ "grad_norm": 18.041406631469727,
259
+ "learning_rate": 1.1320754716981131e-07,
260
+ "loss": 2.121,
261
+ "step": 36
262
+ },
263
+ {
264
+ "epoch": 0.03274336283185841,
265
+ "grad_norm": 23.423933029174805,
266
+ "learning_rate": 1.1674528301886792e-07,
267
+ "loss": 5.9026,
268
+ "step": 37
269
+ },
270
+ {
271
+ "epoch": 0.033628318584070796,
272
+ "grad_norm": 46.25591278076172,
273
+ "learning_rate": 1.2028301886792452e-07,
274
+ "loss": 7.3796,
275
+ "step": 38
276
+ },
277
+ {
278
+ "epoch": 0.034513274336283185,
279
+ "grad_norm": 20.376422882080078,
280
+ "learning_rate": 1.2382075471698114e-07,
281
+ "loss": 5.5361,
282
+ "step": 39
283
+ },
284
+ {
285
+ "epoch": 0.035398230088495575,
286
+ "grad_norm": 12.82562255859375,
287
+ "learning_rate": 1.2735849056603773e-07,
288
+ "loss": 4.0243,
289
+ "step": 40
290
+ },
291
+ {
292
+ "epoch": 0.035398230088495575,
293
+ "eval_Qnli-dev_cosine_accuracy": 0.5859375,
294
+ "eval_Qnli-dev_cosine_accuracy_threshold": 0.9302856922149658,
295
+ "eval_Qnli-dev_cosine_ap": 0.5480269179285036,
296
+ "eval_Qnli-dev_cosine_f1": 0.6315789473684211,
297
+ "eval_Qnli-dev_cosine_f1_threshold": 0.7634451389312744,
298
+ "eval_Qnli-dev_cosine_precision": 0.4633663366336634,
299
+ "eval_Qnli-dev_cosine_recall": 0.9915254237288136,
300
+ "eval_Qnli-dev_dot_accuracy": 0.5859375,
301
+ "eval_Qnli-dev_dot_accuracy_threshold": 714.4895629882812,
302
+ "eval_Qnli-dev_dot_ap": 0.548060663242546,
303
+ "eval_Qnli-dev_dot_f1": 0.6315789473684211,
304
+ "eval_Qnli-dev_dot_f1_threshold": 586.342529296875,
305
+ "eval_Qnli-dev_dot_precision": 0.4633663366336634,
306
+ "eval_Qnli-dev_dot_recall": 0.9915254237288136,
307
+ "eval_Qnli-dev_euclidean_accuracy": 0.5859375,
308
+ "eval_Qnli-dev_euclidean_accuracy_threshold": 10.348224639892578,
309
+ "eval_Qnli-dev_euclidean_ap": 0.5480269179285036,
310
+ "eval_Qnli-dev_euclidean_f1": 0.6315789473684211,
311
+ "eval_Qnli-dev_euclidean_f1_threshold": 19.05518341064453,
312
+ "eval_Qnli-dev_euclidean_precision": 0.4633663366336634,
313
+ "eval_Qnli-dev_euclidean_recall": 0.9915254237288136,
314
+ "eval_Qnli-dev_manhattan_accuracy": 0.59765625,
315
+ "eval_Qnli-dev_manhattan_accuracy_threshold": 175.22628784179688,
316
+ "eval_Qnli-dev_manhattan_ap": 0.5780924813828909,
317
+ "eval_Qnli-dev_manhattan_f1": 0.6291834002677376,
318
+ "eval_Qnli-dev_manhattan_f1_threshold": 334.39178466796875,
319
+ "eval_Qnli-dev_manhattan_precision": 0.4598825831702544,
320
+ "eval_Qnli-dev_manhattan_recall": 0.9957627118644068,
321
+ "eval_Qnli-dev_max_accuracy": 0.59765625,
322
+ "eval_Qnli-dev_max_accuracy_threshold": 714.4895629882812,
323
+ "eval_Qnli-dev_max_ap": 0.5780924813828909,
324
+ "eval_Qnli-dev_max_f1": 0.6315789473684211,
325
+ "eval_Qnli-dev_max_f1_threshold": 586.342529296875,
326
+ "eval_Qnli-dev_max_precision": 0.4633663366336634,
327
+ "eval_Qnli-dev_max_recall": 0.9957627118644068,
328
+ "eval_allNLI-dev_cosine_accuracy": 0.6640625,
329
+ "eval_allNLI-dev_cosine_accuracy_threshold": 0.9888672828674316,
330
+ "eval_allNLI-dev_cosine_ap": 0.32886365768247516,
331
+ "eval_allNLI-dev_cosine_f1": 0.5095729013254787,
332
+ "eval_allNLI-dev_cosine_f1_threshold": 0.7477295398712158,
333
+ "eval_allNLI-dev_cosine_precision": 0.34189723320158105,
334
+ "eval_allNLI-dev_cosine_recall": 1.0,
335
+ "eval_allNLI-dev_dot_accuracy": 0.6640625,
336
+ "eval_allNLI-dev_dot_accuracy_threshold": 759.483154296875,
337
+ "eval_allNLI-dev_dot_ap": 0.3288581611938815,
338
+ "eval_allNLI-dev_dot_f1": 0.5095729013254787,
339
+ "eval_allNLI-dev_dot_f1_threshold": 574.2760620117188,
340
+ "eval_allNLI-dev_dot_precision": 0.34189723320158105,
341
+ "eval_allNLI-dev_dot_recall": 1.0,
342
+ "eval_allNLI-dev_euclidean_accuracy": 0.6640625,
343
+ "eval_allNLI-dev_euclidean_accuracy_threshold": 3.8085508346557617,
344
+ "eval_allNLI-dev_euclidean_ap": 0.32886365768247516,
345
+ "eval_allNLI-dev_euclidean_f1": 0.5095729013254787,
346
+ "eval_allNLI-dev_euclidean_f1_threshold": 19.684810638427734,
347
+ "eval_allNLI-dev_euclidean_precision": 0.34189723320158105,
348
+ "eval_allNLI-dev_euclidean_recall": 1.0,
349
+ "eval_allNLI-dev_manhattan_accuracy": 0.6640625,
350
+ "eval_allNLI-dev_manhattan_accuracy_threshold": 65.93238830566406,
351
+ "eval_allNLI-dev_manhattan_ap": 0.33852594919898543,
352
+ "eval_allNLI-dev_manhattan_f1": 0.5058479532163743,
353
+ "eval_allNLI-dev_manhattan_f1_threshold": 335.4263916015625,
354
+ "eval_allNLI-dev_manhattan_precision": 0.3385518590998043,
355
+ "eval_allNLI-dev_manhattan_recall": 1.0,
356
+ "eval_allNLI-dev_max_accuracy": 0.6640625,
357
+ "eval_allNLI-dev_max_accuracy_threshold": 759.483154296875,
358
+ "eval_allNLI-dev_max_ap": 0.33852594919898543,
359
+ "eval_allNLI-dev_max_f1": 0.5095729013254787,
360
+ "eval_allNLI-dev_max_f1_threshold": 574.2760620117188,
361
+ "eval_allNLI-dev_max_precision": 0.34189723320158105,
362
+ "eval_allNLI-dev_max_recall": 1.0,
363
+ "eval_sequential_score": 0.5780924813828909,
364
+ "eval_sts-test_pearson_cosine": 0.1533465318414369,
365
+ "eval_sts-test_pearson_dot": 0.15333057450060855,
366
+ "eval_sts-test_pearson_euclidean": 0.1664717893342273,
367
+ "eval_sts-test_pearson_manhattan": 0.20717970064899288,
368
+ "eval_sts-test_pearson_max": 0.20717970064899288,
369
+ "eval_sts-test_spearman_cosine": 0.18786210334203038,
370
+ "eval_sts-test_spearman_dot": 0.1878347337472397,
371
+ "eval_sts-test_spearman_euclidean": 0.18786046572196458,
372
+ "eval_sts-test_spearman_manhattan": 0.22429466463153608,
373
+ "eval_sts-test_spearman_max": 0.22429466463153608,
374
+ "eval_vitaminc-pairs_loss": 2.901831865310669,
375
+ "eval_vitaminc-pairs_runtime": 4.078,
376
+ "eval_vitaminc-pairs_samples_per_second": 31.388,
377
+ "eval_vitaminc-pairs_steps_per_second": 0.245,
378
+ "step": 40
379
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_negation-triplets_loss": 5.690315246582031,
+ "eval_negation-triplets_runtime": 0.7141,
+ "eval_negation-triplets_samples_per_second": 179.254,
+ "eval_negation-triplets_steps_per_second": 1.4,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_scitail-pairs-pos_loss": 2.1135852336883545,
+ "eval_scitail-pairs-pos_runtime": 0.8282,
+ "eval_scitail-pairs-pos_samples_per_second": 154.543,
+ "eval_scitail-pairs-pos_steps_per_second": 1.207,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_scitail-pairs-qa_loss": 2.8052029609680176,
+ "eval_scitail-pairs-qa_runtime": 0.5471,
+ "eval_scitail-pairs-qa_samples_per_second": 233.943,
+ "eval_scitail-pairs-qa_steps_per_second": 1.828,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_xsum-pairs_loss": 6.583061695098877,
+ "eval_xsum-pairs_runtime": 2.8921,
+ "eval_xsum-pairs_samples_per_second": 44.259,
+ "eval_xsum-pairs_steps_per_second": 0.346,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_sciq_pairs_loss": 0.8882207870483398,
+ "eval_sciq_pairs_runtime": 3.7993,
+ "eval_sciq_pairs_samples_per_second": 33.69,
+ "eval_sciq_pairs_steps_per_second": 0.263,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_qasc_pairs_loss": 4.1147541999816895,
+ "eval_qasc_pairs_runtime": 0.6768,
+ "eval_qasc_pairs_samples_per_second": 189.125,
+ "eval_qasc_pairs_steps_per_second": 1.478,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_openbookqa_pairs_loss": 5.096628665924072,
+ "eval_openbookqa_pairs_runtime": 0.5776,
+ "eval_openbookqa_pairs_samples_per_second": 221.615,
+ "eval_openbookqa_pairs_steps_per_second": 1.731,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_msmarco_pairs_loss": 10.391141891479492,
+ "eval_msmarco_pairs_runtime": 1.2577,
+ "eval_msmarco_pairs_samples_per_second": 101.77,
+ "eval_msmarco_pairs_steps_per_second": 0.795,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_nq_pairs_loss": 10.903197288513184,
+ "eval_nq_pairs_runtime": 2.5051,
+ "eval_nq_pairs_samples_per_second": 51.095,
+ "eval_nq_pairs_steps_per_second": 0.399,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_trivia_pairs_loss": 7.190384387969971,
+ "eval_trivia_pairs_runtime": 3.6482,
+ "eval_trivia_pairs_samples_per_second": 35.085,
+ "eval_trivia_pairs_steps_per_second": 0.274,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_gooaq_pairs_loss": 8.193528175354004,
+ "eval_gooaq_pairs_runtime": 0.9648,
+ "eval_gooaq_pairs_samples_per_second": 132.67,
+ "eval_gooaq_pairs_steps_per_second": 1.036,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_paws-pos_loss": 1.3942564725875854,
+ "eval_paws-pos_runtime": 0.6718,
+ "eval_paws-pos_samples_per_second": 190.538,
+ "eval_paws-pos_steps_per_second": 1.489,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_global_dataset_loss": 5.671571731567383,
+ "eval_global_dataset_runtime": 23.0452,
+ "eval_global_dataset_samples_per_second": 28.77,
+ "eval_global_dataset_steps_per_second": 0.26,
+ "step": 40
+ },
+ {
+ "epoch": 0.036283185840707964,
+ "grad_norm": 18.026830673217773,
+ "learning_rate": 1.3089622641509433e-07,
+ "loss": 4.9072,
+ "step": 41
+ },
+ {
+ "epoch": 0.03716814159292035,
+ "grad_norm": 15.423810958862305,
+ "learning_rate": 1.3443396226415095e-07,
+ "loss": 3.4439,
+ "step": 42
+ },
+ {
+ "epoch": 0.03805309734513274,
+ "grad_norm": 16.31403160095215,
+ "learning_rate": 1.3797169811320754e-07,
+ "loss": 4.9787,
+ "step": 43
+ },
+ {
+ "epoch": 0.03893805309734513,
+ "grad_norm": 21.37955093383789,
+ "learning_rate": 1.4150943396226417e-07,
+ "loss": 5.8318,
+ "step": 44
+ },
+ {
+ "epoch": 0.03982300884955752,
+ "grad_norm": 18.23583984375,
+ "learning_rate": 1.4504716981132076e-07,
+ "loss": 5.3226,
+ "step": 45
+ },
+ {
+ "epoch": 0.04070796460176991,
+ "grad_norm": 20.878713607788086,
+ "learning_rate": 1.4858490566037738e-07,
+ "loss": 5.1181,
+ "step": 46
+ },
+ {
+ "epoch": 0.0415929203539823,
+ "grad_norm": 18.71149444580078,
+ "learning_rate": 1.5212264150943398e-07,
+ "loss": 4.7834,
+ "step": 47
+ },
+ {
+ "epoch": 0.04247787610619469,
+ "grad_norm": 38.85902786254883,
+ "learning_rate": 1.5566037735849057e-07,
+ "loss": 6.6303,
+ "step": 48
+ },
+ {
+ "epoch": 0.04336283185840708,
+ "grad_norm": 37.41562271118164,
+ "learning_rate": 1.591981132075472e-07,
+ "loss": 5.8171,
+ "step": 49
+ },
+ {
+ "epoch": 0.04424778761061947,
+ "grad_norm": 17.541080474853516,
+ "learning_rate": 1.627358490566038e-07,
+ "loss": 5.1962,
+ "step": 50
+ },
+ {
+ "epoch": 0.04513274336283186,
+ "grad_norm": 16.145116806030273,
+ "learning_rate": 1.6627358490566038e-07,
+ "loss": 5.2096,
+ "step": 51
+ },
+ {
+ "epoch": 0.04601769911504425,
+ "grad_norm": 20.175189971923828,
+ "learning_rate": 1.6981132075471698e-07,
+ "loss": 5.0943,
+ "step": 52
+ },
+ {
+ "epoch": 0.046902654867256637,
+ "grad_norm": 13.441214561462402,
+ "learning_rate": 1.733490566037736e-07,
+ "loss": 4.9038,
+ "step": 53
+ },
+ {
+ "epoch": 0.047787610619469026,
+ "grad_norm": 13.396607398986816,
+ "learning_rate": 1.768867924528302e-07,
+ "loss": 4.6479,
+ "step": 54
+ },
+ {
+ "epoch": 0.048672566371681415,
+ "grad_norm": 13.68046760559082,
+ "learning_rate": 1.804245283018868e-07,
+ "loss": 5.5098,
+ "step": 55
+ },
+ {
+ "epoch": 0.049557522123893805,
+ "grad_norm": 13.278443336486816,
+ "learning_rate": 1.839622641509434e-07,
+ "loss": 4.6979,
+ "step": 56
+ },
+ {
+ "epoch": 0.050442477876106194,
+ "grad_norm": 15.295453071594238,
+ "learning_rate": 1.875e-07,
+ "loss": 3.1969,
+ "step": 57
+ },
+ {
+ "epoch": 0.05132743362831858,
+ "grad_norm": 12.185781478881836,
+ "learning_rate": 1.910377358490566e-07,
+ "loss": 4.4127,
+ "step": 58
+ },
+ {
+ "epoch": 0.05221238938053097,
+ "grad_norm": 10.874494552612305,
+ "learning_rate": 1.9457547169811322e-07,
+ "loss": 3.7746,
+ "step": 59
+ },
+ {
+ "epoch": 0.05309734513274336,
+ "grad_norm": 9.654823303222656,
+ "learning_rate": 1.9811320754716982e-07,
+ "loss": 4.5378,
+ "step": 60
+ },
+ {
+ "epoch": 0.05398230088495575,
+ "grad_norm": 21.123645782470703,
+ "learning_rate": 2.016509433962264e-07,
+ "loss": 5.0209,
+ "step": 61
+ },
+ {
+ "epoch": 0.05486725663716814,
+ "grad_norm": 33.47934341430664,
+ "learning_rate": 2.0518867924528303e-07,
+ "loss": 6.5936,
+ "step": 62
+ },
+ {
+ "epoch": 0.05575221238938053,
+ "grad_norm": 10.2566556930542,
+ "learning_rate": 2.0872641509433963e-07,
+ "loss": 4.2315,
+ "step": 63
+ },
+ {
+ "epoch": 0.05663716814159292,
+ "grad_norm": 28.198625564575195,
+ "learning_rate": 2.1226415094339622e-07,
+ "loss": 6.4269,
+ "step": 64
+ },
+ {
+ "epoch": 0.05752212389380531,
+ "grad_norm": 9.386558532714844,
+ "learning_rate": 2.1580188679245282e-07,
+ "loss": 4.2644,
+ "step": 65
+ },
+ {
+ "epoch": 0.0584070796460177,
+ "grad_norm": 12.687555313110352,
+ "learning_rate": 2.1933962264150944e-07,
+ "loss": 5.1388,
+ "step": 66
+ },
+ {
+ "epoch": 0.05929203539823009,
+ "grad_norm": 14.834878921508789,
+ "learning_rate": 2.2287735849056603e-07,
+ "loss": 5.1852,
+ "step": 67
+ },
+ {
+ "epoch": 0.06017699115044248,
+ "grad_norm": 10.888677597045898,
+ "learning_rate": 2.2641509433962263e-07,
+ "loss": 4.8057,
+ "step": 68
+ },
+ {
+ "epoch": 0.061061946902654866,
+ "grad_norm": 13.97256851196289,
+ "learning_rate": 2.2995283018867925e-07,
+ "loss": 3.1725,
+ "step": 69
+ },
+ {
+ "epoch": 0.061946902654867256,
+ "grad_norm": 11.82534122467041,
+ "learning_rate": 2.3349056603773584e-07,
+ "loss": 3.3322,
+ "step": 70
+ },
+ {
+ "epoch": 0.06283185840707965,
+ "grad_norm": 16.99266242980957,
+ "learning_rate": 2.3702830188679244e-07,
+ "loss": 5.139,
+ "step": 71
+ },
+ {
+ "epoch": 0.06371681415929203,
+ "grad_norm": 8.74513053894043,
+ "learning_rate": 2.4056603773584903e-07,
+ "loss": 4.307,
+ "step": 72
+ },
+ {
+ "epoch": 0.06460176991150443,
+ "grad_norm": 11.715869903564453,
+ "learning_rate": 2.4410377358490563e-07,
+ "loss": 5.0133,
+ "step": 73
+ },
+ {
+ "epoch": 0.06548672566371681,
+ "grad_norm": 9.844196319580078,
+ "learning_rate": 2.476415094339623e-07,
+ "loss": 4.0507,
+ "step": 74
+ },
+ {
+ "epoch": 0.06637168141592921,
+ "grad_norm": 12.447444915771484,
+ "learning_rate": 2.5117924528301887e-07,
+ "loss": 3.3895,
+ "step": 75
+ },
+ {
+ "epoch": 0.06725663716814159,
+ "grad_norm": 23.91596794128418,
+ "learning_rate": 2.5471698113207547e-07,
+ "loss": 5.6736,
+ "step": 76
+ },
+ {
+ "epoch": 0.06814159292035399,
+ "grad_norm": 9.635603904724121,
+ "learning_rate": 2.5825471698113206e-07,
+ "loss": 4.2572,
+ "step": 77
+ },
+ {
+ "epoch": 0.06902654867256637,
+ "grad_norm": 14.971665382385254,
+ "learning_rate": 2.6179245283018866e-07,
+ "loss": 3.0796,
+ "step": 78
+ },
+ {
+ "epoch": 0.06991150442477877,
+ "grad_norm": 11.226128578186035,
+ "learning_rate": 2.6533018867924525e-07,
+ "loss": 5.0199,
+ "step": 79
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "grad_norm": 11.01388931274414,
+ "learning_rate": 2.688679245283019e-07,
+ "loss": 4.1414,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_Qnli-dev_cosine_accuracy": 0.591796875,
+ "eval_Qnli-dev_cosine_accuracy_threshold": 0.9258557558059692,
+ "eval_Qnli-dev_cosine_ap": 0.5585355274462735,
+ "eval_Qnli-dev_cosine_f1": 0.6291834002677376,
+ "eval_Qnli-dev_cosine_f1_threshold": 0.750666618347168,
+ "eval_Qnli-dev_cosine_precision": 0.4598825831702544,
+ "eval_Qnli-dev_cosine_recall": 0.9957627118644068,
+ "eval_Qnli-dev_dot_accuracy": 0.591796875,
+ "eval_Qnli-dev_dot_accuracy_threshold": 711.18359375,
+ "eval_Qnli-dev_dot_ap": 0.5585297234749824,
+ "eval_Qnli-dev_dot_f1": 0.6291834002677376,
+ "eval_Qnli-dev_dot_f1_threshold": 576.5970458984375,
+ "eval_Qnli-dev_dot_precision": 0.4598825831702544,
+ "eval_Qnli-dev_dot_recall": 0.9957627118644068,
+ "eval_Qnli-dev_euclidean_accuracy": 0.591796875,
+ "eval_Qnli-dev_euclidean_accuracy_threshold": 10.672666549682617,
+ "eval_Qnli-dev_euclidean_ap": 0.5585355274462735,
+ "eval_Qnli-dev_euclidean_f1": 0.6291834002677376,
+ "eval_Qnli-dev_euclidean_f1_threshold": 19.553747177124023,
+ "eval_Qnli-dev_euclidean_precision": 0.4598825831702544,
+ "eval_Qnli-dev_euclidean_recall": 0.9957627118644068,
+ "eval_Qnli-dev_manhattan_accuracy": 0.619140625,
+ "eval_Qnli-dev_manhattan_accuracy_threshold": 188.09068298339844,
+ "eval_Qnli-dev_manhattan_ap": 0.5898283705050701,
+ "eval_Qnli-dev_manhattan_f1": 0.6301775147928994,
+ "eval_Qnli-dev_manhattan_f1_threshold": 237.80462646484375,
+ "eval_Qnli-dev_manhattan_precision": 0.48409090909090907,
+ "eval_Qnli-dev_manhattan_recall": 0.902542372881356,
+ "eval_Qnli-dev_max_accuracy": 0.619140625,
+ "eval_Qnli-dev_max_accuracy_threshold": 711.18359375,
+ "eval_Qnli-dev_max_ap": 0.5898283705050701,
+ "eval_Qnli-dev_max_f1": 0.6301775147928994,
+ "eval_Qnli-dev_max_f1_threshold": 576.5970458984375,
+ "eval_Qnli-dev_max_precision": 0.48409090909090907,
+ "eval_Qnli-dev_max_recall": 0.9957627118644068,
+ "eval_allNLI-dev_cosine_accuracy": 0.666015625,
+ "eval_allNLI-dev_cosine_accuracy_threshold": 0.983686089515686,
+ "eval_allNLI-dev_cosine_ap": 0.34411819659341086,
+ "eval_allNLI-dev_cosine_f1": 0.5065885797950219,
+ "eval_allNLI-dev_cosine_f1_threshold": 0.7642872333526611,
+ "eval_allNLI-dev_cosine_precision": 0.3392156862745098,
+ "eval_allNLI-dev_cosine_recall": 1.0,
+ "eval_allNLI-dev_dot_accuracy": 0.666015625,
+ "eval_allNLI-dev_dot_accuracy_threshold": 755.60302734375,
+ "eval_allNLI-dev_dot_ap": 0.344109544232086,
+ "eval_allNLI-dev_dot_f1": 0.5065885797950219,
+ "eval_allNLI-dev_dot_f1_threshold": 587.0625,
+ "eval_allNLI-dev_dot_precision": 0.3392156862745098,
+ "eval_allNLI-dev_dot_recall": 1.0,
+ "eval_allNLI-dev_euclidean_accuracy": 0.666015625,
+ "eval_allNLI-dev_euclidean_accuracy_threshold": 5.00581693649292,
+ "eval_allNLI-dev_euclidean_ap": 0.3441246898925644,
+ "eval_allNLI-dev_euclidean_f1": 0.5065885797950219,
+ "eval_allNLI-dev_euclidean_f1_threshold": 19.022436141967773,
+ "eval_allNLI-dev_euclidean_precision": 0.3392156862745098,
+ "eval_allNLI-dev_euclidean_recall": 1.0,
+ "eval_allNLI-dev_manhattan_accuracy": 0.6640625,
+ "eval_allNLI-dev_manhattan_accuracy_threshold": 62.69102096557617,
+ "eval_allNLI-dev_manhattan_ap": 0.35131239981425566,
+ "eval_allNLI-dev_manhattan_f1": 0.5058479532163743,
+ "eval_allNLI-dev_manhattan_f1_threshold": 337.6861877441406,
+ "eval_allNLI-dev_manhattan_precision": 0.3385518590998043,
+ "eval_allNLI-dev_manhattan_recall": 1.0,
+ "eval_allNLI-dev_max_accuracy": 0.666015625,
+ "eval_allNLI-dev_max_accuracy_threshold": 755.60302734375,
+ "eval_allNLI-dev_max_ap": 0.35131239981425566,
+ "eval_allNLI-dev_max_f1": 0.5065885797950219,
+ "eval_allNLI-dev_max_f1_threshold": 587.0625,
+ "eval_allNLI-dev_max_precision": 0.3392156862745098,
+ "eval_allNLI-dev_max_recall": 1.0,
+ "eval_sequential_score": 0.5898283705050701,
+ "eval_sts-test_pearson_cosine": 0.22248205020578934,
+ "eval_sts-test_pearson_dot": 0.22239084967931927,
+ "eval_sts-test_pearson_euclidean": 0.2323160413842197,
+ "eval_sts-test_pearson_manhattan": 0.26632593273308647,
+ "eval_sts-test_pearson_max": 0.26632593273308647,
+ "eval_sts-test_spearman_cosine": 0.24802235964390085,
+ "eval_sts-test_spearman_dot": 0.24791612015173234,
+ "eval_sts-test_spearman_euclidean": 0.24799036249272113,
+ "eval_sts-test_spearman_manhattan": 0.2843623073856928,
+ "eval_sts-test_spearman_max": 0.2843623073856928,
+ "eval_vitaminc-pairs_loss": 2.7793872356414795,
+ "eval_vitaminc-pairs_runtime": 3.7649,
+ "eval_vitaminc-pairs_samples_per_second": 33.998,
+ "eval_vitaminc-pairs_steps_per_second": 0.266,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_negation-triplets_loss": 4.888970851898193,
+ "eval_negation-triplets_runtime": 0.7134,
+ "eval_negation-triplets_samples_per_second": 179.432,
+ "eval_negation-triplets_steps_per_second": 1.402,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_scitail-pairs-pos_loss": 1.8996644020080566,
+ "eval_scitail-pairs-pos_runtime": 0.8506,
+ "eval_scitail-pairs-pos_samples_per_second": 150.477,
+ "eval_scitail-pairs-pos_steps_per_second": 1.176,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_scitail-pairs-qa_loss": 2.6760551929473877,
+ "eval_scitail-pairs-qa_runtime": 0.5685,
+ "eval_scitail-pairs-qa_samples_per_second": 225.171,
+ "eval_scitail-pairs-qa_steps_per_second": 1.759,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_xsum-pairs_loss": 6.209648609161377,
+ "eval_xsum-pairs_runtime": 2.9221,
+ "eval_xsum-pairs_samples_per_second": 43.804,
+ "eval_xsum-pairs_steps_per_second": 0.342,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_sciq_pairs_loss": 0.7622462511062622,
+ "eval_sciq_pairs_runtime": 3.7816,
+ "eval_sciq_pairs_samples_per_second": 33.848,
+ "eval_sciq_pairs_steps_per_second": 0.264,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_qasc_pairs_loss": 3.3129472732543945,
+ "eval_qasc_pairs_runtime": 0.6761,
+ "eval_qasc_pairs_samples_per_second": 189.334,
+ "eval_qasc_pairs_steps_per_second": 1.479,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_openbookqa_pairs_loss": 4.549765586853027,
+ "eval_openbookqa_pairs_runtime": 0.5767,
+ "eval_openbookqa_pairs_samples_per_second": 221.954,
+ "eval_openbookqa_pairs_steps_per_second": 1.734,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_msmarco_pairs_loss": 7.205582141876221,
+ "eval_msmarco_pairs_runtime": 1.2621,
+ "eval_msmarco_pairs_samples_per_second": 101.416,
+ "eval_msmarco_pairs_steps_per_second": 0.792,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_nq_pairs_loss": 7.680945873260498,
+ "eval_nq_pairs_runtime": 2.5052,
+ "eval_nq_pairs_samples_per_second": 51.095,
+ "eval_nq_pairs_steps_per_second": 0.399,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_trivia_pairs_loss": 6.37924861907959,
+ "eval_trivia_pairs_runtime": 3.6293,
+ "eval_trivia_pairs_samples_per_second": 35.268,
+ "eval_trivia_pairs_steps_per_second": 0.276,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_gooaq_pairs_loss": 6.656675338745117,
+ "eval_gooaq_pairs_runtime": 0.9698,
+ "eval_gooaq_pairs_samples_per_second": 131.988,
+ "eval_gooaq_pairs_steps_per_second": 1.031,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_paws-pos_loss": 1.3848179578781128,
+ "eval_paws-pos_runtime": 0.6727,
+ "eval_paws-pos_samples_per_second": 190.278,
+ "eval_paws-pos_steps_per_second": 1.487,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_global_dataset_loss": 5.002967834472656,
+ "eval_global_dataset_runtime": 23.048,
+ "eval_global_dataset_samples_per_second": 28.766,
+ "eval_global_dataset_steps_per_second": 0.26,
+ "step": 80
+ }
+ ],
+ "logging_steps": 1,
+ "max_steps": 3390,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 80,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 42,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoint-80/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7bfb21b1a8b0022475cba81f0306eaa079a06c682d78c599327457cfd397d216
+ size 5688