bobox committed on
Commit
d0b63d2
·
verified ·
1 Parent(s): 00c193f

Training in progress, step 305, checkpoint

checkpoint-305/1_AdvancedWeightedPooling/config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "embed_dim": 768,
+   "num_heads": 4,
+   "dropout": 0.025,
+   "bias": true,
+   "gate_min": 0.05,
+   "gate_max": 0.95,
+   "gate_dropout": 0.01,
+   "dropout_gate_open": 0.075,
+   "dropout_gate_close": 0.05,
+   "CLS_self_attn": 0
+ }
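The values in this config can be sanity-checked before a pooling module is built from them. The following is a minimal sketch, not code from this commit: the helper name `check_pooling_config` is hypothetical, and reading `gate_min`/`gate_max` as an ordered range inside [0, 1] is an assumption based only on the key names. The divisibility check reflects the requirement of `torch.nn.MultiheadAttention` that `embed_dim` be divisible by `num_heads`.

```python
import json

# Config text copied from the diff above.
CONFIG_TEXT = """
{
  "embed_dim": 768,
  "num_heads": 4,
  "dropout": 0.025,
  "bias": true,
  "gate_min": 0.05,
  "gate_max": 0.95,
  "gate_dropout": 0.01,
  "dropout_gate_open": 0.075,
  "dropout_gate_close": 0.05,
  "CLS_self_attn": 0
}
"""


def check_pooling_config(cfg: dict) -> int:
    """Hypothetical helper: validate the config and return the per-head dimension."""
    # Multi-head attention splits embed_dim evenly across heads.
    if cfg["embed_dim"] % cfg["num_heads"] != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    # Assumption: the gate bounds form an ordered range inside [0, 1].
    if not 0.0 <= cfg["gate_min"] < cfg["gate_max"] <= 1.0:
        raise ValueError("expected 0 <= gate_min < gate_max <= 1")
    # Dropout probabilities must lie in [0, 1).
    for key in ("dropout", "gate_dropout", "dropout_gate_open", "dropout_gate_close"):
        if not 0.0 <= cfg[key] < 1.0:
            raise ValueError(f"{key} must be in [0, 1)")
    return cfg["embed_dim"] // cfg["num_heads"]


config = json.loads(CONFIG_TEXT)
head_dim = check_pooling_config(config)
print(head_dim)
```

With 768 dimensions and 4 heads, each attention head works on a 192-dimensional slice.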
checkpoint-305/1_AdvancedWeightedPooling/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5f1a56dcfcfbff23630b549e575f6ec58439394ba18910f67aaa7762af6f7270
+ size 18940723
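The three added lines are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. A small sketch of reading such a pointer is shown below; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any library.

```python
# Pointer text copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:5f1a56dcfcfbff23630b549e575f6ec58439394ba18910f67aaa7762af6f7270
size 18940723
"""


def parse_lfs_pointer(text: str) -> dict:
    """Hypothetical helper: split a Git LFS pointer into its fields."""
    # Each non-empty line is "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    # The oid is "<hash algorithm>:<hex digest>".
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```

The recorded size, 18,940,723 bytes, is roughly 19 MB for the pooling module's weights.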
checkpoint-305/README.md ADDED
@@ -0,0 +1,1174 @@
1
+ ---
2
+ base_model: microsoft/deberta-v3-small
3
+ library_name: sentence-transformers
4
+ metrics:
5
+ - pearson_cosine
6
+ - spearman_cosine
7
+ - pearson_manhattan
8
+ - spearman_manhattan
9
+ - pearson_euclidean
10
+ - spearman_euclidean
11
+ - pearson_dot
12
+ - spearman_dot
13
+ - pearson_max
14
+ - spearman_max
15
+ - cosine_accuracy
16
+ - cosine_accuracy_threshold
17
+ - cosine_f1
18
+ - cosine_f1_threshold
19
+ - cosine_precision
20
+ - cosine_recall
21
+ - cosine_ap
22
+ - dot_accuracy
23
+ - dot_accuracy_threshold
24
+ - dot_f1
25
+ - dot_f1_threshold
26
+ - dot_precision
27
+ - dot_recall
28
+ - dot_ap
29
+ - manhattan_accuracy
30
+ - manhattan_accuracy_threshold
31
+ - manhattan_f1
32
+ - manhattan_f1_threshold
33
+ - manhattan_precision
34
+ - manhattan_recall
35
+ - manhattan_ap
36
+ - euclidean_accuracy
37
+ - euclidean_accuracy_threshold
38
+ - euclidean_f1
39
+ - euclidean_f1_threshold
40
+ - euclidean_precision
41
+ - euclidean_recall
42
+ - euclidean_ap
43
+ - max_accuracy
44
+ - max_accuracy_threshold
45
+ - max_f1
46
+ - max_f1_threshold
47
+ - max_precision
48
+ - max_recall
49
+ - max_ap
50
+ pipeline_tag: sentence-similarity
51
+ tags:
52
+ - sentence-transformers
53
+ - sentence-similarity
54
+ - feature-extraction
55
+ - generated_from_trainer
56
+ - dataset_size:32500
57
+ - loss:GISTEmbedLoss
58
+ widget:
59
+ - source_sentence: What was the name of Jed's nephew in The Beverly Hillbillies?
60
+ sentences:
61
+ - Jed Clampett - The Beverly Hillbillies Characters - ShareTV Buddy Ebsen began
62
+ his career as a dancer in the late 1920s in a Broadway chorus. He later formed
63
+ a vaudeville ... Character Bio Although he had received little formal education,
64
+ Jed Clampett had a good deal of common sense. A good-natured man, he is the apparent
65
+ head of the family. Jed's wife (Elly May's mother) died, but is referred to in
66
+ the episode "Duke Steals A Wife" as Rose Ellen. Jed was shown to be an expert
67
+ marksman and was extremely loyal to his family and kinfolk. The huge oil pool
68
+ in the swamp he owned was the beginning of his rags-to-riches journey to Beverly
69
+ Hills. Although he longed for the old ways back in the hills, he made the best
70
+ of being in Beverly Hills. Whenever he had anything on his mind, he would sit
71
+ on the curbstone of his mansion and whittle until he came up with the answer.
72
+ Jedediah, the version of Jed's name used in the 1993 Beverly Hillbillies theatrical
73
+ movie, was never mentioned in the original television series (though coincidentally,
74
+ on Ebsen's subsequent series, Barnaby Jones, Barnaby's nephew J.R. was also named
75
+ Jedediah). In one episode Jed and Granny reminisce about seeing Buddy Ebsen and
76
+ Vilma Ebsen—a joking reference to the Ebsens' song and dance act. Jed appears
77
+ in all 274 episodes. Episode Screenshots
78
+ - a stove generates heat for cooking usually
79
+ - Miss Marple series by Agatha Christie Miss Marple series 43 works, 13 primary
80
+ works Mystery series in order of publication. Miss Marple is introduced in The
81
+ Murder at the Vicarage but the books can be read in any order. Mixed short story
82
+ collections are included if some are Marple, often have horror, supernatural,
83
+ maybe detective Poirot, Pyne, or Quin. Note that "Nemesis" should be read AFTER
84
+ "A Caribbean Holiday"
85
+ - source_sentence: A recording of folk songs done for the Columbia society in 1942
86
+ was largely arranged by Pjetër Dungu .
87
+ sentences:
88
+ - Someone cooking drugs in a spoon over a candle
89
+ - A recording of folk songs made for the Columbia society in 1942 was largely arranged
90
+ by Pjetër Dungu .
91
+ - A Murder of Crows, A Parliament of Owls What do You Call a Group of Birds? Do
92
+ you know what a group of Ravens is called? What about a group of peacocks, snipe
93
+ or hummingbirds? Here is a list of Bird Collectives, terms that you can use to
94
+ describe a group of birds. Birds in general
95
+ - source_sentence: A person in a kitchen looking at the oven.
96
+ sentences:
97
+ - "staying warm has a positive impact on an animal 's survival. Furry animals grow\
98
+ \ thicker coats to keep warm in the winter. \n Furry animals grow thicker coats\
99
+ \ which has a positive impact on their survival. "
100
+ - A woman In the kitchen opening her oven.
101
+ - EE has apologised after a fault left some of its customers unable to use the internet
102
+ on their mobile devices.
103
+ - source_sentence: Air can be separated into several elements.
104
+ sentences:
105
+ - Which of the following substances can be separated into several elements?
106
+ - 'Funny Interesting Facts Humor Strange: Carl and the Passions changed band name
107
+ to what Carl and the Passions changed band name to what Beach Boys Carl and the
108
+ Passions - "So Tough" is the fifteenth studio album released by The Beach Boys
109
+ in 1972. In its initial release, it was the second disc of a two-album set with
110
+ Pet Sounds (which The Beach Boys were able to license from Capitol Records). Unfortunately,
111
+ due to the fact that Carl and the Passions - "So Tough" was a transitional album
112
+ that saw the departure of one member and the introduction of two new ones, making
113
+ it wildly inconsistent in terms of type of material present, it paled next to
114
+ their 1966 classic and was seen as something of a disappointment in its time of
115
+ release. The title of the album itself was a reference to an early band Carl Wilson
116
+ had been in as a teenager (some say a possible early name for the Beach Boys).
117
+ It was also the first album released under a new deal with Warner Bros. that allowed
118
+ the company to distribute all future Beach Boys product in foreign as well as
119
+ domestic markets.'
120
+ - Which statement correctly describes a relationship between two human body systems?
121
+ - source_sentence: What do outdoor plants require to survive?
122
+ sentences:
123
+ - "a plants require water for survival. If no rain or watering, the plant dies.\
124
+ \ \n Outdoor plants require rain to survive."
125
+ - (Vegan) soups are nutritious. In addition to them being easy to digest, most the
126
+ time, soups are made from nutrient-dense ingredients like herbs, spices, vegetables,
127
+ and beans. Because the soup is full of those nutrients AND that it's easy to digest,
128
+ your body is able to absorb more of those nutrients into your system.
129
+ - If you do the math, there are 11,238,513 possible combinations of five white balls
130
+ (without order mattering). Multiply that by the 26 possible red balls, and you
131
+ get 292,201,338 possible Powerball number combinations. At $2 per ticket, you'd
132
+ need $584,402,676 to buy every single combination and guarantee a win.
133
+ model-index:
134
+ - name: SentenceTransformer based on microsoft/deberta-v3-small
135
+ results:
136
+ - task:
137
+ type: semantic-similarity
138
+ name: Semantic Similarity
139
+ dataset:
140
+ name: sts test
141
+ type: sts-test
142
+ metrics:
143
+ - type: pearson_cosine
144
+ value: 0.12009124140478655
145
+ name: Pearson Cosine
146
+ - type: spearman_cosine
147
+ value: 0.180573622028628
148
+ name: Spearman Cosine
149
+ - type: pearson_manhattan
150
+ value: 0.18492770691981375
151
+ name: Pearson Manhattan
152
+ - type: spearman_manhattan
153
+ value: 0.21139381574888486
154
+ name: Spearman Manhattan
155
+ - type: pearson_euclidean
156
+ value: 0.15529980522625675
157
+ name: Pearson Euclidean
158
+ - type: spearman_euclidean
159
+ value: 0.18058248277838349
160
+ name: Spearman Euclidean
161
+ - type: pearson_dot
162
+ value: 0.11997652374043644
163
+ name: Pearson Dot
164
+ - type: spearman_dot
165
+ value: 0.18041242798509616
166
+ name: Spearman Dot
167
+ - type: pearson_max
168
+ value: 0.18492770691981375
169
+ name: Pearson Max
170
+ - type: spearman_max
171
+ value: 0.21139381574888486
172
+ name: Spearman Max
173
+ - task:
174
+ type: binary-classification
175
+ name: Binary Classification
176
+ dataset:
177
+ name: allNLI dev
178
+ type: allNLI-dev
179
+ metrics:
180
+ - type: cosine_accuracy
181
+ value: 0.66796875
182
+ name: Cosine Accuracy
183
+ - type: cosine_accuracy_threshold
184
+ value: 0.9721524119377136
185
+ name: Cosine Accuracy Threshold
186
+ - type: cosine_f1
187
+ value: 0.5029239766081871
188
+ name: Cosine F1
189
+ - type: cosine_f1_threshold
190
+ value: 0.821484386920929
191
+ name: Cosine F1 Threshold
192
+ - type: cosine_precision
193
+ value: 0.33659491193737767
194
+ name: Cosine Precision
195
+ - type: cosine_recall
196
+ value: 0.9942196531791907
197
+ name: Cosine Recall
198
+ - type: cosine_ap
199
+ value: 0.3857994503224615
200
+ name: Cosine Ap
201
+ - type: dot_accuracy
202
+ value: 0.66796875
203
+ name: Dot Accuracy
204
+ - type: dot_accuracy_threshold
205
+ value: 746.914794921875
206
+ name: Dot Accuracy Threshold
207
+ - type: dot_f1
208
+ value: 0.5029239766081871
209
+ name: Dot F1
210
+ - type: dot_f1_threshold
211
+ value: 631.138916015625
212
+ name: Dot F1 Threshold
213
+ - type: dot_precision
214
+ value: 0.33659491193737767
215
+ name: Dot Precision
216
+ - type: dot_recall
217
+ value: 0.9942196531791907
218
+ name: Dot Recall
219
+ - type: dot_ap
220
+ value: 0.38572844452312516
221
+ name: Dot Ap
222
+ - type: manhattan_accuracy
223
+ value: 0.666015625
224
+ name: Manhattan Accuracy
225
+ - type: manhattan_accuracy_threshold
226
+ value: 95.24527740478516
227
+ name: Manhattan Accuracy Threshold
228
+ - type: manhattan_f1
229
+ value: 0.5045317220543807
230
+ name: Manhattan F1
231
+ - type: manhattan_f1_threshold
232
+ value: 254.973388671875
233
+ name: Manhattan F1 Threshold
234
+ - type: manhattan_precision
235
+ value: 0.34151329243353784
236
+ name: Manhattan Precision
237
+ - type: manhattan_recall
238
+ value: 0.9653179190751445
239
+ name: Manhattan Recall
240
+ - type: manhattan_ap
241
+ value: 0.39193409293721965
242
+ name: Manhattan Ap
243
+ - type: euclidean_accuracy
244
+ value: 0.66796875
245
+ name: Euclidean Accuracy
246
+ - type: euclidean_accuracy_threshold
247
+ value: 6.541449546813965
248
+ name: Euclidean Accuracy Threshold
249
+ - type: euclidean_f1
250
+ value: 0.5029239766081871
251
+ name: Euclidean F1
252
+ - type: euclidean_f1_threshold
253
+ value: 16.558998107910156
254
+ name: Euclidean F1 Threshold
255
+ - type: euclidean_precision
256
+ value: 0.33659491193737767
257
+ name: Euclidean Precision
258
+ - type: euclidean_recall
259
+ value: 0.9942196531791907
260
+ name: Euclidean Recall
261
+ - type: euclidean_ap
262
+ value: 0.3858031188548441
263
+ name: Euclidean Ap
264
+ - type: max_accuracy
265
+ value: 0.66796875
266
+ name: Max Accuracy
267
+ - type: max_accuracy_threshold
268
+ value: 746.914794921875
269
+ name: Max Accuracy Threshold
270
+ - type: max_f1
271
+ value: 0.5045317220543807
272
+ name: Max F1
273
+ - type: max_f1_threshold
274
+ value: 631.138916015625
275
+ name: Max F1 Threshold
276
+ - type: max_precision
277
+ value: 0.34151329243353784
278
+ name: Max Precision
279
+ - type: max_recall
280
+ value: 0.9942196531791907
281
+ name: Max Recall
282
+ - type: max_ap
283
+ value: 0.39193409293721965
284
+ name: Max Ap
285
+ - task:
286
+ type: binary-classification
287
+ name: Binary Classification
288
+ dataset:
289
+ name: Qnli dev
290
+ type: Qnli-dev
291
+ metrics:
292
+ - type: cosine_accuracy
293
+ value: 0.58203125
294
+ name: Cosine Accuracy
295
+ - type: cosine_accuracy_threshold
296
+ value: 0.9368094801902771
297
+ name: Cosine Accuracy Threshold
298
+ - type: cosine_f1
299
+ value: 0.6300268096514745
300
+ name: Cosine F1
301
+ - type: cosine_f1_threshold
302
+ value: 0.802739143371582
303
+ name: Cosine F1 Threshold
304
+ - type: cosine_precision
305
+ value: 0.46078431372549017
306
+ name: Cosine Precision
307
+ - type: cosine_recall
308
+ value: 0.9957627118644068
309
+ name: Cosine Recall
310
+ - type: cosine_ap
311
+ value: 0.5484497034083067
312
+ name: Cosine Ap
313
+ - type: dot_accuracy
314
+ value: 0.58203125
315
+ name: Dot Accuracy
316
+ - type: dot_accuracy_threshold
317
+ value: 719.7518310546875
318
+ name: Dot Accuracy Threshold
319
+ - type: dot_f1
320
+ value: 0.6300268096514745
321
+ name: Dot F1
322
+ - type: dot_f1_threshold
323
+ value: 616.7227783203125
324
+ name: Dot F1 Threshold
325
+ - type: dot_precision
326
+ value: 0.46078431372549017
327
+ name: Dot Precision
328
+ - type: dot_recall
329
+ value: 0.9957627118644068
330
+ name: Dot Recall
331
+ - type: dot_ap
332
+ value: 0.548461685358088
333
+ name: Dot Ap
334
+ - type: manhattan_accuracy
335
+ value: 0.607421875
336
+ name: Manhattan Accuracy
337
+ - type: manhattan_accuracy_threshold
338
+ value: 182.1275177001953
339
+ name: Manhattan Accuracy Threshold
340
+ - type: manhattan_f1
341
+ value: 0.6303724928366763
342
+ name: Manhattan F1
343
+ - type: manhattan_f1_threshold
344
+ value: 230.0565185546875
345
+ name: Manhattan F1 Threshold
346
+ - type: manhattan_precision
347
+ value: 0.47619047619047616
348
+ name: Manhattan Precision
349
+ - type: manhattan_recall
350
+ value: 0.9322033898305084
351
+ name: Manhattan Recall
352
+ - type: manhattan_ap
353
+ value: 0.5750034744442096
354
+ name: Manhattan Ap
355
+ - type: euclidean_accuracy
356
+ value: 0.58203125
357
+ name: Euclidean Accuracy
358
+ - type: euclidean_accuracy_threshold
359
+ value: 9.853867530822754
360
+ name: Euclidean Accuracy Threshold
361
+ - type: euclidean_f1
362
+ value: 0.6300268096514745
363
+ name: Euclidean F1
364
+ - type: euclidean_f1_threshold
365
+ value: 17.40953254699707
366
+ name: Euclidean F1 Threshold
367
+ - type: euclidean_precision
368
+ value: 0.46078431372549017
369
+ name: Euclidean Precision
370
+ - type: euclidean_recall
371
+ value: 0.9957627118644068
372
+ name: Euclidean Recall
373
+ - type: euclidean_ap
374
+ value: 0.5484497034083067
375
+ name: Euclidean Ap
376
+ - type: max_accuracy
377
+ value: 0.607421875
378
+ name: Max Accuracy
379
+ - type: max_accuracy_threshold
380
+ value: 719.7518310546875
381
+ name: Max Accuracy Threshold
382
+ - type: max_f1
383
+ value: 0.6303724928366763
384
+ name: Max F1
385
+ - type: max_f1_threshold
386
+ value: 616.7227783203125
387
+ name: Max F1 Threshold
388
+ - type: max_precision
389
+ value: 0.47619047619047616
390
+ name: Max Precision
391
+ - type: max_recall
392
+ value: 0.9957627118644068
393
+ name: Max Recall
394
+ - type: max_ap
395
+ value: 0.5750034744442096
396
+ name: Max Ap
397
+ ---
398
+
399
+ # SentenceTransformer based on microsoft/deberta-v3-small
400
+
401
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
402
+
403
+ ## Model Details
404
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) <!-- at revision a36c739020e01763fe789b4b85e2df55d6180012 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
414
+
415
+ ### Model Sources
416
+
417
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
418
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
419
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
420
+
421
+ ### Full Model Architecture
422
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
+   (1): AdvancedWeightedPooling(
+     (alpha_dropout_layer): Dropout(p=0.01, inplace=False)
+     (gate_dropout_layer): Dropout(p=0.05, inplace=False)
+     (linear_cls_pj): Linear(in_features=768, out_features=768, bias=True)
+     (linear_cls_Qpj): Linear(in_features=768, out_features=768, bias=True)
+     (linear_mean_pj): Linear(in_features=768, out_features=768, bias=True)
+     (linear_attnOut): Linear(in_features=768, out_features=768, bias=True)
+     (mha): MultiheadAttention(
+       (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+     )
+     (layernorm_output): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+     (layernorm_weightedPooing): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+     (layernorm_pjCls): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+     (layernorm_pjMean): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+     (layernorm_attnOut): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+   )
+ )
+ ```
444
+
445
+ ## Usage
446
+
447
+ ### Direct Usage (Sentence Transformers)
448
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest2-step1-checkpoints-tmp")
+ # Run inference
+ sentences = [
+     'What do outdoor plants require to survive?',
+     'a plants require water for survival. If no rain or watering, the plant dies. \n Outdoor plants require rain to survive.',
+     "(Vegan) soups are nutritious. In addition to them being easy to digest, most the time, soups are made from nutrient-dense ingredients like herbs, spices, vegetables, and beans. Because the soup is full of those nutrients AND that it's easy to digest, your body is able to absorb more of those nutrients into your system.",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
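The model card lists cosine similarity as the similarity function. As a rough illustration of what a pairwise similarity matrix of that kind contains, here is a dependency-free sketch. The toy vectors are made up for the example and are not model outputs, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities, one row and one column per input vector."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy 3-dimensional "embeddings": the first two point the same way,
# the third is orthogonal to both.
toy = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
sims = similarity_matrix(toy)
for row in sims:
    print(row)
```

Because cosine similarity ignores vector length, the first two toy vectors score as identical even though one is twice as long as the other.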
476
+
477
+ <!--
478
+ ### Direct Usage (Transformers)
479
+
480
+ <details><summary>Click to see the direct usage in Transformers</summary>
481
+
482
+ </details>
483
+ -->
484
+
485
+ <!--
486
+ ### Downstream Usage (Sentence Transformers)
487
+
488
+ You can finetune this model on your own dataset.
489
+
490
+ <details><summary>Click to expand</summary>
491
+
492
+ </details>
493
+ -->
494
+
495
+ <!--
496
+ ### Out-of-Scope Use
497
+
498
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
499
+ -->
500
+
501
+ ## Evaluation
502
+
503
+ ### Metrics
504
+
505
+ #### Semantic Similarity
506
+ * Dataset: `sts-test`
507
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
508
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | pearson_cosine      | 0.1201     |
+ | **spearman_cosine** | **0.1806** |
+ | pearson_manhattan   | 0.1849     |
+ | spearman_manhattan  | 0.2114     |
+ | pearson_euclidean   | 0.1553     |
+ | spearman_euclidean  | 0.1806     |
+ | pearson_dot         | 0.12       |
+ | spearman_dot        | 0.1804     |
+ | pearson_max         | 0.1849     |
+ | spearman_max        | 0.2114     |
521
+
522
+ #### Binary Classification
523
+ * Dataset: `allNLI-dev`
524
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
525
+
+ | Metric                       | Value      |
+ |:-----------------------------|:-----------|
+ | cosine_accuracy              | 0.668      |
+ | cosine_accuracy_threshold    | 0.9722     |
+ | cosine_f1                    | 0.5029     |
+ | cosine_f1_threshold          | 0.8215     |
+ | cosine_precision             | 0.3366     |
+ | cosine_recall                | 0.9942     |
+ | cosine_ap                    | 0.3858     |
+ | dot_accuracy                 | 0.668      |
+ | dot_accuracy_threshold       | 746.9148   |
+ | dot_f1                       | 0.5029     |
+ | dot_f1_threshold             | 631.1389   |
+ | dot_precision                | 0.3366     |
+ | dot_recall                   | 0.9942     |
+ | dot_ap                       | 0.3857     |
+ | manhattan_accuracy           | 0.666      |
+ | manhattan_accuracy_threshold | 95.2453    |
+ | manhattan_f1                 | 0.5045     |
+ | manhattan_f1_threshold       | 254.9734   |
+ | manhattan_precision          | 0.3415     |
+ | manhattan_recall             | 0.9653     |
+ | manhattan_ap                 | 0.3919     |
+ | euclidean_accuracy           | 0.668      |
+ | euclidean_accuracy_threshold | 6.5414     |
+ | euclidean_f1                 | 0.5029     |
+ | euclidean_f1_threshold       | 16.559     |
+ | euclidean_precision          | 0.3366     |
+ | euclidean_recall             | 0.9942     |
+ | euclidean_ap                 | 0.3858     |
+ | max_accuracy                 | 0.668      |
+ | max_accuracy_threshold       | 746.9148   |
+ | max_f1                       | 0.5045     |
+ | max_f1_threshold             | 631.1389   |
+ | max_precision                | 0.3415     |
+ | max_recall                   | 0.9942     |
+ | **max_ap**                   | **0.3919** |
563
+
564
+ #### Binary Classification
565
+ * Dataset: `Qnli-dev`
566
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
567
+
+ | Metric                       | Value     |
+ |:-----------------------------|:----------|
+ | cosine_accuracy              | 0.582     |
+ | cosine_accuracy_threshold    | 0.9368    |
+ | cosine_f1                    | 0.63      |
+ | cosine_f1_threshold          | 0.8027    |
+ | cosine_precision             | 0.4608    |
+ | cosine_recall                | 0.9958    |
+ | cosine_ap                    | 0.5484    |
+ | dot_accuracy                 | 0.582     |
+ | dot_accuracy_threshold       | 719.7518  |
+ | dot_f1                       | 0.63      |
+ | dot_f1_threshold             | 616.7228  |
+ | dot_precision                | 0.4608    |
+ | dot_recall                   | 0.9958    |
+ | dot_ap                       | 0.5485    |
+ | manhattan_accuracy           | 0.6074    |
+ | manhattan_accuracy_threshold | 182.1275  |
+ | manhattan_f1                 | 0.6304    |
+ | manhattan_f1_threshold       | 230.0565  |
+ | manhattan_precision          | 0.4762    |
+ | manhattan_recall             | 0.9322    |
+ | manhattan_ap                 | 0.575     |
+ | euclidean_accuracy           | 0.582     |
+ | euclidean_accuracy_threshold | 9.8539    |
+ | euclidean_f1                 | 0.63      |
+ | euclidean_f1_threshold       | 17.4095   |
+ | euclidean_precision          | 0.4608    |
+ | euclidean_recall             | 0.9958    |
+ | euclidean_ap                 | 0.5484    |
+ | max_accuracy                 | 0.6074    |
+ | max_accuracy_threshold       | 719.7518  |
+ | max_f1                       | 0.6304    |
+ | max_f1_threshold             | 616.7228  |
+ | max_precision                | 0.4762    |
+ | max_recall                   | 0.9958    |
+ | **max_ap**                   | **0.575** |
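The precision, recall, and F1 rows in the two binary classification tables are tied together by the usual harmonic-mean formula, F1 = 2PR / (P + R). The sketch below recomputes the cosine F1 for both datasets from the full-precision precision and recall values given in the README front matter. The helper name `f1_score` is chosen for this example only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Full-precision cosine metrics copied from the model-index front matter.
reported = {
    "allNLI-dev": {
        "precision": 0.33659491193737767,
        "recall": 0.9942196531791907,
        "f1": 0.5029239766081871,
    },
    "Qnli-dev": {
        "precision": 0.46078431372549017,
        "recall": 0.9957627118644068,
        "f1": 0.6300268096514745,
    },
}

computed = {
    name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()
}
for name, value in computed.items():
    # Compare the recomputed F1 with the reported one.
    print(name, round(value, 4), abs(value - reported[name]["f1"]) < 1e-9)
```

In both cases the recomputed value agrees with the reported `cosine_f1` to within floating-point error.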
605
+
606
+ <!--
607
+ ## Bias, Risks and Limitations
608
+
609
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
610
+ -->
611
+
612
+ <!--
613
+ ### Recommendations
614
+
615
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
616
+ -->
617
+
618
+ ## Training Details
619
+
620
+ ### Training Dataset
621
+
622
+ #### Unnamed Dataset
623
+
624
+
625
+ * Size: 32,500 training samples
626
+ * Columns: <code>sentence1</code> and <code>sentence2</code>
627
+ * Approximate statistics based on the first 1000 samples:
628
+ | | sentence1 | sentence2 |
629
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
630
+ | type | string | string |
631
+ | details | <ul><li>min: 4 tokens</li><li>mean: 29.43 tokens</li><li>max: 400 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 57.02 tokens</li><li>max: 389 tokens</li></ul> |
632
+ * Samples:
633
+ | sentence1 | sentence2 |
634
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
635
+ | <code>What is the chemical symbol for Silver?</code> | <code>Chemical Elements.com - Silver (Ag) Bentor, Yinon. Chemical Element.com - Silver. <http://www.chemicalelements.com/elements/ag.html>. For more information about citing online sources, please visit the MLA's Website . This page was created by Yinon Bentor. Use of this web site is restricted by this site's license agreement . Copyright © 1996-2012 Yinon Bentor. All Rights Reserved.</code> |
636
+ | <code>e.&#9;in solids the atoms are closely locked in position and can only vibrate, in liquids the atoms and molecules are more loosely connected and can collide with and move past one another, while in gases the atoms or molecules are free to move independently, colliding frequently.</code> | <code>Within a substance, atoms that collide frequently and move independently of one another are most likely in a gas</code> |
637
+ | <code>Keanu Neal was born in 1995 .</code> | <code>Keanu Neal ( born July 26 , 1995 ) is an American football safety for the Atlanta Falcons of the National Football League ( NFL ) .</code> |
+ * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
+   ```json
+   {'guide': SentenceTransformer(
+     (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+     (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+     (2): Normalize()
+   ), 'temperature': 0.025}
+   ```
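To give a feel for what the `guide` and `temperature` parameters do, here is a deliberately simplified sketch of a guided in-batch contrastive loss. It is an illustration under stated assumptions, not the sentence-transformers implementation: it only uses anchor-to-positive similarities, it takes precomputed similarity matrices as plain lists, and the function name `gist_style_loss` and the toy numbers are hypothetical. The idea shown is that an in-batch candidate is dropped as a negative whenever the guide model rates it as more similar to the anchor than the anchor's own positive, and the remaining similarities are divided by the temperature before a softmax cross-entropy.

```python
import math


def gist_style_loss(student_sim, guide_sim, temperature=0.025):
    """Simplified guided contrastive loss over square similarity matrices.

    student_sim[i][j]: similarity of anchor i and positive j under the trained model.
    guide_sim[i][j]:   the same pair scored by the guide model.
    """
    n = len(student_sim)
    total = 0.0
    for i in range(n):
        threshold = guide_sim[i][i]  # guide score of the true pair
        # Keep the true pair, plus negatives the guide does not rate above it.
        logits = [
            student_sim[i][j] / temperature
            for j in range(n)
            if j == i or guide_sim[i][j] <= threshold
        ]
        positive_logit = student_sim[i][i] / temperature
        # Numerically stable log-sum-exp over the kept candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - positive_logit
    return total / n


student = [[0.9, 0.8], [0.1, 0.7]]
# The guide thinks candidate 1 matches anchor 0 better than its labelled positive,
# so that candidate is treated as a likely false negative and masked out.
guide_masking = [[0.6, 0.95], [0.2, 0.9]]
# A guide that never prefers an off-diagonal candidate masks nothing.
guide_neutral = [[1.0, 0.0], [0.0, 1.0]]

loss_masked = gist_style_loss(student, guide_masking)
loss_unmasked = gist_style_loss(student, guide_neutral)
print(loss_masked < loss_unmasked)
```

A small temperature such as 0.025 sharpens the softmax, so even modest similarity gaps between the positive and the remaining negatives translate into large differences in the loss.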
646
+
647
+ ### Evaluation Dataset
648
+
649
+ #### Unnamed Dataset
650
+
651
+
652
+ * Size: 1,664 evaluation samples
653
+ * Columns: <code>sentence1</code> and <code>sentence2</code>
654
+ * Approximate statistics based on the first 1000 samples:
655
+ | | sentence1 | sentence2 |
656
+ |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
657
+ | type | string | string |
658
+ | details | <ul><li>min: 4 tokens</li><li>mean: 28.9 tokens</li><li>max: 348 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 57.31 tokens</li><li>max: 450 tokens</li></ul> |
659
+ * Samples:
660
+ | sentence1 | sentence2 |
661
+ |:--------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
662
+ | <code>Gene expression is regulated primarily at the what level?</code> | <code>Gene expression is regulated primarily at the transcriptional level.</code> |
663
+ | <code>Diffusion Diffusion is a process where atoms or molecules move from areas of high concentration to areas of low concentration.</code> | <code>Diffusion is the process in which a substance naturally moves from an area of higher to lower concentration.</code> |
664
+ | <code>In which James Bond film did Sean Connery wear the Bell Rocket Belt (Jet Pack)?</code> | <code>Jet Pack - James Bond Gadgets 125lbs Summary James Bond used the Jetpack in 1965's Thunderball, to escape from gunmen after killing a SPECTRE agent. The Jetpack In the 1965 movie Thunderball, James Bond (Sean Connery) uses Q's Jetpack to escape from two gunmen after killing Jacques Bouvar, SPECTRE Agent No. 6. It was also used in the Thunderball movie posters, being the "Look Up" part of the "Look Up! Look Down! Look Out!" tagline. The Jetpack returned in the 2002 movie Die Another Day, in the Q scene that showcased many other classic gadgets. The Jetpack is a very popular Bond gadget and is a favorite among many fans due to its originality and uniqueness. The Bell Rocket Belt The Jetpack is actually a Bell Rocket Belt, a fully functional rocket pack device. It was designed for use in the army, but was rejected because of its short flying time of 21-22 seconds. Powered by hydrogen peroxide, it could fly about 250m and reach a maximum altitude of 18m, going 55km/h. Despite its impracticality in the real world, the Jetpack made a spectacular debut in Thunderball. Although Sean Connery is seen in the takeoff and landings, the main flight was piloted by Gordon Yeager and Bill Suitor.</code> |
665
+ * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
666
+ ```json
667
+ {'guide': SentenceTransformer(
668
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
669
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
670
+ (2): Normalize()
671
+ ), 'temperature': 0.025}
672
+ ```
673
+
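The guide model and `temperature` above control how in-batch negatives are filtered. As a rough, simplified sketch (a single anchor, plain Python, not the library's batched implementation), the idea can be illustrated like this. The function name `gist_masked_loss` and its arguments are hypothetical and exist only for this example.

```python
import math


def gist_masked_loss(student_sims, guide_sims, positive_idx, temperature=0.025):
    """Cross-entropy over in-batch candidates for one anchor, with guide-based masking.

    student_sims / guide_sims hold the similarity between the anchor and each
    candidate in the batch, scored by the model being trained and by the guide.
    """
    guide_pos = guide_sims[positive_idx]
    # Keep the positive; drop any negative that the guide scores above the
    # positive, since it is likely a false negative.
    kept = [
        s / temperature
        for i, (s, g) in enumerate(zip(student_sims, guide_sims))
        if i == positive_idx or g <= guide_pos
    ]
    pos_logit = student_sims[positive_idx] / temperature
    # Numerically stable log-sum-exp over the surviving logits.
    m = max(kept)
    log_z = m + math.log(sum(math.exp(x - m) for x in kept))
    return log_z - pos_logit
```

With a small temperature such as 0.025, similarity gaps are amplified before the softmax, so a single unmasked hard negative can dominate the loss. That is why discarding suspected false negatives matters here.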
674
+ ### Training Hyperparameters
675
+ #### Non-Default Hyperparameters
676
+
677
+ - `eval_strategy`: steps
678
+ - `per_device_train_batch_size`: 32
679
+ - `per_device_eval_batch_size`: 256
680
+ - `lr_scheduler_type`: cosine_with_min_lr
681
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
682
+ - `warmup_ratio`: 0.33
683
+ - `save_safetensors`: False
684
+ - `fp16`: True
685
+ - `push_to_hub`: True
686
+ - `hub_model_id`: bobox/DeBERTa3-s-CustomPoolin-toytest2-step1-checkpoints-tmp
687
+ - `hub_strategy`: all_checkpoints
688
+ - `batch_sampler`: no_duplicates
689
+
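To make the scheduler settings above more concrete, here is a minimal sketch of a cosine schedule with linear warmup and a minimum learning rate. It is an approximation written for illustration, not the Trainer's own code. The helper name `lr_at` is hypothetical, and the step counts are inferred from the training log (about 1016 steps per epoch, 3 epochs) rather than stated in the card.

```python
import math


def lr_at(step, base_lr, min_lr, warmup_steps, total_steps, num_cycles=0.5):
    """Learning rate at `step`: linear warmup, then cosine decay floored at min_lr."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress))
    floor = min_lr / base_lr  # fraction of base_lr that the schedule never goes below
    return base_lr * (cosine * (1.0 - floor) + floor)


# Illustrative numbers only: steps per epoch is an inference from the log table.
total_steps = 3 * 1016
warmup_steps = math.ceil(0.33 * total_steps)

base_lr = 5e-05
min_lr = 3.3333333333333337e-06
for step in (0, 305, warmup_steps, total_steps):
    print(step, lr_at(step, base_lr, min_lr, warmup_steps, total_steps))
```

Under these assumptions the checkpoint at step 305 is still well inside the warmup phase, so the learning rate is still rising rather than decaying.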
690
+ #### All Hyperparameters
691
+ <details><summary>Click to expand</summary>
692
+
693
+ - `overwrite_output_dir`: False
694
+ - `do_predict`: False
695
+ - `eval_strategy`: steps
696
+ - `prediction_loss_only`: True
697
+ - `per_device_train_batch_size`: 32
698
+ - `per_device_eval_batch_size`: 256
699
+ - `per_gpu_train_batch_size`: None
700
+ - `per_gpu_eval_batch_size`: None
701
+ - `gradient_accumulation_steps`: 1
702
+ - `eval_accumulation_steps`: None
703
+ - `torch_empty_cache_steps`: None
704
+ - `learning_rate`: 5e-05
705
+ - `weight_decay`: 0.0
706
+ - `adam_beta1`: 0.9
707
+ - `adam_beta2`: 0.999
708
+ - `adam_epsilon`: 1e-08
709
+ - `max_grad_norm`: 1.0
710
+ - `num_train_epochs`: 3
711
+ - `max_steps`: -1
712
+ - `lr_scheduler_type`: cosine_with_min_lr
713
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
714
+ - `warmup_ratio`: 0.33
715
+ - `warmup_steps`: 0
716
+ - `log_level`: passive
717
+ - `log_level_replica`: warning
718
+ - `log_on_each_node`: True
719
+ - `logging_nan_inf_filter`: True
720
+ - `save_safetensors`: False
721
+ - `save_on_each_node`: False
722
+ - `save_only_model`: False
723
+ - `restore_callback_states_from_checkpoint`: False
724
+ - `no_cuda`: False
725
+ - `use_cpu`: False
726
+ - `use_mps_device`: False
727
+ - `seed`: 42
728
+ - `data_seed`: None
729
+ - `jit_mode_eval`: False
730
+ - `use_ipex`: False
731
+ - `bf16`: False
732
+ - `fp16`: True
733
+ - `fp16_opt_level`: O1
734
+ - `half_precision_backend`: auto
735
+ - `bf16_full_eval`: False
736
+ - `fp16_full_eval`: False
737
+ - `tf32`: None
738
+ - `local_rank`: 0
739
+ - `ddp_backend`: None
740
+ - `tpu_num_cores`: None
741
+ - `tpu_metrics_debug`: False
742
+ - `debug`: []
743
+ - `dataloader_drop_last`: False
744
+ - `dataloader_num_workers`: 0
745
+ - `dataloader_prefetch_factor`: None
746
+ - `past_index`: -1
747
+ - `disable_tqdm`: False
748
+ - `remove_unused_columns`: True
749
+ - `label_names`: None
750
+ - `load_best_model_at_end`: False
751
+ - `ignore_data_skip`: False
752
+ - `fsdp`: []
753
+ - `fsdp_min_num_params`: 0
754
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
755
+ - `fsdp_transformer_layer_cls_to_wrap`: None
756
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
757
+ - `deepspeed`: None
758
+ - `label_smoothing_factor`: 0.0
759
+ - `optim`: adamw_torch
760
+ - `optim_args`: None
761
+ - `adafactor`: False
762
+ - `group_by_length`: False
763
+ - `length_column_name`: length
764
+ - `ddp_find_unused_parameters`: None
765
+ - `ddp_bucket_cap_mb`: None
766
+ - `ddp_broadcast_buffers`: False
767
+ - `dataloader_pin_memory`: True
768
+ - `dataloader_persistent_workers`: False
769
+ - `skip_memory_metrics`: True
770
+ - `use_legacy_prediction_loop`: False
771
+ - `push_to_hub`: True
772
+ - `resume_from_checkpoint`: None
773
+ - `hub_model_id`: bobox/DeBERTa3-s-CustomPoolin-toytest2-step1-checkpoints-tmp
774
+ - `hub_strategy`: all_checkpoints
775
+ - `hub_private_repo`: False
776
+ - `hub_always_push`: False
777
+ - `gradient_checkpointing`: False
778
+ - `gradient_checkpointing_kwargs`: None
779
+ - `include_inputs_for_metrics`: False
780
+ - `eval_do_concat_batches`: True
781
+ - `fp16_backend`: auto
782
+ - `push_to_hub_model_id`: None
783
+ - `push_to_hub_organization`: None
784
+ - `mp_parameters`:
785
+ - `auto_find_batch_size`: False
786
+ - `full_determinism`: False
787
+ - `torchdynamo`: None
788
+ - `ray_scope`: last
789
+ - `ddp_timeout`: 1800
790
+ - `torch_compile`: False
791
+ - `torch_compile_backend`: None
792
+ - `torch_compile_mode`: None
793
+ - `dispatch_batches`: None
794
+ - `split_batches`: None
795
+ - `include_tokens_per_second`: False
796
+ - `include_num_input_tokens_seen`: False
797
+ - `neftune_noise_alpha`: None
798
+ - `optim_target_modules`: None
799
+ - `batch_eval_metrics`: False
800
+ - `eval_on_start`: False
801
+ - `eval_use_gather_object`: False
802
+ - `batch_sampler`: no_duplicates
803
+ - `multi_dataset_batch_sampler`: proportional
804
+
805
+ </details>
806
+
807
+ ### Training Logs
808
+ <details><summary>Click to expand</summary>
809
+
810
+ | Epoch | Step | Training Loss | Validation Loss | sts-test_spearman_cosine | allNLI-dev_max_ap | Qnli-dev_max_ap |
811
+ |:------:|:----:|:-------------:|:---------------:|:------------------------:|:-----------------:|:---------------:|
812
+ | 0.0010 | 1 | 18.7427 | - | - | - | - |
813
+ | 0.0020 | 2 | 11.6434 | - | - | - | - |
814
+ | 0.0030 | 3 | 7.4859 | - | - | - | - |
815
+ | 0.0039 | 4 | 7.3779 | - | - | - | - |
816
+ | 0.0049 | 5 | 17.5878 | - | - | - | - |
817
+ | 0.0059 | 6 | 8.4984 | - | - | - | - |
818
+ | 0.0069 | 7 | 8.375 | - | - | - | - |
819
+ | 0.0079 | 8 | 7.3241 | - | - | - | - |
820
+ | 0.0089 | 9 | 10.3081 | - | - | - | - |
821
+ | 0.0098 | 10 | 8.5363 | - | - | - | - |
822
+ | 0.0108 | 11 | 17.2241 | - | - | - | - |
823
+ | 0.0118 | 12 | 7.575 | - | - | - | - |
824
+ | 0.0128 | 13 | 9.1905 | - | - | - | - |
825
+ | 0.0138 | 14 | 11.7727 | - | - | - | - |
826
+ | 0.0148 | 15 | 9.5827 | - | - | - | - |
827
+ | 0.0157 | 16 | 7.4432 | - | - | - | - |
828
+ | 0.0167 | 17 | 7.1573 | - | - | - | - |
829
+ | 0.0177 | 18 | 19.8016 | - | - | - | - |
830
+ | 0.0187 | 19 | 19.5118 | - | - | - | - |
831
+ | 0.0197 | 20 | 7.9062 | - | - | - | - |
832
+ | 0.0207 | 21 | 8.6791 | - | - | - | - |
833
+ | 0.0217 | 22 | 7.7318 | - | - | - | - |
834
+ | 0.0226 | 23 | 7.9319 | - | - | - | - |
835
+ | 0.0236 | 24 | 7.192 | - | - | - | - |
836
+ | 0.0246 | 25 | 15.5799 | - | - | - | - |
837
+ | 0.0256 | 26 | 9.7859 | - | - | - | - |
838
+ | 0.0266 | 27 | 9.9259 | - | - | - | - |
839
+ | 0.0276 | 28 | 6.3076 | - | - | - | - |
840
+ | 0.0285 | 29 | 7.4471 | - | - | - | - |
841
+ | 0.0295 | 30 | 7.1246 | - | - | - | - |
842
+ | 0.0305 | 31 | 6.5505 | - | - | - | - |
843
+ | 0.0315 | 32 | 18.5194 | - | - | - | - |
844
+ | 0.0325 | 33 | 7.0747 | - | - | - | - |
845
+ | 0.0335 | 34 | 14.9456 | - | - | - | - |
846
+ | 0.0344 | 35 | 6.608 | - | - | - | - |
847
+ | 0.0354 | 36 | 8.4672 | - | - | - | - |
848
+ | 0.0364 | 37 | 6.8853 | - | - | - | - |
849
+ | 0.0374 | 38 | 13.6063 | - | - | - | - |
850
+ | 0.0384 | 39 | 7.2625 | - | - | - | - |
851
+ | 0.0394 | 40 | 6.2234 | - | - | - | - |
852
+ | 0.0404 | 41 | 14.9675 | - | - | - | - |
853
+ | 0.0413 | 42 | 6.6038 | - | - | - | - |
854
+ | 0.0423 | 43 | 13.1173 | - | - | - | - |
855
+ | 0.0433 | 44 | 16.6992 | - | - | - | - |
856
+ | 0.0443 | 45 | 6.4828 | - | - | - | - |
857
+ | 0.0453 | 46 | 5.9815 | - | - | - | - |
858
+ | 0.0463 | 47 | 6.1738 | - | - | - | - |
859
+ | 0.0472 | 48 | 7.134 | - | - | - | - |
860
+ | 0.0482 | 49 | 9.3933 | - | - | - | - |
861
+ | 0.0492 | 50 | 10.8085 | - | - | - | - |
862
+ | 0.0502 | 51 | 11.4172 | - | - | - | - |
863
+ | 0.0512 | 52 | 7.3397 | - | - | - | - |
864
+ | 0.0522 | 53 | 5.8851 | - | - | - | - |
865
+ | 0.0531 | 54 | 6.8105 | - | - | - | - |
866
+ | 0.0541 | 55 | 5.3637 | - | - | - | - |
867
+ | 0.0551 | 56 | 6.2628 | - | - | - | - |
868
+ | 0.0561 | 57 | 6.0039 | - | - | - | - |
869
+ | 0.0571 | 58 | 7.5859 | - | - | - | - |
870
+ | 0.0581 | 59 | 6.0802 | - | - | - | - |
871
+ | 0.0591 | 60 | 5.5822 | - | - | - | - |
872
+ | 0.0600 | 61 | 5.8773 | - | - | - | - |
873
+ | 0.0610 | 62 | 6.0814 | - | - | - | - |
874
+ | 0.0620 | 63 | 5.4483 | - | - | - | - |
875
+ | 0.0630 | 64 | 10.2506 | - | - | - | - |
876
+ | 0.0640 | 65 | 10.5976 | - | - | - | - |
877
+ | 0.0650 | 66 | 6.9942 | - | - | - | - |
878
+ | 0.0659 | 67 | 5.4813 | - | - | - | - |
879
+ | 0.0669 | 68 | 7.045 | - | - | - | - |
880
+ | 0.0679 | 69 | 5.8549 | - | - | - | - |
881
+ | 0.0689 | 70 | 8.8514 | - | - | - | - |
882
+ | 0.0699 | 71 | 5.2557 | - | - | - | - |
883
+ | 0.0709 | 72 | 5.1181 | - | - | - | - |
884
+ | 0.0719 | 73 | 5.5331 | - | - | - | - |
885
+ | 0.0728 | 74 | 5.5944 | - | - | - | - |
886
+ | 0.0738 | 75 | 4.6332 | - | - | - | - |
887
+ | 0.0748 | 76 | 4.9532 | - | - | - | - |
888
+ | 0.0758 | 77 | 5.055 | - | - | - | - |
889
+ | 0.0768 | 78 | 4.5005 | - | - | - | - |
890
+ | 0.0778 | 79 | 5.1997 | - | - | - | - |
891
+ | 0.0787 | 80 | 5.1479 | - | - | - | - |
892
+ | 0.0797 | 81 | 5.1777 | - | - | - | - |
893
+ | 0.0807 | 82 | 5.5565 | - | - | - | - |
894
+ | 0.0817 | 83 | 4.6999 | - | - | - | - |
895
+ | 0.0827 | 84 | 5.0681 | - | - | - | - |
896
+ | 0.0837 | 85 | 5.2208 | - | - | - | - |
897
+ | 0.0846 | 86 | 4.56 | - | - | - | - |
898
+ | 0.0856 | 87 | 4.6793 | - | - | - | - |
899
+ | 0.0866 | 88 | 4.4611 | - | - | - | - |
900
+ | 0.0876 | 89 | 9.623 | - | - | - | - |
901
+ | 0.0886 | 90 | 5.0316 | - | - | - | - |
902
+ | 0.0896 | 91 | 4.1771 | - | - | - | - |
903
+ | 0.0906 | 92 | 4.9652 | - | - | - | - |
904
+ | 0.0915 | 93 | 8.7432 | - | - | - | - |
905
+ | 0.0925 | 94 | 4.6234 | - | - | - | - |
906
+ | 0.0935 | 95 | 4.4016 | - | - | - | - |
907
+ | 0.0945 | 96 | 4.9903 | - | - | - | - |
908
+ | 0.0955 | 97 | 4.5606 | - | - | - | - |
909
+ | 0.0965 | 98 | 4.9534 | - | - | - | - |
910
+ | 0.0974 | 99 | 8.1838 | - | - | - | - |
911
+ | 0.0984 | 100 | 4.9736 | - | - | - | - |
912
+ | 0.0994 | 101 | 4.4733 | - | - | - | - |
913
+ | 0.1004 | 102 | 4.9725 | - | - | - | - |
914
+ | 0.1014 | 103 | 4.5861 | - | - | - | - |
915
+ | 0.1024 | 104 | 7.7634 | - | - | - | - |
916
+ | 0.1033 | 105 | 4.9915 | - | - | - | - |
917
+ | 0.1043 | 106 | 5.1391 | - | - | - | - |
918
+ | 0.1053 | 107 | 5.0157 | - | - | - | - |
919
+ | 0.1063 | 108 | 4.0982 | - | - | - | - |
920
+ | 0.1073 | 109 | 4.2178 | - | - | - | - |
921
+ | 0.1083 | 110 | 4.6193 | - | - | - | - |
922
+ | 0.1093 | 111 | 4.7638 | - | - | - | - |
923
+ | 0.1102 | 112 | 4.1207 | - | - | - | - |
924
+ | 0.1112 | 113 | 5.2034 | - | - | - | - |
925
+ | 0.1122 | 114 | 5.0693 | - | - | - | - |
926
+ | 0.1132 | 115 | 4.7895 | - | - | - | - |
927
+ | 0.1142 | 116 | 4.9486 | - | - | - | - |
928
+ | 0.1152 | 117 | 4.6552 | - | - | - | - |
929
+ | 0.1161 | 118 | 4.4555 | - | - | - | - |
930
+ | 0.1171 | 119 | 4.8977 | - | - | - | - |
931
+ | 0.1181 | 120 | 7.6836 | - | - | - | - |
932
+ | 0.1191 | 121 | 4.8106 | - | - | - | - |
933
+ | 0.1201 | 122 | 4.9958 | - | - | - | - |
934
+ | 0.1211 | 123 | 4.4585 | - | - | - | - |
935
+ | 0.1220 | 124 | 7.5559 | - | - | - | - |
936
+ | 0.1230 | 125 | 4.2636 | - | - | - | - |
937
+ | 0.1240 | 126 | 4.0436 | - | - | - | - |
938
+ | 0.125 | 127 | 4.7416 | - | - | - | - |
939
+ | 0.1260 | 128 | 4.2215 | - | - | - | - |
940
+ | 0.1270 | 129 | 6.3561 | - | - | - | - |
941
+ | 0.1280 | 130 | 6.2299 | - | - | - | - |
942
+ | 0.1289 | 131 | 4.3492 | - | - | - | - |
943
+ | 0.1299 | 132 | 4.0216 | - | - | - | - |
944
+ | 0.1309 | 133 | 6.963 | - | - | - | - |
945
+ | 0.1319 | 134 | 3.9474 | - | - | - | - |
946
+ | 0.1329 | 135 | 4.3437 | - | - | - | - |
947
+ | 0.1339 | 136 | 3.6267 | - | - | - | - |
948
+ | 0.1348 | 137 | 3.9896 | - | - | - | - |
949
+ | 0.1358 | 138 | 4.8156 | - | - | - | - |
950
+ | 0.1368 | 139 | 4.9751 | - | - | - | - |
951
+ | 0.1378 | 140 | 4.4144 | - | - | - | - |
952
+ | 0.1388 | 141 | 4.7213 | - | - | - | - |
953
+ | 0.1398 | 142 | 6.6081 | - | - | - | - |
954
+ | 0.1407 | 143 | 4.2929 | - | - | - | - |
955
+ | 0.1417 | 144 | 4.2537 | - | - | - | - |
956
+ | 0.1427 | 145 | 4.0647 | - | - | - | - |
957
+ | 0.1437 | 146 | 3.937 | - | - | - | - |
958
+ | 0.1447 | 147 | 5.6582 | - | - | - | - |
959
+ | 0.1457 | 148 | 4.2648 | - | - | - | - |
960
+ | 0.1467 | 149 | 4.4429 | - | - | - | - |
961
+ | 0.1476 | 150 | 3.6197 | - | - | - | - |
962
+ | 0.1486 | 151 | 3.7953 | - | - | - | - |
963
+ | 0.1496 | 152 | 3.8175 | - | - | - | - |
964
+ | 0.1506 | 153 | 4.5137 | 3.3210 | 0.1806 | 0.3919 | 0.5750 |
965
+ | 0.1516 | 154 | 4.3528 | - | - | - | - |
966
+ | 0.1526 | 155 | 3.6573 | - | - | - | - |
967
+ | 0.1535 | 156 | 3.5248 | - | - | - | - |
968
+ | 0.1545 | 157 | 3.9275 | - | - | - | - |
969
+ | 0.1555 | 158 | 7.1868 | - | - | - | - |
970
+ | 0.1565 | 159 | 3.6294 | - | - | - | - |
971
+ | 0.1575 | 160 | 3.6886 | - | - | - | - |
972
+ | 0.1585 | 161 | 3.1873 | - | - | - | - |
973
+ | 0.1594 | 162 | 6.1951 | - | - | - | - |
974
+ | 0.1604 | 163 | 3.9747 | - | - | - | - |
975
+ | 0.1614 | 164 | 7.004 | - | - | - | - |
976
+ | 0.1624 | 165 | 4.3221 | - | - | - | - |
977
+ | 0.1634 | 166 | 3.5963 | - | - | - | - |
978
+ | 0.1644 | 167 | 3.1988 | - | - | - | - |
979
+ | 0.1654 | 168 | 3.8236 | - | - | - | - |
980
+ | 0.1663 | 169 | 3.5063 | - | - | - | - |
981
+ | 0.1673 | 170 | 5.9843 | - | - | - | - |
982
+ | 0.1683 | 171 | 5.884 | - | - | - | - |
983
+ | 0.1693 | 172 | 4.1317 | - | - | - | - |
984
+ | 0.1703 | 173 | 3.9255 | - | - | - | - |
985
+ | 0.1713 | 174 | 4.1121 | - | - | - | - |
986
+ | 0.1722 | 175 | 3.7748 | - | - | - | - |
987
+ | 0.1732 | 176 | 5.1602 | - | - | - | - |
988
+ | 0.1742 | 177 | 4.8807 | - | - | - | - |
989
+ | 0.1752 | 178 | 3.4643 | - | - | - | - |
990
+ | 0.1762 | 179 | 3.4937 | - | - | - | - |
991
+ | 0.1772 | 180 | 5.2731 | - | - | - | - |
992
+ | 0.1781 | 181 | 4.6416 | - | - | - | - |
993
+ | 0.1791 | 182 | 3.5226 | - | - | - | - |
994
+ | 0.1801 | 183 | 4.7794 | - | - | - | - |
995
+ | 0.1811 | 184 | 3.8504 | - | - | - | - |
996
+ | 0.1821 | 185 | 3.5391 | - | - | - | - |
997
+ | 0.1831 | 186 | 4.0291 | - | - | - | - |
998
+ | 0.1841 | 187 | 3.5606 | - | - | - | - |
999
+ | 0.1850 | 188 | 3.8957 | - | - | - | - |
1000
+ | 0.1860 | 189 | 4.3657 | - | - | - | - |
1001
+ | 0.1870 | 190 | 5.0173 | - | - | - | - |
1002
+ | 0.1880 | 191 | 4.3915 | - | - | - | - |
1003
+ | 0.1890 | 192 | 3.4613 | - | - | - | - |
1004
+ | 0.1900 | 193 | 3.2005 | - | - | - | - |
1005
+ | 0.1909 | 194 | 3.3986 | - | - | - | - |
1006
+ | 0.1919 | 195 | 3.7937 | - | - | - | - |
1007
+ | 0.1929 | 196 | 3.8981 | - | - | - | - |
1008
+ | 0.1939 | 197 | 3.7051 | - | - | - | - |
1009
+ | 0.1949 | 198 | 3.8028 | - | - | - | - |
1010
+ | 0.1959 | 199 | 3.3294 | - | - | - | - |
1011
+ | 0.1969 | 200 | 4.1252 | - | - | - | - |
1012
+ | 0.1978 | 201 | 4.2564 | - | - | - | - |
1013
+ | 0.1988 | 202 | 3.8258 | - | - | - | - |
1014
+ | 0.1998 | 203 | 3.1025 | - | - | - | - |
1015
+ | 0.2008 | 204 | 3.5038 | - | - | - | - |
1016
+ | 0.2018 | 205 | 3.6021 | - | - | - | - |
1017
+ | 0.2028 | 206 | 3.7637 | - | - | - | - |
1018
+ | 0.2037 | 207 | 3.2563 | - | - | - | - |
1019
+ | 0.2047 | 208 | 3.9323 | - | - | - | - |
1020
+ | 0.2057 | 209 | 3.489 | - | - | - | - |
1021
+ | 0.2067 | 210 | 3.6549 | - | - | - | - |
1022
+ | 0.2077 | 211 | 3.1609 | - | - | - | - |
1023
+ | 0.2087 | 212 | 3.2467 | - | - | - | - |
1024
+ | 0.2096 | 213 | 3.4514 | - | - | - | - |
1025
+ | 0.2106 | 214 | 3.4945 | - | - | - | - |
1026
+ | 0.2116 | 215 | 3.5932 | - | - | - | - |
1027
+ | 0.2126 | 216 | 3.2289 | - | - | - | - |
1028
+ | 0.2136 | 217 | 3.3279 | - | - | - | - |
1029
+ | 0.2146 | 218 | 3.8141 | - | - | - | - |
1030
+ | 0.2156 | 219 | 3.1171 | - | - | - | - |
1031
+ | 0.2165 | 220 | 3.6287 | - | - | - | - |
1032
+ | 0.2175 | 221 | 3.8517 | - | - | - | - |
1033
+ | 0.2185 | 222 | 3.3836 | - | - | - | - |
1034
+ | 0.2195 | 223 | 3.425 | - | - | - | - |
1035
+ | 0.2205 | 224 | 3.6246 | - | - | - | - |
1036
+ | 0.2215 | 225 | 3.5682 | - | - | - | - |
1037
+ | 0.2224 | 226 | 3.3034 | - | - | - | - |
1038
+ | 0.2234 | 227 | 3.9251 | - | - | - | - |
1039
+ | 0.2244 | 228 | 3.146 | - | - | - | - |
1040
+ | 0.2254 | 229 | 3.8859 | - | - | - | - |
1041
+ | 0.2264 | 230 | 3.2977 | - | - | - | - |
1042
+ | 0.2274 | 231 | 3.2664 | - | - | - | - |
1043
+ | 0.2283 | 232 | 3.1275 | - | - | - | - |
1044
+ | 0.2293 | 233 | 3.2408 | - | - | - | - |
1045
+ | 0.2303 | 234 | 2.907 | - | - | - | - |
1046
+ | 0.2313 | 235 | 2.9178 | - | - | - | - |
1047
+ | 0.2323 | 236 | 3.324 | - | - | - | - |
1048
+ | 0.2333 | 237 | 2.9172 | - | - | - | - |
1049
+ | 0.2343 | 238 | 3.4324 | - | - | - | - |
1050
+ | 0.2352 | 239 | 4.0563 | - | - | - | - |
1051
+ | 0.2362 | 240 | 2.8736 | - | - | - | - |
1052
+ | 0.2372 | 241 | 4.7174 | - | - | - | - |
1053
+ | 0.2382 | 242 | 3.2025 | - | - | - | - |
1054
+ | 0.2392 | 243 | 2.7835 | - | - | - | - |
1055
+ | 0.2402 | 244 | 4.3158 | - | - | - | - |
1056
+ | 0.2411 | 245 | 2.8619 | - | - | - | - |
1057
+ | 0.2421 | 246 | 2.5156 | - | - | - | - |
1058
+ | 0.2431 | 247 | 3.2144 | - | - | - | - |
1059
+ | 0.2441 | 248 | 3.5927 | - | - | - | - |
1060
+ | 0.2451 | 249 | 2.6059 | - | - | - | - |
1061
+ | 0.2461 | 250 | 2.9758 | - | - | - | - |
1062
+ | 0.2470 | 251 | 3.9214 | - | - | - | - |
1063
+ | 0.2480 | 252 | 3.2892 | - | - | - | - |
1064
+ | 0.2490 | 253 | 2.9503 | - | - | - | - |
1065
+ | 0.25 | 254 | 2.5969 | - | - | - | - |
1066
+ | 0.2510 | 255 | 2.9908 | - | - | - | - |
1067
+ | 0.2520 | 256 | 2.8995 | - | - | - | - |
1068
+ | 0.2530 | 257 | 3.124 | - | - | - | - |
1069
+ | 0.2539 | 258 | 3.1197 | - | - | - | - |
1070
+ | 0.2549 | 259 | 2.3073 | - | - | - | - |
1071
+ | 0.2559 | 260 | 2.8441 | - | - | - | - |
1072
+ | 0.2569 | 261 | 1.9788 | - | - | - | - |
1073
+ | 0.2579 | 262 | 2.1442 | - | - | - | - |
1074
+ | 0.2589 | 263 | 4.9015 | - | - | - | - |
1075
+ | 0.2598 | 264 | 2.7866 | - | - | - | - |
1076
+ | 0.2608 | 265 | 2.4588 | - | - | - | - |
1077
+ | 0.2618 | 266 | 2.3909 | - | - | - | - |
1078
+ | 0.2628 | 267 | 4.7394 | - | - | - | - |
1079
+ | 0.2638 | 268 | 3.1581 | - | - | - | - |
1080
+ | 0.2648 | 269 | 3.973 | - | - | - | - |
1081
+ | 0.2657 | 270 | 4.1565 | - | - | - | - |
1082
+ | 0.2667 | 271 | 2.5183 | - | - | - | - |
1083
+ | 0.2677 | 272 | 3.614 | - | - | - | - |
1084
+ | 0.2687 | 273 | 2.6858 | - | - | - | - |
1085
+ | 0.2697 | 274 | 3.1182 | - | - | - | - |
1086
+ | 0.2707 | 275 | 2.9628 | - | - | - | - |
1087
+ | 0.2717 | 276 | 2.8376 | - | - | - | - |
1088
+ | 0.2726 | 277 | 2.7858 | - | - | - | - |
1089
+ | 0.2736 | 278 | 2.1037 | - | - | - | - |
1090
+ | 0.2746 | 279 | 3.0436 | - | - | - | - |
1091
+ | 0.2756 | 280 | 3.4125 | - | - | - | - |
1092
+ | 0.2766 | 281 | 2.5027 | - | - | - | - |
1093
+ | 0.2776 | 282 | 2.7922 | - | - | - | - |
1094
+ | 0.2785 | 283 | 2.9762 | - | - | - | - |
1095
+ | 0.2795 | 284 | 2.6458 | - | - | - | - |
1096
+ | 0.2805 | 285 | 2.962 | - | - | - | - |
1097
+ | 0.2815 | 286 | 2.5439 | - | - | - | - |
1098
+ | 0.2825 | 287 | 2.8437 | - | - | - | - |
1099
+ | 0.2835 | 288 | 3.2134 | - | - | - | - |
1100
+ | 0.2844 | 289 | 2.5655 | - | - | - | - |
1101
+ | 0.2854 | 290 | 2.9465 | - | - | - | - |
1102
+ | 0.2864 | 291 | 2.4653 | - | - | - | - |
1103
+ | 0.2874 | 292 | 3.1467 | - | - | - | - |
1104
+ | 0.2884 | 293 | 2.6551 | - | - | - | - |
1105
+ | 0.2894 | 294 | 2.5098 | - | - | - | - |
1106
+ | 0.2904 | 295 | 2.5988 | - | - | - | - |
1107
+ | 0.2913 | 296 | 3.778 | - | - | - | - |
1108
+ | 0.2923 | 297 | 2.6257 | - | - | - | - |
1109
+ | 0.2933 | 298 | 2.5142 | - | - | - | - |
1110
+ | 0.2943 | 299 | 2.3182 | - | - | - | - |
1111
+ | 0.2953 | 300 | 3.3505 | - | - | - | - |
1112
+ | 0.2963 | 301 | 2.9615 | - | - | - | - |
1113
+ | 0.2972 | 302 | 2.9136 | - | - | - | - |
1114
+ | 0.2982 | 303 | 2.6192 | - | - | - | - |
1115
+ | 0.2992 | 304 | 2.3255 | - | - | - | - |
1116
+ | 0.3002 | 305 | 2.7168 | - | - | - | - |
1117
+
1118
+ </details>
1119
+
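The same per-step values are stored in the `log_history` list of `trainer_state.json` in this checkpoint, where the first steps carry `NaN` or `Infinity` gradient norms and a learning rate of 0.0. Those literals are not strict JSON, but Python's `json` module accepts them by default. The sketch below parses an inline excerpt copied from that file and filters out the non-finite entries; the variable names are illustrative.

```python
import json
import math

# Excerpt mirroring the first entries of log_history in trainer_state.json.
excerpt = """
{"log_history": [
  {"epoch": 0.000984251968503937, "grad_norm": NaN, "learning_rate": 0.0, "loss": 18.7427, "step": 1},
  {"epoch": 0.002952755905511811, "grad_norm": Infinity, "learning_rate": 0.0, "loss": 7.4859, "step": 3},
  {"epoch": 0.003937007874015748, "grad_norm": 31.9741268157959, "learning_rate": 9.940357852882705e-10, "loss": 7.3779, "step": 4}
]}
"""

state = json.loads(excerpt)  # NaN and Infinity are parsed as Python floats
history = state["log_history"]

# Keep only the entries whose gradient norm is a finite number.
finite = [entry for entry in history if math.isfinite(entry["grad_norm"])]

# The epoch fraction logged at a given step implies the number of steps per epoch.
steps_per_epoch = round(history[0]["step"] / history[0]["epoch"])

print([entry["step"] for entry in finite], steps_per_epoch)
```

A strict JSON parser in another language may reject the file, so the non-finite values may need to be handled before the logs are consumed elsewhere.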
1120
+ ### Framework Versions
1121
+ - Python: 3.10.12
1122
+ - Sentence Transformers: 3.2.1
1123
+ - Transformers: 4.44.2
1124
+ - PyTorch: 2.5.0+cu121
1125
+ - Accelerate: 0.34.2
1126
+ - Datasets: 3.0.2
1127
+ - Tokenizers: 0.19.1
1128
+
1129
+ ## Citation
1130
+
1131
+ ### BibTeX
1132
+
1133
+ #### Sentence Transformers
1134
+ ```bibtex
1135
+ @inproceedings{reimers-2019-sentence-bert,
1136
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
1137
+ author = "Reimers, Nils and Gurevych, Iryna",
1138
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
1139
+ month = "11",
1140
+ year = "2019",
1141
+ publisher = "Association for Computational Linguistics",
1142
+ url = "https://arxiv.org/abs/1908.10084",
1143
+ }
1144
+ ```
1145
+
1146
+ #### GISTEmbedLoss
1147
+ ```bibtex
1148
+ @misc{solatorio2024gistembed,
1149
+ title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
1150
+ author={Aivin V. Solatorio},
1151
+ year={2024},
1152
+ eprint={2402.16829},
1153
+ archivePrefix={arXiv},
1154
+ primaryClass={cs.LG}
1155
+ }
1156
+ ```
1157
+
1158
+ <!--
1159
+ ## Glossary
1160
+
1161
+ *Clearly define terms in order to be accessible across audiences.*
1162
+ -->
1163
+
1164
+ <!--
1165
+ ## Model Card Authors
1166
+
1167
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
1168
+ -->
1169
+
1170
+ <!--
1171
+ ## Model Card Contact
1172
+
1173
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
1174
+ -->
checkpoint-305/added_tokens.json ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "[MASK]": 128000
3
+ }
checkpoint-305/config.json ADDED
@@ -0,0 +1,35 @@
1
+ {
2
+ "_name_or_path": "microsoft/deberta-v3-small",
3
+ "architectures": [
4
+ "DebertaV2Model"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "hidden_act": "gelu",
8
+ "hidden_dropout_prob": 0.1,
9
+ "hidden_size": 768,
10
+ "initializer_range": 0.02,
11
+ "intermediate_size": 3072,
12
+ "layer_norm_eps": 1e-07,
13
+ "max_position_embeddings": 512,
14
+ "max_relative_positions": -1,
15
+ "model_type": "deberta-v2",
16
+ "norm_rel_ebd": "layer_norm",
17
+ "num_attention_heads": 12,
18
+ "num_hidden_layers": 6,
19
+ "pad_token_id": 0,
20
+ "pooler_dropout": 0,
21
+ "pooler_hidden_act": "gelu",
22
+ "pooler_hidden_size": 768,
23
+ "pos_att_type": [
24
+ "p2c",
25
+ "c2p"
26
+ ],
27
+ "position_biased_input": false,
28
+ "position_buckets": 256,
29
+ "relative_attention": true,
30
+ "share_att_key": true,
31
+ "torch_dtype": "float32",
32
+ "transformers_version": "4.44.2",
33
+ "type_vocab_size": 0,
34
+ "vocab_size": 128100
35
+ }
checkpoint-305/config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "3.2.1",
4
+ "transformers": "4.44.2",
5
+ "pytorch": "2.5.0+cu121"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": null
10
+ }
checkpoint-305/modules.json ADDED
@@ -0,0 +1,14 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_AdvancedWeightedPooling",
12
+ "type": "__main__.AdvancedWeightedPooling"
13
+ }
14
+ ]
checkpoint-305/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bb6006dd2cdf74a81c4d2f6e037ceab522daf7b6451361b3437ed4e74d5249b5
3
+ size 151305210
checkpoint-305/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1dd909b5a024a765fe1f01508aafaf02546a61be1cb0d191b905bdc685f4c567
3
+ size 565251810
checkpoint-305/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:36eddb3c27f036ef39a93dd3f73f5a5d231771beb486e7a9050acda2ec3c7ae8
3
+ size 14180
checkpoint-305/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1918ed3953a86f69d28ad390ead9cd56d483d36b66eb8b2af766465b180544c0
3
+ size 1256
checkpoint-305/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": false
4
+ }
checkpoint-305/special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
1
+ {
2
+ "bos_token": "[CLS]",
3
+ "cls_token": "[CLS]",
4
+ "eos_token": "[SEP]",
5
+ "mask_token": "[MASK]",
6
+ "pad_token": "[PAD]",
7
+ "sep_token": "[SEP]",
8
+ "unk_token": {
9
+ "content": "[UNK]",
10
+ "lstrip": false,
11
+ "normalized": true,
12
+ "rstrip": false,
13
+ "single_word": false
14
+ }
15
+ }
checkpoint-305/spm.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
3
+ size 2464616
checkpoint-305/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-305/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "1": {
12
+ "content": "[CLS]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "2": {
20
+ "content": "[SEP]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "3": {
28
+ "content": "[UNK]",
29
+ "lstrip": false,
30
+ "normalized": true,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "128000": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "bos_token": "[CLS]",
45
+ "clean_up_tokenization_spaces": true,
46
+ "cls_token": "[CLS]",
47
+ "do_lower_case": false,
48
+ "eos_token": "[SEP]",
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 1000000000000000019884624838656,
51
+ "pad_token": "[PAD]",
52
+ "sep_token": "[SEP]",
53
+ "sp_model_kwargs": {},
54
+ "split_by_punct": false,
55
+ "tokenizer_class": "DebertaV2Tokenizer",
56
+ "unk_token": "[UNK]",
57
+ "vocab_type": "spm"
58
+ }
checkpoint-305/trainer_state.json ADDED
@@ -0,0 +1,2257 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 0.3001968503937008,
5
+ "eval_steps": 153,
6
+ "global_step": 305,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.000984251968503937,
13
+ "grad_norm": NaN,
14
+ "learning_rate": 0.0,
15
+ "loss": 18.7427,
16
+ "step": 1
17
+ },
18
+ {
19
+ "epoch": 0.001968503937007874,
20
+ "grad_norm": NaN,
21
+ "learning_rate": 0.0,
22
+ "loss": 11.6434,
23
+ "step": 2
24
+ },
25
+ {
26
+ "epoch": 0.002952755905511811,
27
+ "grad_norm": Infinity,
28
+ "learning_rate": 0.0,
29
+ "loss": 7.4859,
30
+ "step": 3
31
+ },
32
+ {
33
+ "epoch": 0.003937007874015748,
34
+ "grad_norm": 31.9741268157959,
35
+ "learning_rate": 9.940357852882705e-10,
36
+ "loss": 7.3779,
37
+ "step": 4
38
+ },
39
+ {
40
+ "epoch": 0.004921259842519685,
41
+ "grad_norm": Infinity,
42
+ "learning_rate": 9.940357852882705e-10,
43
+ "loss": 17.5878,
44
+ "step": 5
45
+ },
46
+ {
47
+ "epoch": 0.005905511811023622,
48
+ "grad_norm": 40.571693420410156,
49
+ "learning_rate": 1.988071570576541e-09,
50
+ "loss": 8.4984,
51
+ "step": 6
52
+ },
53
+ {
54
+ "epoch": 0.006889763779527559,
55
+ "grad_norm": 44.114723205566406,
56
+ "learning_rate": 2.9821073558648116e-09,
57
+ "loss": 8.375,
58
+ "step": 7
59
+ },
60
+ {
61
+ "epoch": 0.007874015748031496,
62
+ "grad_norm": 39.95664978027344,
63
+ "learning_rate": 3.976143141153082e-09,
64
+ "loss": 7.3241,
65
+ "step": 8
66
+ },
67
+ {
68
+ "epoch": 0.008858267716535433,
69
+ "grad_norm": 54.75717544555664,
70
+ "learning_rate": 4.970178926441353e-09,
71
+ "loss": 10.3081,
72
+ "step": 9
73
+ },
74
+ {
75
+ "epoch": 0.00984251968503937,
76
+ "grad_norm": 48.129844665527344,
77
+ "learning_rate": 5.964214711729623e-09,
78
+ "loss": 8.5363,
79
+ "step": 10
80
+ },
81
+ {
82
+ "epoch": 0.010826771653543307,
83
+ "grad_norm": Infinity,
84
+ "learning_rate": 5.964214711729623e-09,
85
+ "loss": 17.2241,
86
+ "step": 11
87
+ },
88
+ {
89
+ "epoch": 0.011811023622047244,
90
+ "grad_norm": 37.79861068725586,
91
+ "learning_rate": 6.9582504970178946e-09,
92
+ "loss": 7.575,
93
+ "step": 12
94
+ },
95
+ {
96
+ "epoch": 0.012795275590551181,
97
+ "grad_norm": 37.935638427734375,
98
+ "learning_rate": 7.952286282306164e-09,
99
+ "loss": 9.1905,
100
+ "step": 13
101
+ },
102
+ {
103
+ "epoch": 0.013779527559055118,
104
+ "grad_norm": 59.75738525390625,
105
+ "learning_rate": 8.946322067594435e-09,
106
+ "loss": 11.7727,
107
+ "step": 14
108
+ },
109
+ {
110
+ "epoch": 0.014763779527559055,
111
+ "grad_norm": 46.43831253051758,
112
+ "learning_rate": 9.940357852882705e-09,
113
+ "loss": 9.5827,
114
+ "step": 15
115
+ },
116
+ {
117
+ "epoch": 0.015748031496062992,
118
+ "grad_norm": 36.75105667114258,
119
+ "learning_rate": 1.0934393638170978e-08,
120
+ "loss": 7.4432,
121
+ "step": 16
122
+ },
123
+ {
124
+ "epoch": 0.01673228346456693,
125
+ "grad_norm": 30.298437118530273,
126
+ "learning_rate": 1.1928429423459246e-08,
127
+ "loss": 7.1573,
128
+ "step": 17
129
+ },
130
+ {
131
+ "epoch": 0.017716535433070866,
132
+ "grad_norm": 96.71277618408203,
133
+ "learning_rate": 1.2922465208747517e-08,
134
+ "loss": 19.8016,
135
+ "step": 18
136
+ },
137
+ {
138
+ "epoch": 0.018700787401574805,
139
+ "grad_norm": 84.41168212890625,
140
+ "learning_rate": 1.3916500994035789e-08,
141
+ "loss": 19.5118,
142
+ "step": 19
143
+ },
144
+ {
145
+ "epoch": 0.01968503937007874,
146
+ "grad_norm": 35.94807815551758,
147
+ "learning_rate": 1.4910536779324056e-08,
148
+ "loss": 7.9062,
149
+ "step": 20
150
+ },
151
+ {
152
+ "epoch": 0.02066929133858268,
153
+ "grad_norm": 41.35914611816406,
154
+ "learning_rate": 1.590457256461233e-08,
155
+ "loss": 8.6791,
156
+ "step": 21
157
+ },
158
+ {
159
+ "epoch": 0.021653543307086614,
160
+ "grad_norm": 37.65285873413086,
161
+ "learning_rate": 1.68986083499006e-08,
162
+ "loss": 7.7318,
163
+ "step": 22
164
+ },
165
+ {
166
+ "epoch": 0.022637795275590553,
167
+ "grad_norm": 40.24616622924805,
168
+ "learning_rate": 1.789264413518887e-08,
169
+ "loss": 7.9319,
170
+ "step": 23
171
+ },
172
+ {
173
+ "epoch": 0.023622047244094488,
174
+ "grad_norm": 32.05257034301758,
175
+ "learning_rate": 1.888667992047714e-08,
176
+ "loss": 7.192,
177
+ "step": 24
178
+ },
179
+ {
180
+ "epoch": 0.024606299212598427,
181
+ "grad_norm": 87.60692596435547,
182
+ "learning_rate": 1.988071570576541e-08,
183
+ "loss": 15.5799,
184
+ "step": 25
185
+ },
186
+ {
187
+ "epoch": 0.025590551181102362,
188
+ "grad_norm": 46.73486328125,
189
+ "learning_rate": 2.087475149105368e-08,
190
+ "loss": 9.7859,
191
+ "step": 26
192
+ },
193
+ {
194
+ "epoch": 0.0265748031496063,
195
+ "grad_norm": 50.18130111694336,
196
+ "learning_rate": 2.1868787276341955e-08,
197
+ "loss": 9.9259,
198
+ "step": 27
199
+ },
200
+ {
201
+ "epoch": 0.027559055118110236,
202
+ "grad_norm": 29.22075653076172,
203
+ "learning_rate": 2.2862823061630224e-08,
204
+ "loss": 6.3076,
205
+ "step": 28
206
+ },
207
+ {
208
+ "epoch": 0.028543307086614175,
209
+ "grad_norm": 37.356693267822266,
210
+ "learning_rate": 2.3856858846918493e-08,
211
+ "loss": 7.4471,
212
+ "step": 29
213
+ },
214
+ {
215
+ "epoch": 0.02952755905511811,
216
+ "grad_norm": 35.80938720703125,
217
+ "learning_rate": 2.4850894632206765e-08,
218
+ "loss": 7.1246,
219
+ "step": 30
220
+ },
221
+ {
222
+ "epoch": 0.03051181102362205,
223
+ "grad_norm": 25.738622665405273,
224
+ "learning_rate": 2.5844930417495034e-08,
225
+ "loss": 6.5505,
226
+ "step": 31
227
+ },
228
+ {
229
+ "epoch": 0.031496062992125984,
230
+ "grad_norm": 83.83589935302734,
231
+ "learning_rate": 2.6838966202783303e-08,
232
+ "loss": 18.5194,
233
+ "step": 32
234
+ },
235
+ {
236
+ "epoch": 0.03248031496062992,
237
+ "grad_norm": 33.8433837890625,
238
+ "learning_rate": 2.7833001988071578e-08,
239
+ "loss": 7.0747,
240
+ "step": 33
241
+ },
242
+ {
243
+ "epoch": 0.03346456692913386,
244
+ "grad_norm": 74.36174011230469,
245
+ "learning_rate": 2.8827037773359847e-08,
246
+ "loss": 14.9456,
247
+ "step": 34
248
+ },
249
+ {
250
+ "epoch": 0.0344488188976378,
251
+ "grad_norm": 26.985536575317383,
252
+ "learning_rate": 2.982107355864811e-08,
253
+ "loss": 6.608,
254
+ "step": 35
255
+ },
256
+ {
257
+ "epoch": 0.03543307086614173,
258
+ "grad_norm": 40.41023635864258,
259
+ "learning_rate": 3.081510934393639e-08,
260
+ "loss": 8.4672,
261
+ "step": 36
262
+ },
263
+ {
264
+ "epoch": 0.03641732283464567,
265
+ "grad_norm": 33.05155944824219,
266
+ "learning_rate": 3.180914512922466e-08,
267
+ "loss": 6.8853,
268
+ "step": 37
269
+ },
270
+ {
271
+ "epoch": 0.03740157480314961,
272
+ "grad_norm": 74.3410873413086,
273
+ "learning_rate": 3.280318091451293e-08,
274
+ "loss": 13.6063,
275
+ "step": 38
276
+ },
277
+ {
278
+ "epoch": 0.038385826771653545,
279
+ "grad_norm": 35.997493743896484,
280
+ "learning_rate": 3.37972166998012e-08,
281
+ "loss": 7.2625,
282
+ "step": 39
283
+ },
284
+ {
285
+ "epoch": 0.03937007874015748,
286
+ "grad_norm": 31.67265510559082,
287
+ "learning_rate": 3.479125248508947e-08,
288
+ "loss": 6.2234,
289
+ "step": 40
290
+ },
291
+ {
292
+ "epoch": 0.040354330708661415,
293
+ "grad_norm": 78.0589599609375,
294
+ "learning_rate": 3.578528827037774e-08,
295
+ "loss": 14.9675,
296
+ "step": 41
297
+ },
298
+ {
299
+ "epoch": 0.04133858267716536,
300
+ "grad_norm": 27.73711395263672,
301
+ "learning_rate": 3.6779324055666005e-08,
302
+ "loss": 6.6038,
303
+ "step": 42
304
+ },
305
+ {
306
+ "epoch": 0.04232283464566929,
307
+ "grad_norm": 61.068092346191406,
308
+ "learning_rate": 3.777335984095428e-08,
309
+ "loss": 13.1173,
310
+ "step": 43
311
+ },
312
+ {
313
+ "epoch": 0.04330708661417323,
314
+ "grad_norm": 84.73564910888672,
315
+ "learning_rate": 3.8767395626242556e-08,
316
+ "loss": 16.6992,
317
+ "step": 44
318
+ },
319
+ {
320
+ "epoch": 0.04429133858267716,
321
+ "grad_norm": 27.289846420288086,
322
+ "learning_rate": 3.976143141153082e-08,
323
+ "loss": 6.4828,
324
+ "step": 45
325
+ },
326
+ {
327
+ "epoch": 0.045275590551181105,
328
+ "grad_norm": 25.921550750732422,
329
+ "learning_rate": 4.0755467196819094e-08,
330
+ "loss": 5.9815,
331
+ "step": 46
332
+ },
333
+ {
334
+ "epoch": 0.04625984251968504,
335
+ "grad_norm": 28.005834579467773,
336
+ "learning_rate": 4.174950298210736e-08,
337
+ "loss": 6.1738,
338
+ "step": 47
339
+ },
340
+ {
341
+ "epoch": 0.047244094488188976,
342
+ "grad_norm": 33.678253173828125,
343
+ "learning_rate": 4.274353876739563e-08,
344
+ "loss": 7.134,
345
+ "step": 48
346
+ },
347
+ {
348
+ "epoch": 0.04822834645669291,
349
+ "grad_norm": 50.81912612915039,
350
+ "learning_rate": 4.373757455268391e-08,
351
+ "loss": 9.3933,
352
+ "step": 49
353
+ },
354
+ {
355
+ "epoch": 0.04921259842519685,
356
+ "grad_norm": 53.16002655029297,
357
+ "learning_rate": 4.4731610337972176e-08,
358
+ "loss": 10.8085,
359
+ "step": 50
360
+ },
361
+ {
362
+ "epoch": 0.05019685039370079,
363
+ "grad_norm": 53.343692779541016,
364
+ "learning_rate": 4.572564612326045e-08,
365
+ "loss": 11.4172,
366
+ "step": 51
367
+ },
368
+ {
369
+ "epoch": 0.051181102362204724,
370
+ "grad_norm": 29.51181983947754,
371
+ "learning_rate": 4.6719681908548713e-08,
372
+ "loss": 7.3397,
373
+ "step": 52
374
+ },
375
+ {
376
+ "epoch": 0.05216535433070866,
377
+ "grad_norm": 28.54021644592285,
378
+ "learning_rate": 4.7713717693836986e-08,
379
+ "loss": 5.8851,
380
+ "step": 53
381
+ },
382
+ {
383
+ "epoch": 0.0531496062992126,
384
+ "grad_norm": 29.310325622558594,
385
+ "learning_rate": 4.870775347912525e-08,
386
+ "loss": 6.8105,
387
+ "step": 54
388
+ },
389
+ {
390
+ "epoch": 0.054133858267716536,
391
+ "grad_norm": 19.80927276611328,
392
+ "learning_rate": 4.970178926441353e-08,
393
+ "loss": 5.3637,
394
+ "step": 55
395
+ },
396
+ {
397
+ "epoch": 0.05511811023622047,
398
+ "grad_norm": 30.965606689453125,
399
+ "learning_rate": 5.06958250497018e-08,
400
+ "loss": 6.2628,
401
+ "step": 56
402
+ },
403
+ {
404
+ "epoch": 0.05610236220472441,
405
+ "grad_norm": 21.39442253112793,
406
+ "learning_rate": 5.168986083499007e-08,
407
+ "loss": 6.0039,
408
+ "step": 57
409
+ },
410
+ {
411
+ "epoch": 0.05708661417322835,
412
+ "grad_norm": 38.37812805175781,
413
+ "learning_rate": 5.268389662027834e-08,
414
+ "loss": 7.5859,
415
+ "step": 58
416
+ },
417
+ {
418
+ "epoch": 0.058070866141732284,
419
+ "grad_norm": 22.37571144104004,
420
+ "learning_rate": 5.3677932405566605e-08,
421
+ "loss": 6.0802,
422
+ "step": 59
423
+ },
424
+ {
425
+ "epoch": 0.05905511811023622,
426
+ "grad_norm": 27.690608978271484,
427
+ "learning_rate": 5.467196819085488e-08,
428
+ "loss": 5.5822,
429
+ "step": 60
430
+ },
431
+ {
432
+ "epoch": 0.060039370078740155,
433
+ "grad_norm": 23.312894821166992,
434
+ "learning_rate": 5.5666003976143156e-08,
435
+ "loss": 5.8773,
436
+ "step": 61
437
+ },
438
+ {
439
+ "epoch": 0.0610236220472441,
440
+ "grad_norm": 19.575923919677734,
441
+ "learning_rate": 5.666003976143142e-08,
442
+ "loss": 6.0814,
443
+ "step": 62
444
+ },
445
+ {
446
+ "epoch": 0.06200787401574803,
447
+ "grad_norm": 20.126571655273438,
448
+ "learning_rate": 5.7654075546719694e-08,
449
+ "loss": 5.4483,
450
+ "step": 63
451
+ },
452
+ {
453
+ "epoch": 0.06299212598425197,
454
+ "grad_norm": 57.36194610595703,
455
+ "learning_rate": 5.864811133200796e-08,
456
+ "loss": 10.2506,
457
+ "step": 64
458
+ },
459
+ {
460
+ "epoch": 0.0639763779527559,
461
+ "grad_norm": 48.520416259765625,
462
+ "learning_rate": 5.964214711729623e-08,
463
+ "loss": 10.5976,
464
+ "step": 65
465
+ },
466
+ {
467
+ "epoch": 0.06496062992125984,
468
+ "grad_norm": 24.259727478027344,
469
+ "learning_rate": 6.06361829025845e-08,
470
+ "loss": 6.9942,
471
+ "step": 66
472
+ },
473
+ {
474
+ "epoch": 0.06594488188976377,
475
+ "grad_norm": 18.457971572875977,
476
+ "learning_rate": 6.163021868787278e-08,
477
+ "loss": 5.4813,
478
+ "step": 67
479
+ },
480
+ {
481
+ "epoch": 0.06692913385826772,
482
+ "grad_norm": 27.634117126464844,
483
+ "learning_rate": 6.262425447316104e-08,
484
+ "loss": 7.045,
485
+ "step": 68
486
+ },
487
+ {
488
+ "epoch": 0.06791338582677166,
489
+ "grad_norm": 18.592876434326172,
490
+ "learning_rate": 6.361829025844931e-08,
491
+ "loss": 5.8549,
492
+ "step": 69
493
+ },
494
+ {
495
+ "epoch": 0.0688976377952756,
496
+ "grad_norm": 37.92665481567383,
497
+ "learning_rate": 6.461232604373759e-08,
498
+ "loss": 8.8514,
499
+ "step": 70
500
+ },
501
+ {
502
+ "epoch": 0.06988188976377953,
503
+ "grad_norm": 19.356266021728516,
504
+ "learning_rate": 6.560636182902586e-08,
505
+ "loss": 5.2557,
506
+ "step": 71
507
+ },
508
+ {
509
+ "epoch": 0.07086614173228346,
510
+ "grad_norm": 13.698501586914062,
511
+ "learning_rate": 6.660039761431412e-08,
512
+ "loss": 5.1181,
513
+ "step": 72
514
+ },
515
+ {
516
+ "epoch": 0.0718503937007874,
517
+ "grad_norm": 19.626392364501953,
518
+ "learning_rate": 6.75944333996024e-08,
519
+ "loss": 5.5331,
520
+ "step": 73
521
+ },
522
+ {
523
+ "epoch": 0.07283464566929133,
524
+ "grad_norm": 15.392860412597656,
525
+ "learning_rate": 6.858846918489067e-08,
526
+ "loss": 5.5944,
527
+ "step": 74
528
+ },
529
+ {
530
+ "epoch": 0.07381889763779527,
531
+ "grad_norm": 16.435239791870117,
532
+ "learning_rate": 6.958250497017893e-08,
533
+ "loss": 4.6332,
534
+ "step": 75
535
+ },
536
+ {
537
+ "epoch": 0.07480314960629922,
538
+ "grad_norm": 16.286462783813477,
539
+ "learning_rate": 7.057654075546721e-08,
540
+ "loss": 4.9532,
541
+ "step": 76
542
+ },
543
+ {
544
+ "epoch": 0.07578740157480315,
545
+ "grad_norm": 17.462007522583008,
546
+ "learning_rate": 7.157057654075548e-08,
547
+ "loss": 5.055,
548
+ "step": 77
549
+ },
550
+ {
551
+ "epoch": 0.07677165354330709,
552
+ "grad_norm": 15.096108436584473,
553
+ "learning_rate": 7.256461232604374e-08,
554
+ "loss": 4.5005,
555
+ "step": 78
556
+ },
557
+ {
558
+ "epoch": 0.07775590551181102,
559
+ "grad_norm": 14.233656883239746,
560
+ "learning_rate": 7.355864811133201e-08,
561
+ "loss": 5.1997,
562
+ "step": 79
563
+ },
564
+ {
565
+ "epoch": 0.07874015748031496,
566
+ "grad_norm": 19.416706085205078,
567
+ "learning_rate": 7.455268389662029e-08,
568
+ "loss": 5.1479,
569
+ "step": 80
570
+ },
571
+ {
572
+ "epoch": 0.0797244094488189,
573
+ "grad_norm": 18.06542205810547,
574
+ "learning_rate": 7.554671968190855e-08,
575
+ "loss": 5.1777,
576
+ "step": 81
577
+ },
578
+ {
579
+ "epoch": 0.08070866141732283,
580
+ "grad_norm": 16.027755737304688,
581
+ "learning_rate": 7.654075546719683e-08,
582
+ "loss": 5.5565,
583
+ "step": 82
584
+ },
585
+ {
586
+ "epoch": 0.08169291338582677,
587
+ "grad_norm": 16.291139602661133,
588
+ "learning_rate": 7.753479125248511e-08,
589
+ "loss": 4.6999,
590
+ "step": 83
591
+ },
592
+ {
593
+ "epoch": 0.08267716535433071,
594
+ "grad_norm": 16.005495071411133,
595
+ "learning_rate": 7.852882703777338e-08,
596
+ "loss": 5.0681,
597
+ "step": 84
598
+ },
599
+ {
600
+ "epoch": 0.08366141732283465,
601
+ "grad_norm": 19.505733489990234,
602
+ "learning_rate": 7.952286282306164e-08,
603
+ "loss": 5.2208,
604
+ "step": 85
605
+ },
606
+ {
607
+ "epoch": 0.08464566929133858,
608
+ "grad_norm": 18.256221771240234,
609
+ "learning_rate": 8.051689860834992e-08,
610
+ "loss": 4.56,
611
+ "step": 86
612
+ },
613
+ {
614
+ "epoch": 0.08562992125984252,
615
+ "grad_norm": 14.99673843383789,
616
+ "learning_rate": 8.151093439363819e-08,
617
+ "loss": 4.6793,
618
+ "step": 87
619
+ },
620
+ {
621
+ "epoch": 0.08661417322834646,
622
+ "grad_norm": 13.485681533813477,
623
+ "learning_rate": 8.250497017892645e-08,
624
+ "loss": 4.4611,
625
+ "step": 88
626
+ },
627
+ {
628
+ "epoch": 0.08759842519685039,
629
+ "grad_norm": 52.90532302856445,
630
+ "learning_rate": 8.349900596421472e-08,
631
+ "loss": 9.623,
632
+ "step": 89
633
+ },
634
+ {
635
+ "epoch": 0.08858267716535433,
636
+ "grad_norm": 18.402536392211914,
637
+ "learning_rate": 8.4493041749503e-08,
638
+ "loss": 5.0316,
639
+ "step": 90
640
+ },
641
+ {
642
+ "epoch": 0.08956692913385826,
643
+ "grad_norm": 12.668858528137207,
644
+ "learning_rate": 8.548707753479126e-08,
645
+ "loss": 4.1771,
646
+ "step": 91
647
+ },
648
+ {
649
+ "epoch": 0.09055118110236221,
650
+ "grad_norm": 17.02667236328125,
651
+ "learning_rate": 8.648111332007953e-08,
652
+ "loss": 4.9652,
653
+ "step": 92
654
+ },
655
+ {
656
+ "epoch": 0.09153543307086615,
657
+ "grad_norm": 51.03838348388672,
658
+ "learning_rate": 8.747514910536782e-08,
659
+ "loss": 8.7432,
660
+ "step": 93
661
+ },
662
+ {
663
+ "epoch": 0.09251968503937008,
664
+ "grad_norm": 15.051873207092285,
665
+ "learning_rate": 8.846918489065609e-08,
666
+ "loss": 4.6234,
667
+ "step": 94
668
+ },
669
+ {
670
+ "epoch": 0.09350393700787402,
671
+ "grad_norm": 15.075822830200195,
672
+ "learning_rate": 8.946322067594435e-08,
673
+ "loss": 4.4016,
674
+ "step": 95
675
+ },
676
+ {
677
+ "epoch": 0.09448818897637795,
678
+ "grad_norm": 14.73617172241211,
679
+ "learning_rate": 9.045725646123262e-08,
680
+ "loss": 4.9903,
681
+ "step": 96
682
+ },
683
+ {
684
+ "epoch": 0.09547244094488189,
685
+ "grad_norm": 12.564230918884277,
686
+ "learning_rate": 9.14512922465209e-08,
687
+ "loss": 4.5606,
688
+ "step": 97
689
+ },
690
+ {
691
+ "epoch": 0.09645669291338582,
692
+ "grad_norm": 12.062474250793457,
693
+ "learning_rate": 9.244532803180916e-08,
694
+ "loss": 4.9534,
695
+ "step": 98
696
+ },
697
+ {
698
+ "epoch": 0.09744094488188976,
699
+ "grad_norm": 42.591461181640625,
700
+ "learning_rate": 9.343936381709743e-08,
701
+ "loss": 8.1838,
702
+ "step": 99
703
+ },
704
+ {
705
+ "epoch": 0.0984251968503937,
706
+ "grad_norm": 12.18688678741455,
707
+ "learning_rate": 9.44333996023857e-08,
708
+ "loss": 4.9736,
709
+ "step": 100
710
+ },
711
+ {
712
+ "epoch": 0.09940944881889764,
713
+ "grad_norm": 14.301939964294434,
714
+ "learning_rate": 9.542743538767397e-08,
715
+ "loss": 4.4733,
716
+ "step": 101
717
+ },
718
+ {
719
+ "epoch": 0.10039370078740158,
720
+ "grad_norm": 15.062311172485352,
721
+ "learning_rate": 9.642147117296224e-08,
722
+ "loss": 4.9725,
723
+ "step": 102
724
+ },
725
+ {
726
+ "epoch": 0.10137795275590551,
727
+ "grad_norm": 11.707622528076172,
728
+ "learning_rate": 9.74155069582505e-08,
729
+ "loss": 4.5861,
730
+ "step": 103
731
+ },
732
+ {
733
+ "epoch": 0.10236220472440945,
734
+ "grad_norm": 42.08045196533203,
735
+ "learning_rate": 9.840954274353878e-08,
736
+ "loss": 7.7634,
737
+ "step": 104
738
+ },
739
+ {
740
+ "epoch": 0.10334645669291338,
741
+ "grad_norm": 13.918876647949219,
742
+ "learning_rate": 9.940357852882706e-08,
743
+ "loss": 4.9915,
744
+ "step": 105
745
+ },
746
+ {
747
+ "epoch": 0.10433070866141732,
748
+ "grad_norm": 12.0372953414917,
749
+ "learning_rate": 1.0039761431411533e-07,
750
+ "loss": 5.1391,
751
+ "step": 106
752
+ },
753
+ {
754
+ "epoch": 0.10531496062992125,
755
+ "grad_norm": 16.739591598510742,
756
+ "learning_rate": 1.013916500994036e-07,
757
+ "loss": 5.0157,
758
+ "step": 107
759
+ },
760
+ {
761
+ "epoch": 0.1062992125984252,
762
+ "grad_norm": 16.80640983581543,
763
+ "learning_rate": 1.0238568588469187e-07,
764
+ "loss": 4.0982,
765
+ "step": 108
766
+ },
767
+ {
768
+ "epoch": 0.10728346456692914,
769
+ "grad_norm": 10.70914077758789,
770
+ "learning_rate": 1.0337972166998014e-07,
771
+ "loss": 4.2178,
772
+ "step": 109
773
+ },
774
+ {
775
+ "epoch": 0.10826771653543307,
776
+ "grad_norm": 11.523154258728027,
777
+ "learning_rate": 1.043737574552684e-07,
778
+ "loss": 4.6193,
779
+ "step": 110
780
+ },
781
+ {
782
+ "epoch": 0.10925196850393701,
783
+ "grad_norm": 14.624194145202637,
784
+ "learning_rate": 1.0536779324055668e-07,
785
+ "loss": 4.7638,
786
+ "step": 111
787
+ },
788
+ {
789
+ "epoch": 0.11023622047244094,
790
+ "grad_norm": 11.593772888183594,
791
+ "learning_rate": 1.0636182902584495e-07,
792
+ "loss": 4.1207,
793
+ "step": 112
794
+ },
795
+ {
796
+ "epoch": 0.11122047244094488,
797
+ "grad_norm": 18.1988468170166,
798
+ "learning_rate": 1.0735586481113321e-07,
799
+ "loss": 5.2034,
800
+ "step": 113
801
+ },
802
+ {
803
+ "epoch": 0.11220472440944881,
804
+ "grad_norm": 14.054649353027344,
805
+ "learning_rate": 1.0834990059642149e-07,
806
+ "loss": 5.0693,
807
+ "step": 114
808
+ },
809
+ {
810
+ "epoch": 0.11318897637795275,
811
+ "grad_norm": 12.262086868286133,
812
+ "learning_rate": 1.0934393638170976e-07,
813
+ "loss": 4.7895,
814
+ "step": 115
815
+ },
816
+ {
817
+ "epoch": 0.1141732283464567,
818
+ "grad_norm": 12.955352783203125,
819
+ "learning_rate": 1.1033797216699802e-07,
820
+ "loss": 4.9486,
821
+ "step": 116
822
+ },
823
+ {
824
+ "epoch": 0.11515748031496063,
825
+ "grad_norm": 14.151504516601562,
826
+ "learning_rate": 1.1133200795228631e-07,
827
+ "loss": 4.6552,
828
+ "step": 117
829
+ },
830
+ {
831
+ "epoch": 0.11614173228346457,
832
+ "grad_norm": 11.60208511352539,
833
+ "learning_rate": 1.1232604373757458e-07,
834
+ "loss": 4.4555,
835
+ "step": 118
836
+ },
837
+ {
838
+ "epoch": 0.1171259842519685,
839
+ "grad_norm": 11.05080509185791,
840
+ "learning_rate": 1.1332007952286284e-07,
841
+ "loss": 4.8977,
842
+ "step": 119
843
+ },
844
+ {
845
+ "epoch": 0.11811023622047244,
846
+ "grad_norm": 43.464908599853516,
847
+ "learning_rate": 1.1431411530815111e-07,
848
+ "loss": 7.6836,
849
+ "step": 120
850
+ },
851
+ {
852
+ "epoch": 0.11909448818897637,
853
+ "grad_norm": 11.060967445373535,
854
+ "learning_rate": 1.1530815109343939e-07,
855
+ "loss": 4.8106,
856
+ "step": 121
857
+ },
858
+ {
859
+ "epoch": 0.12007874015748031,
860
+ "grad_norm": 11.406394004821777,
861
+ "learning_rate": 1.1630218687872765e-07,
862
+ "loss": 4.9958,
863
+ "step": 122
864
+ },
865
+ {
866
+ "epoch": 0.12106299212598425,
867
+ "grad_norm": 12.630412101745605,
868
+ "learning_rate": 1.1729622266401592e-07,
869
+ "loss": 4.4585,
870
+ "step": 123
871
+ },
872
+ {
873
+ "epoch": 0.1220472440944882,
874
+ "grad_norm": 42.50083541870117,
875
+ "learning_rate": 1.182902584493042e-07,
876
+ "loss": 7.5559,
877
+ "step": 124
878
+ },
879
+ {
880
+ "epoch": 0.12303149606299213,
881
+ "grad_norm": 12.229009628295898,
882
+ "learning_rate": 1.1928429423459245e-07,
883
+ "loss": 4.2636,
884
+ "step": 125
885
+ },
886
+ {
887
+ "epoch": 0.12401574803149606,
888
+ "grad_norm": 9.794486999511719,
889
+ "learning_rate": 1.2027833001988073e-07,
890
+ "loss": 4.0436,
891
+ "step": 126
892
+ },
893
+ {
894
+ "epoch": 0.125,
895
+ "grad_norm": 11.505509376525879,
896
+ "learning_rate": 1.21272365805169e-07,
897
+ "loss": 4.7416,
898
+ "step": 127
899
+ },
900
+ {
901
+ "epoch": 0.12598425196850394,
902
+ "grad_norm": 12.046993255615234,
903
+ "learning_rate": 1.222664015904573e-07,
904
+ "loss": 4.2215,
905
+ "step": 128
906
+ },
907
+ {
908
+ "epoch": 0.12696850393700787,
909
+ "grad_norm": 33.795265197753906,
910
+ "learning_rate": 1.2326043737574557e-07,
911
+ "loss": 6.3561,
912
+ "step": 129
913
+ },
914
+ {
915
+ "epoch": 0.1279527559055118,
916
+ "grad_norm": 33.15379333496094,
917
+ "learning_rate": 1.2425447316103382e-07,
918
+ "loss": 6.2299,
919
+ "step": 130
920
+ },
921
+ {
922
+ "epoch": 0.12893700787401574,
923
+ "grad_norm": 11.174860000610352,
924
+ "learning_rate": 1.2524850894632207e-07,
925
+ "loss": 4.3492,
926
+ "step": 131
927
+ },
928
+ {
929
+ "epoch": 0.12992125984251968,
930
+ "grad_norm": 10.696345329284668,
931
+ "learning_rate": 1.2624254473161035e-07,
932
+ "loss": 4.0216,
933
+ "step": 132
934
+ },
935
+ {
936
+ "epoch": 0.1309055118110236,
937
+ "grad_norm": 35.72112274169922,
938
+ "learning_rate": 1.2723658051689863e-07,
939
+ "loss": 6.963,
940
+ "step": 133
941
+ },
942
+ {
943
+ "epoch": 0.13188976377952755,
944
+ "grad_norm": 9.074990272521973,
945
+ "learning_rate": 1.282306163021869e-07,
946
+ "loss": 3.9474,
947
+ "step": 134
948
+ },
949
+ {
950
+ "epoch": 0.1328740157480315,
951
+ "grad_norm": 11.409770965576172,
952
+ "learning_rate": 1.2922465208747519e-07,
953
+ "loss": 4.3437,
954
+ "step": 135
955
+ },
956
+ {
957
+ "epoch": 0.13385826771653545,
958
+ "grad_norm": 10.008737564086914,
959
+ "learning_rate": 1.3021868787276344e-07,
960
+ "loss": 3.6267,
961
+ "step": 136
962
+ },
963
+ {
964
+ "epoch": 0.13484251968503938,
965
+ "grad_norm": 11.481470108032227,
966
+ "learning_rate": 1.3121272365805172e-07,
967
+ "loss": 3.9896,
968
+ "step": 137
969
+ },
970
+ {
971
+ "epoch": 0.13582677165354332,
972
+ "grad_norm": 12.856668472290039,
973
+ "learning_rate": 1.3220675944333997e-07,
974
+ "loss": 4.8156,
975
+ "step": 138
976
+ },
977
+ {
978
+ "epoch": 0.13681102362204725,
979
+ "grad_norm": 12.462326049804688,
980
+ "learning_rate": 1.3320079522862825e-07,
981
+ "loss": 4.9751,
982
+ "step": 139
983
+ },
984
+ {
985
+ "epoch": 0.1377952755905512,
986
+ "grad_norm": 13.430492401123047,
987
+ "learning_rate": 1.3419483101391653e-07,
988
+ "loss": 4.4144,
989
+ "step": 140
990
+ },
991
+ {
992
+ "epoch": 0.13877952755905512,
993
+ "grad_norm": 24.038984298706055,
994
+ "learning_rate": 1.351888667992048e-07,
995
+ "loss": 4.7213,
996
+ "step": 141
997
+ },
998
+ {
999
+ "epoch": 0.13976377952755906,
1000
+ "grad_norm": 39.7520637512207,
1001
+ "learning_rate": 1.3618290258449306e-07,
1002
+ "loss": 6.6081,
1003
+ "step": 142
1004
+ },
1005
+ {
1006
+ "epoch": 0.140748031496063,
1007
+ "grad_norm": 10.726151466369629,
1008
+ "learning_rate": 1.3717693836978134e-07,
1009
+ "loss": 4.2929,
1010
+ "step": 143
1011
+ },
1012
+ {
1013
+ "epoch": 0.14173228346456693,
1014
+ "grad_norm": 11.658100128173828,
1015
+ "learning_rate": 1.381709741550696e-07,
1016
+ "loss": 4.2537,
1017
+ "step": 144
1018
+ },
1019
+ {
1020
+ "epoch": 0.14271653543307086,
1021
+ "grad_norm": 9.662211418151855,
1022
+ "learning_rate": 1.3916500994035787e-07,
1023
+ "loss": 4.0647,
1024
+ "step": 145
1025
+ },
1026
+ {
1027
+ "epoch": 0.1437007874015748,
1028
+ "grad_norm": 10.004919052124023,
1029
+ "learning_rate": 1.4015904572564615e-07,
1030
+ "loss": 3.937,
1031
+ "step": 146
1032
+ },
1033
+ {
1034
+ "epoch": 0.14468503937007873,
1035
+ "grad_norm": 35.38102340698242,
1036
+ "learning_rate": 1.4115308151093443e-07,
1037
+ "loss": 5.6582,
1038
+ "step": 147
1039
+ },
1040
+ {
1041
+ "epoch": 0.14566929133858267,
1042
+ "grad_norm": 11.280171394348145,
1043
+ "learning_rate": 1.421471172962227e-07,
1044
+ "loss": 4.2648,
1045
+ "step": 148
1046
+ },
1047
+ {
1048
+ "epoch": 0.1466535433070866,
1049
+ "grad_norm": 10.569673538208008,
1050
+ "learning_rate": 1.4314115308151096e-07,
1051
+ "loss": 4.4429,
1052
+ "step": 149
1053
+ },
1054
+ {
1055
+ "epoch": 0.14763779527559054,
1056
+ "grad_norm": 14.092275619506836,
1057
+ "learning_rate": 1.4413518886679924e-07,
1058
+ "loss": 3.6197,
1059
+ "step": 150
1060
+ },
1061
+ {
1062
+ "epoch": 0.1486220472440945,
1063
+ "grad_norm": 8.960445404052734,
1064
+ "learning_rate": 1.451292246520875e-07,
1065
+ "loss": 3.7953,
1066
+ "step": 151
1067
+ },
1068
+ {
1069
+ "epoch": 0.14960629921259844,
1070
+ "grad_norm": 11.712779998779297,
1071
+ "learning_rate": 1.4612326043737577e-07,
1072
+ "loss": 3.8175,
1073
+ "step": 152
1074
+ },
1075
+ {
1076
+ "epoch": 0.15059055118110237,
1077
+ "grad_norm": 11.68183708190918,
1078
+ "learning_rate": 1.4711729622266402e-07,
1079
+ "loss": 4.5137,
1080
+ "step": 153
1081
+ },
1082
+ {
1083
+ "epoch": 0.15059055118110237,
1084
+ "eval_Qnli-dev_cosine_accuracy": 0.58203125,
1085
+ "eval_Qnli-dev_cosine_accuracy_threshold": 0.9368094801902771,
1086
+ "eval_Qnli-dev_cosine_ap": 0.5484497034083067,
1087
+ "eval_Qnli-dev_cosine_f1": 0.6300268096514745,
1088
+ "eval_Qnli-dev_cosine_f1_threshold": 0.802739143371582,
1089
+ "eval_Qnli-dev_cosine_precision": 0.46078431372549017,
1090
+ "eval_Qnli-dev_cosine_recall": 0.9957627118644068,
1091
+ "eval_Qnli-dev_dot_accuracy": 0.58203125,
1092
+ "eval_Qnli-dev_dot_accuracy_threshold": 719.7518310546875,
1093
+ "eval_Qnli-dev_dot_ap": 0.548461685358088,
1094
+ "eval_Qnli-dev_dot_f1": 0.6300268096514745,
1095
+ "eval_Qnli-dev_dot_f1_threshold": 616.7227783203125,
1096
+ "eval_Qnli-dev_dot_precision": 0.46078431372549017,
1097
+ "eval_Qnli-dev_dot_recall": 0.9957627118644068,
1098
+ "eval_Qnli-dev_euclidean_accuracy": 0.58203125,
1099
+ "eval_Qnli-dev_euclidean_accuracy_threshold": 9.853867530822754,
1100
+ "eval_Qnli-dev_euclidean_ap": 0.5484497034083067,
1101
+ "eval_Qnli-dev_euclidean_f1": 0.6300268096514745,
1102
+ "eval_Qnli-dev_euclidean_f1_threshold": 17.40953254699707,
1103
+ "eval_Qnli-dev_euclidean_precision": 0.46078431372549017,
1104
+ "eval_Qnli-dev_euclidean_recall": 0.9957627118644068,
1105
+ "eval_Qnli-dev_manhattan_accuracy": 0.607421875,
1106
+ "eval_Qnli-dev_manhattan_accuracy_threshold": 182.1275177001953,
1107
+ "eval_Qnli-dev_manhattan_ap": 0.5750034744442096,
1108
+ "eval_Qnli-dev_manhattan_f1": 0.6303724928366763,
1109
+ "eval_Qnli-dev_manhattan_f1_threshold": 230.0565185546875,
1110
+ "eval_Qnli-dev_manhattan_precision": 0.47619047619047616,
1111
+ "eval_Qnli-dev_manhattan_recall": 0.9322033898305084,
1112
+ "eval_Qnli-dev_max_accuracy": 0.607421875,
1113
+ "eval_Qnli-dev_max_accuracy_threshold": 719.7518310546875,
1114
+ "eval_Qnli-dev_max_ap": 0.5750034744442096,
1115
+ "eval_Qnli-dev_max_f1": 0.6303724928366763,
1116
+ "eval_Qnli-dev_max_f1_threshold": 616.7227783203125,
1117
+ "eval_Qnli-dev_max_precision": 0.47619047619047616,
1118
+ "eval_Qnli-dev_max_recall": 0.9957627118644068,
1119
+ "eval_allNLI-dev_cosine_accuracy": 0.66796875,
1120
+ "eval_allNLI-dev_cosine_accuracy_threshold": 0.9721524119377136,
1121
+ "eval_allNLI-dev_cosine_ap": 0.3857994503224615,
1122
+ "eval_allNLI-dev_cosine_f1": 0.5029239766081871,
1123
+ "eval_allNLI-dev_cosine_f1_threshold": 0.821484386920929,
1124
+ "eval_allNLI-dev_cosine_precision": 0.33659491193737767,
1125
+ "eval_allNLI-dev_cosine_recall": 0.9942196531791907,
1126
+ "eval_allNLI-dev_dot_accuracy": 0.66796875,
1127
+ "eval_allNLI-dev_dot_accuracy_threshold": 746.914794921875,
1128
+ "eval_allNLI-dev_dot_ap": 0.38572844452312516,
1129
+ "eval_allNLI-dev_dot_f1": 0.5029239766081871,
1130
+ "eval_allNLI-dev_dot_f1_threshold": 631.138916015625,
1131
+ "eval_allNLI-dev_dot_precision": 0.33659491193737767,
1132
+ "eval_allNLI-dev_dot_recall": 0.9942196531791907,
1133
+ "eval_allNLI-dev_euclidean_accuracy": 0.66796875,
1134
+ "eval_allNLI-dev_euclidean_accuracy_threshold": 6.541449546813965,
1135
+ "eval_allNLI-dev_euclidean_ap": 0.3858031188548441,
1136
+ "eval_allNLI-dev_euclidean_f1": 0.5029239766081871,
1137
+ "eval_allNLI-dev_euclidean_f1_threshold": 16.558998107910156,
1138
+ "eval_allNLI-dev_euclidean_precision": 0.33659491193737767,
1139
+ "eval_allNLI-dev_euclidean_recall": 0.9942196531791907,
1140
+ "eval_allNLI-dev_manhattan_accuracy": 0.666015625,
1141
+ "eval_allNLI-dev_manhattan_accuracy_threshold": 95.24527740478516,
1142
+ "eval_allNLI-dev_manhattan_ap": 0.39193409293721965,
1143
+ "eval_allNLI-dev_manhattan_f1": 0.5045317220543807,
1144
+ "eval_allNLI-dev_manhattan_f1_threshold": 254.973388671875,
1145
+ "eval_allNLI-dev_manhattan_precision": 0.34151329243353784,
1146
+ "eval_allNLI-dev_manhattan_recall": 0.9653179190751445,
1147
+ "eval_allNLI-dev_max_accuracy": 0.66796875,
1148
+ "eval_allNLI-dev_max_accuracy_threshold": 746.914794921875,
1149
+ "eval_allNLI-dev_max_ap": 0.39193409293721965,
1150
+ "eval_allNLI-dev_max_f1": 0.5045317220543807,
1151
+ "eval_allNLI-dev_max_f1_threshold": 631.138916015625,
1152
+ "eval_allNLI-dev_max_precision": 0.34151329243353784,
1153
+ "eval_allNLI-dev_max_recall": 0.9942196531791907,
1154
+ "eval_loss": 3.3210110664367676,
1155
+ "eval_runtime": 53.0927,
1156
+ "eval_samples_per_second": 31.341,
1157
+ "eval_sequential_score": 0.5750034744442096,
1158
+ "eval_steps_per_second": 0.132,
1159
+ "eval_sts-test_pearson_cosine": 0.12009124140478655,
1160
+ "eval_sts-test_pearson_dot": 0.11997652374043644,
1161
+ "eval_sts-test_pearson_euclidean": 0.15529980522625675,
1162
+ "eval_sts-test_pearson_manhattan": 0.18492770691981375,
1163
+ "eval_sts-test_pearson_max": 0.18492770691981375,
1164
+ "eval_sts-test_spearman_cosine": 0.180573622028628,
1165
+ "eval_sts-test_spearman_dot": 0.18041242798509616,
1166
+ "eval_sts-test_spearman_euclidean": 0.18058248277838349,
1167
+ "eval_sts-test_spearman_manhattan": 0.21139381574888486,
1168
+ "eval_sts-test_spearman_max": 0.21139381574888486,
1169
+ "step": 153
1170
+ },
1171
+ {
1172
+ "epoch": 0.1515748031496063,
1173
+ "grad_norm": 12.480842590332031,
1174
+ "learning_rate": 1.4811133200795232e-07,
1175
+ "loss": 4.3528,
1176
+ "step": 154
1177
+ },
1178
+ {
1179
+ "epoch": 0.15255905511811024,
1180
+ "grad_norm": 10.134323120117188,
1181
+ "learning_rate": 1.4910536779324058e-07,
1182
+ "loss": 3.6573,
1183
+ "step": 155
1184
+ },
1185
+ {
1186
+ "epoch": 0.15354330708661418,
1187
+ "grad_norm": 11.572721481323242,
1188
+ "learning_rate": 1.5009940357852886e-07,
1189
+ "loss": 3.5248,
1190
+ "step": 156
1191
+ },
1192
+ {
1193
+ "epoch": 0.1545275590551181,
1194
+ "grad_norm": 12.186054229736328,
1195
+ "learning_rate": 1.510934393638171e-07,
1196
+ "loss": 3.9275,
1197
+ "step": 157
1198
+ },
1199
+ {
1200
+ "epoch": 0.15551181102362205,
1201
+ "grad_norm": 42.491851806640625,
1202
+ "learning_rate": 1.5208747514910539e-07,
1203
+ "loss": 7.1868,
1204
+ "step": 158
1205
+ },
1206
+ {
1207
+ "epoch": 0.15649606299212598,
1208
+ "grad_norm": 10.867408752441406,
1209
+ "learning_rate": 1.5308151093439367e-07,
1210
+ "loss": 3.6294,
1211
+ "step": 159
1212
+ },
1213
+ {
1214
+ "epoch": 0.15748031496062992,
1215
+ "grad_norm": 11.463488578796387,
1216
+ "learning_rate": 1.5407554671968192e-07,
1217
+ "loss": 3.6886,
1218
+ "step": 160
1219
+ },
1220
+ {
1221
+ "epoch": 0.15846456692913385,
1222
+ "grad_norm": 10.748167991638184,
1223
+ "learning_rate": 1.5506958250497022e-07,
1224
+ "loss": 3.1873,
1225
+ "step": 161
1226
+ },
1227
+ {
1228
+ "epoch": 0.1594488188976378,
1229
+ "grad_norm": 38.69701385498047,
1230
+ "learning_rate": 1.5606361829025848e-07,
1231
+ "loss": 6.1951,
1232
+ "step": 162
1233
+ },
1234
+ {
1235
+ "epoch": 0.16043307086614172,
1236
+ "grad_norm": 11.077349662780762,
1237
+ "learning_rate": 1.5705765407554675e-07,
1238
+ "loss": 3.9747,
1239
+ "step": 163
1240
+ },
1241
+ {
1242
+ "epoch": 0.16141732283464566,
1243
+ "grad_norm": 47.21928405761719,
1244
+ "learning_rate": 1.58051689860835e-07,
1245
+ "loss": 7.004,
1246
+ "step": 164
1247
+ },
1248
+ {
1249
+ "epoch": 0.1624015748031496,
1250
+ "grad_norm": 15.23488712310791,
1251
+ "learning_rate": 1.5904572564612329e-07,
1252
+ "loss": 4.3221,
1253
+ "step": 165
1254
+ },
1255
+ {
1256
+ "epoch": 0.16338582677165353,
1257
+ "grad_norm": 10.329791069030762,
1258
+ "learning_rate": 1.6003976143141154e-07,
1259
+ "loss": 3.5963,
1260
+ "step": 166
1261
+ },
1262
+ {
1263
+ "epoch": 0.1643700787401575,
1264
+ "grad_norm": 9.624051094055176,
1265
+ "learning_rate": 1.6103379721669984e-07,
1266
+ "loss": 3.1988,
1267
+ "step": 167
1268
+ },
1269
+ {
1270
+ "epoch": 0.16535433070866143,
1271
+ "grad_norm": 11.511842727661133,
1272
+ "learning_rate": 1.620278330019881e-07,
1273
+ "loss": 3.8236,
1274
+ "step": 168
1275
+ },
1276
+ {
1277
+ "epoch": 0.16633858267716536,
1278
+ "grad_norm": 10.053628921508789,
1279
+ "learning_rate": 1.6302186878727637e-07,
1280
+ "loss": 3.5063,
1281
+ "step": 169
1282
+ },
1283
+ {
1284
+ "epoch": 0.1673228346456693,
1285
+ "grad_norm": 30.96078109741211,
1286
+ "learning_rate": 1.6401590457256465e-07,
1287
+ "loss": 5.9843,
1288
+ "step": 170
1289
+ },
1290
+ {
1291
+ "epoch": 0.16830708661417323,
1292
+ "grad_norm": 27.598793029785156,
1293
+ "learning_rate": 1.650099403578529e-07,
1294
+ "loss": 5.884,
1295
+ "step": 171
1296
+ },
1297
+ {
1298
+ "epoch": 0.16929133858267717,
1299
+ "grad_norm": 10.295365333557129,
1300
+ "learning_rate": 1.6600397614314118e-07,
1301
+ "loss": 4.1317,
1302
+ "step": 172
1303
+ },
1304
+ {
1305
+ "epoch": 0.1702755905511811,
1306
+ "grad_norm": 10.183586120605469,
1307
+ "learning_rate": 1.6699801192842944e-07,
1308
+ "loss": 3.9255,
1309
+ "step": 173
1310
+ },
1311
+ {
1312
+ "epoch": 0.17125984251968504,
1313
+ "grad_norm": 11.306191444396973,
1314
+ "learning_rate": 1.6799204771371774e-07,
1315
+ "loss": 4.1121,
1316
+ "step": 174
1317
+ },
1318
+ {
1319
+ "epoch": 0.17224409448818898,
1320
+ "grad_norm": 9.866985321044922,
1321
+ "learning_rate": 1.68986083499006e-07,
1322
+ "loss": 3.7748,
1323
+ "step": 175
1324
+ },
1325
+ {
1326
+ "epoch": 0.1732283464566929,
1327
+ "grad_norm": 25.779844284057617,
1328
+ "learning_rate": 1.6998011928429427e-07,
1329
+ "loss": 5.1602,
1330
+ "step": 176
1331
+ },
1332
+ {
1333
+ "epoch": 0.17421259842519685,
1334
+ "grad_norm": 22.994916915893555,
1335
+ "learning_rate": 1.7097415506958253e-07,
1336
+ "loss": 4.8807,
1337
+ "step": 177
1338
+ },
1339
+ {
1340
+ "epoch": 0.17519685039370078,
1341
+ "grad_norm": 9.323945045471191,
1342
+ "learning_rate": 1.719681908548708e-07,
1343
+ "loss": 3.4643,
1344
+ "step": 178
1345
+ },
1346
+ {
1347
+ "epoch": 0.17618110236220472,
1348
+ "grad_norm": 9.101490020751953,
1349
+ "learning_rate": 1.7296222664015906e-07,
1350
+ "loss": 3.4937,
1351
+ "step": 179
1352
+ },
1353
+ {
1354
+ "epoch": 0.17716535433070865,
1355
+ "grad_norm": 18.585084915161133,
1356
+ "learning_rate": 1.7395626242544734e-07,
1357
+ "loss": 5.2731,
1358
+ "step": 180
1359
+ },
1360
+ {
1361
+ "epoch": 0.1781496062992126,
1362
+ "grad_norm": 17.947126388549805,
1363
+ "learning_rate": 1.7495029821073564e-07,
1364
+ "loss": 4.6416,
1365
+ "step": 181
1366
+ },
1367
+ {
1368
+ "epoch": 0.17913385826771652,
1369
+ "grad_norm": 9.495512008666992,
1370
+ "learning_rate": 1.759443339960239e-07,
1371
+ "loss": 3.5226,
1372
+ "step": 182
1373
+ },
1374
+ {
1375
+ "epoch": 0.18011811023622049,
1376
+ "grad_norm": 21.095630645751953,
1377
+ "learning_rate": 1.7693836978131217e-07,
1378
+ "loss": 4.7794,
1379
+ "step": 183
1380
+ },
1381
+ {
1382
+ "epoch": 0.18110236220472442,
1383
+ "grad_norm": 9.708288192749023,
1384
+ "learning_rate": 1.7793240556660042e-07,
1385
+ "loss": 3.8504,
1386
+ "step": 184
1387
+ },
1388
+ {
1389
+ "epoch": 0.18208661417322836,
1390
+ "grad_norm": 10.588545799255371,
1391
+ "learning_rate": 1.789264413518887e-07,
1392
+ "loss": 3.5391,
1393
+ "step": 185
1394
+ },
1395
+ {
1396
+ "epoch": 0.1830708661417323,
1397
+ "grad_norm": 10.238434791564941,
1398
+ "learning_rate": 1.7992047713717695e-07,
1399
+ "loss": 4.0291,
1400
+ "step": 186
1401
+ },
1402
+ {
1403
+ "epoch": 0.18405511811023623,
1404
+ "grad_norm": 12.710492134094238,
1405
+ "learning_rate": 1.8091451292246523e-07,
1406
+ "loss": 3.5606,
1407
+ "step": 187
1408
+ },
1409
+ {
1410
+ "epoch": 0.18503937007874016,
1411
+ "grad_norm": 13.008979797363281,
1412
+ "learning_rate": 1.819085487077535e-07,
1413
+ "loss": 3.8957,
1414
+ "step": 188
1415
+ },
1416
+ {
1417
+ "epoch": 0.1860236220472441,
1418
+ "grad_norm": 16.594804763793945,
1419
+ "learning_rate": 1.829025844930418e-07,
1420
+ "loss": 4.3657,
1421
+ "step": 189
1422
+ },
1423
+ {
1424
+ "epoch": 0.18700787401574803,
1425
+ "grad_norm": 25.25087547302246,
1426
+ "learning_rate": 1.8389662027833004e-07,
1427
+ "loss": 5.0173,
1428
+ "step": 190
1429
+ },
1430
+ {
1431
+ "epoch": 0.18799212598425197,
1432
+ "grad_norm": 18.24939727783203,
1433
+ "learning_rate": 1.8489065606361832e-07,
1434
+ "loss": 4.3915,
1435
+ "step": 191
1436
+ },
1437
+ {
1438
+ "epoch": 0.1889763779527559,
1439
+ "grad_norm": 11.081143379211426,
1440
+ "learning_rate": 1.8588469184890657e-07,
1441
+ "loss": 3.4613,
1442
+ "step": 192
1443
+ },
1444
+ {
1445
+ "epoch": 0.18996062992125984,
1446
+ "grad_norm": 10.244654655456543,
1447
+ "learning_rate": 1.8687872763419485e-07,
1448
+ "loss": 3.2005,
1449
+ "step": 193
1450
+ },
1451
+ {
1452
+ "epoch": 0.19094488188976377,
1453
+ "grad_norm": 10.305020332336426,
1454
+ "learning_rate": 1.8787276341948313e-07,
1455
+ "loss": 3.3986,
1456
+ "step": 194
1457
+ },
1458
+ {
1459
+ "epoch": 0.1919291338582677,
1460
+ "grad_norm": 12.405226707458496,
1461
+ "learning_rate": 1.888667992047714e-07,
1462
+ "loss": 3.7937,
1463
+ "step": 195
1464
+ },
1465
+ {
1466
+ "epoch": 0.19291338582677164,
1467
+ "grad_norm": 10.247261047363281,
1468
+ "learning_rate": 1.898608349900597e-07,
1469
+ "loss": 3.8981,
1470
+ "step": 196
1471
+ },
1472
+ {
1473
+ "epoch": 0.19389763779527558,
1474
+ "grad_norm": 10.535922050476074,
1475
+ "learning_rate": 1.9085487077534794e-07,
1476
+ "loss": 3.7051,
1477
+ "step": 197
1478
+ },
1479
+ {
1480
+ "epoch": 0.19488188976377951,
1481
+ "grad_norm": 9.828015327453613,
1482
+ "learning_rate": 1.9184890656063622e-07,
1483
+ "loss": 3.8028,
1484
+ "step": 198
1485
+ },
1486
+ {
1487
+ "epoch": 0.19586614173228348,
1488
+ "grad_norm": 9.550689697265625,
1489
+ "learning_rate": 1.9284294234592447e-07,
1490
+ "loss": 3.3294,
1491
+ "step": 199
1492
+ },
1493
+ {
1494
+ "epoch": 0.1968503937007874,
1495
+ "grad_norm": 16.766008377075195,
1496
+ "learning_rate": 1.9383697813121275e-07,
1497
+ "loss": 4.1252,
1498
+ "step": 200
1499
+ },
1500
+ {
1501
+ "epoch": 0.19783464566929135,
1502
+ "grad_norm": 16.56619644165039,
1503
+ "learning_rate": 1.94831013916501e-07,
1504
+ "loss": 4.2564,
1505
+ "step": 201
1506
+ },
1507
+ {
1508
+ "epoch": 0.19881889763779528,
1509
+ "grad_norm": 17.413206100463867,
1510
+ "learning_rate": 1.958250497017893e-07,
1511
+ "loss": 3.8258,
1512
+ "step": 202
1513
+ },
1514
+ {
1515
+ "epoch": 0.19980314960629922,
1516
+ "grad_norm": 10.175580978393555,
1517
+ "learning_rate": 1.9681908548707756e-07,
1518
+ "loss": 3.1025,
1519
+ "step": 203
1520
+ },
1521
+ {
1522
+ "epoch": 0.20078740157480315,
1523
+ "grad_norm": 11.58014965057373,
1524
+ "learning_rate": 1.9781312127236584e-07,
1525
+ "loss": 3.5038,
1526
+ "step": 204
1527
+ },
1528
+ {
1529
+ "epoch": 0.2017716535433071,
1530
+ "grad_norm": 9.23029613494873,
1531
+ "learning_rate": 1.9880715705765412e-07,
1532
+ "loss": 3.6021,
1533
+ "step": 205
1534
+ },
1535
+ {
1536
+ "epoch": 0.20275590551181102,
1537
+ "grad_norm": 11.87929630279541,
1538
+ "learning_rate": 1.9980119284294237e-07,
1539
+ "loss": 3.7637,
1540
+ "step": 206
1541
+ },
1542
+ {
1543
+ "epoch": 0.20374015748031496,
1544
+ "grad_norm": 10.55318546295166,
1545
+ "learning_rate": 2.0079522862823065e-07,
1546
+ "loss": 3.2563,
1547
+ "step": 207
1548
+ },
1549
+ {
1550
+ "epoch": 0.2047244094488189,
1551
+ "grad_norm": 15.530010223388672,
1552
+ "learning_rate": 2.017892644135189e-07,
1553
+ "loss": 3.9323,
1554
+ "step": 208
1555
+ },
1556
+ {
1557
+ "epoch": 0.20570866141732283,
1558
+ "grad_norm": 12.805971145629883,
1559
+ "learning_rate": 2.027833001988072e-07,
1560
+ "loss": 3.489,
1561
+ "step": 209
1562
+ },
1563
+ {
1564
+ "epoch": 0.20669291338582677,
1565
+ "grad_norm": 11.042547225952148,
1566
+ "learning_rate": 2.0377733598409546e-07,
1567
+ "loss": 3.6549,
1568
+ "step": 210
1569
+ },
1570
+ {
1571
+ "epoch": 0.2076771653543307,
1572
+ "grad_norm": 13.565040588378906,
1573
+ "learning_rate": 2.0477137176938374e-07,
1574
+ "loss": 3.1609,
1575
+ "step": 211
1576
+ },
1577
+ {
1578
+ "epoch": 0.20866141732283464,
1579
+ "grad_norm": 10.574054718017578,
1580
+ "learning_rate": 2.05765407554672e-07,
1581
+ "loss": 3.2467,
1582
+ "step": 212
1583
+ },
1584
+ {
1585
+ "epoch": 0.20964566929133857,
1586
+ "grad_norm": 9.994040489196777,
1587
+ "learning_rate": 2.0675944333996027e-07,
1588
+ "loss": 3.4514,
1589
+ "step": 213
1590
+ },
1591
+ {
1592
+ "epoch": 0.2106299212598425,
1593
+ "grad_norm": 11.422471046447754,
1594
+ "learning_rate": 2.0775347912524852e-07,
1595
+ "loss": 3.4945,
1596
+ "step": 214
1597
+ },
1598
+ {
1599
+ "epoch": 0.21161417322834647,
1600
+ "grad_norm": 13.696518898010254,
1601
+ "learning_rate": 2.087475149105368e-07,
1602
+ "loss": 3.5932,
1603
+ "step": 215
1604
+ },
1605
+ {
1606
+ "epoch": 0.2125984251968504,
1607
+ "grad_norm": 9.801803588867188,
1608
+ "learning_rate": 2.097415506958251e-07,
1609
+ "loss": 3.2289,
1610
+ "step": 216
1611
+ },
1612
+ {
1613
+ "epoch": 0.21358267716535434,
1614
+ "grad_norm": 10.710221290588379,
1615
+ "learning_rate": 2.1073558648111336e-07,
1616
+ "loss": 3.3279,
1617
+ "step": 217
1618
+ },
1619
+ {
1620
+ "epoch": 0.21456692913385828,
1621
+ "grad_norm": 12.974212646484375,
1622
+ "learning_rate": 2.1172962226640164e-07,
1623
+ "loss": 3.8141,
1624
+ "step": 218
1625
+ },
1626
+ {
1627
+ "epoch": 0.2155511811023622,
1628
+ "grad_norm": 10.196144104003906,
1629
+ "learning_rate": 2.127236580516899e-07,
1630
+ "loss": 3.1171,
1631
+ "step": 219
1632
+ },
1633
+ {
1634
+ "epoch": 0.21653543307086615,
1635
+ "grad_norm": 12.518829345703125,
1636
+ "learning_rate": 2.1371769383697817e-07,
1637
+ "loss": 3.6287,
1638
+ "step": 220
1639
+ },
1640
+ {
1641
+ "epoch": 0.21751968503937008,
1642
+ "grad_norm": 18.030723571777344,
1643
+ "learning_rate": 2.1471172962226642e-07,
1644
+ "loss": 3.8517,
1645
+ "step": 221
1646
+ },
1647
+ {
1648
+ "epoch": 0.21850393700787402,
1649
+ "grad_norm": 11.197093963623047,
1650
+ "learning_rate": 2.1570576540755473e-07,
1651
+ "loss": 3.3836,
1652
+ "step": 222
1653
+ },
1654
+ {
1655
+ "epoch": 0.21948818897637795,
1656
+ "grad_norm": 10.613561630249023,
1657
+ "learning_rate": 2.1669980119284298e-07,
1658
+ "loss": 3.425,
1659
+ "step": 223
1660
+ },
1661
+ {
1662
+ "epoch": 0.2204724409448819,
1663
+ "grad_norm": 11.70770263671875,
1664
+ "learning_rate": 2.1769383697813126e-07,
1665
+ "loss": 3.6246,
1666
+ "step": 224
1667
+ },
1668
+ {
1669
+ "epoch": 0.22145669291338582,
1670
+ "grad_norm": 10.802409172058105,
1671
+ "learning_rate": 2.186878727634195e-07,
1672
+ "loss": 3.5682,
1673
+ "step": 225
1674
+ },
1675
+ {
1676
+ "epoch": 0.22244094488188976,
1677
+ "grad_norm": 10.422605514526367,
1678
+ "learning_rate": 2.196819085487078e-07,
1679
+ "loss": 3.3034,
1680
+ "step": 226
1681
+ },
1682
+ {
1683
+ "epoch": 0.2234251968503937,
1684
+ "grad_norm": 17.00844383239746,
1685
+ "learning_rate": 2.2067594433399604e-07,
1686
+ "loss": 3.9251,
1687
+ "step": 227
1688
+ },
1689
+ {
1690
+ "epoch": 0.22440944881889763,
1691
+ "grad_norm": 11.708518028259277,
1692
+ "learning_rate": 2.2166998011928432e-07,
1693
+ "loss": 3.146,
1694
+ "step": 228
1695
+ },
1696
+ {
1697
+ "epoch": 0.22539370078740156,
1698
+ "grad_norm": 11.237220764160156,
1699
+ "learning_rate": 2.2266401590457263e-07,
1700
+ "loss": 3.8859,
1701
+ "step": 229
1702
+ },
1703
+ {
1704
+ "epoch": 0.2263779527559055,
1705
+ "grad_norm": 10.216686248779297,
1706
+ "learning_rate": 2.2365805168986088e-07,
1707
+ "loss": 3.2977,
1708
+ "step": 230
1709
+ },
1710
+ {
1711
+ "epoch": 0.22736220472440946,
1712
+ "grad_norm": 11.36294937133789,
1713
+ "learning_rate": 2.2465208747514916e-07,
1714
+ "loss": 3.2664,
1715
+ "step": 231
1716
+ },
1717
+ {
1718
+ "epoch": 0.2283464566929134,
1719
+ "grad_norm": 9.619734764099121,
1720
+ "learning_rate": 2.256461232604374e-07,
1721
+ "loss": 3.1275,
1722
+ "step": 232
1723
+ },
1724
+ {
1725
+ "epoch": 0.22933070866141733,
1726
+ "grad_norm": 10.229348182678223,
1727
+ "learning_rate": 2.266401590457257e-07,
1728
+ "loss": 3.2408,
1729
+ "step": 233
1730
+ },
1731
+ {
1732
+ "epoch": 0.23031496062992127,
1733
+ "grad_norm": 11.304025650024414,
1734
+ "learning_rate": 2.2763419483101394e-07,
1735
+ "loss": 2.907,
1736
+ "step": 234
1737
+ },
1738
+ {
1739
+ "epoch": 0.2312992125984252,
1740
+ "grad_norm": 11.743476867675781,
1741
+ "learning_rate": 2.2862823061630222e-07,
1742
+ "loss": 2.9178,
1743
+ "step": 235
1744
+ },
1745
+ {
1746
+ "epoch": 0.23228346456692914,
1747
+ "grad_norm": 12.260607719421387,
1748
+ "learning_rate": 2.296222664015905e-07,
1749
+ "loss": 3.324,
1750
+ "step": 236
1751
+ },
1752
+ {
1753
+ "epoch": 0.23326771653543307,
1754
+ "grad_norm": 12.115522384643555,
1755
+ "learning_rate": 2.3061630218687878e-07,
1756
+ "loss": 2.9172,
1757
+ "step": 237
1758
+ },
1759
+ {
1760
+ "epoch": 0.234251968503937,
1761
+ "grad_norm": 13.81004810333252,
1762
+ "learning_rate": 2.3161033797216703e-07,
1763
+ "loss": 3.4324,
1764
+ "step": 238
1765
+ },
1766
+ {
1767
+ "epoch": 0.23523622047244094,
1768
+ "grad_norm": 28.633853912353516,
1769
+ "learning_rate": 2.326043737574553e-07,
1770
+ "loss": 4.0563,
1771
+ "step": 239
1772
+ },
1773
+ {
1774
+ "epoch": 0.23622047244094488,
1775
+ "grad_norm": 10.444801330566406,
1776
+ "learning_rate": 2.335984095427436e-07,
1777
+ "loss": 2.8736,
1778
+ "step": 240
1779
+ },
1780
+ {
1781
+ "epoch": 0.2372047244094488,
1782
+ "grad_norm": 30.760478973388672,
1783
+ "learning_rate": 2.3459244532803184e-07,
1784
+ "loss": 4.7174,
1785
+ "step": 241
1786
+ },
1787
+ {
1788
+ "epoch": 0.23818897637795275,
1789
+ "grad_norm": 11.556520462036133,
1790
+ "learning_rate": 2.3558648111332012e-07,
1791
+ "loss": 3.2025,
1792
+ "step": 242
1793
+ },
1794
+ {
1795
+ "epoch": 0.23917322834645668,
1796
+ "grad_norm": 10.850605010986328,
1797
+ "learning_rate": 2.365805168986084e-07,
1798
+ "loss": 2.7835,
1799
+ "step": 243
1800
+ },
1801
+ {
1802
+ "epoch": 0.24015748031496062,
1803
+ "grad_norm": 26.828187942504883,
1804
+ "learning_rate": 2.3757455268389668e-07,
1805
+ "loss": 4.3158,
1806
+ "step": 244
1807
+ },
1808
+ {
1809
+ "epoch": 0.24114173228346455,
1810
+ "grad_norm": 10.565204620361328,
1811
+ "learning_rate": 2.385685884691849e-07,
1812
+ "loss": 2.8619,
1813
+ "step": 245
1814
+ },
1815
+ {
1816
+ "epoch": 0.2421259842519685,
1817
+ "grad_norm": 11.693890571594238,
1818
+ "learning_rate": 2.395626242544732e-07,
1819
+ "loss": 2.5156,
1820
+ "step": 246
1821
+ },
1822
+ {
1823
+ "epoch": 0.24311023622047245,
1824
+ "grad_norm": 14.487508773803711,
1825
+ "learning_rate": 2.4055666003976146e-07,
1826
+ "loss": 3.2144,
1827
+ "step": 247
1828
+ },
1829
+ {
1830
+ "epoch": 0.2440944881889764,
1831
+ "grad_norm": 22.58393669128418,
1832
+ "learning_rate": 2.4155069582504976e-07,
1833
+ "loss": 3.5927,
1834
+ "step": 248
1835
+ },
1836
+ {
1837
+ "epoch": 0.24507874015748032,
1838
+ "grad_norm": 10.880494117736816,
1839
+ "learning_rate": 2.42544731610338e-07,
1840
+ "loss": 2.6059,
1841
+ "step": 249
1842
+ },
1843
+ {
1844
+ "epoch": 0.24606299212598426,
1845
+ "grad_norm": 10.982386589050293,
1846
+ "learning_rate": 2.4353876739562627e-07,
1847
+ "loss": 2.9758,
1848
+ "step": 250
1849
+ },
1850
+ {
1851
+ "epoch": 0.2470472440944882,
1852
+ "grad_norm": 20.552265167236328,
1853
+ "learning_rate": 2.445328031809146e-07,
1854
+ "loss": 3.9214,
1855
+ "step": 251
1856
+ },
1857
+ {
1858
+ "epoch": 0.24803149606299213,
1859
+ "grad_norm": 11.680928230285645,
1860
+ "learning_rate": 2.4552683896620283e-07,
1861
+ "loss": 3.2892,
1862
+ "step": 252
1863
+ },
1864
+ {
1865
+ "epoch": 0.24901574803149606,
1866
+ "grad_norm": 10.980914115905762,
1867
+ "learning_rate": 2.4652087475149113e-07,
1868
+ "loss": 2.9503,
1869
+ "step": 253
1870
+ },
1871
+ {
1872
+ "epoch": 0.25,
1873
+ "grad_norm": 11.629937171936035,
1874
+ "learning_rate": 2.475149105367794e-07,
1875
+ "loss": 2.5969,
1876
+ "step": 254
1877
+ },
1878
+ {
1879
+ "epoch": 0.25098425196850394,
1880
+ "grad_norm": 11.606547355651855,
1881
+ "learning_rate": 2.4850894632206764e-07,
1882
+ "loss": 2.9908,
1883
+ "step": 255
1884
+ },
1885
+ {
1886
+ "epoch": 0.25196850393700787,
1887
+ "grad_norm": 10.955612182617188,
1888
+ "learning_rate": 2.495029821073559e-07,
1889
+ "loss": 2.8995,
1890
+ "step": 256
1891
+ },
1892
+ {
1893
+ "epoch": 0.2529527559055118,
1894
+ "grad_norm": 11.561877250671387,
1895
+ "learning_rate": 2.5049701789264414e-07,
1896
+ "loss": 3.124,
1897
+ "step": 257
1898
+ },
1899
+ {
1900
+ "epoch": 0.25393700787401574,
1901
+ "grad_norm": 12.040234565734863,
1902
+ "learning_rate": 2.5149105367793245e-07,
1903
+ "loss": 3.1197,
1904
+ "step": 258
1905
+ },
1906
+ {
1907
+ "epoch": 0.2549212598425197,
1908
+ "grad_norm": 12.25498104095459,
1909
+ "learning_rate": 2.524850894632207e-07,
1910
+ "loss": 2.3073,
1911
+ "step": 259
1912
+ },
1913
+ {
1914
+ "epoch": 0.2559055118110236,
1915
+ "grad_norm": 9.674198150634766,
1916
+ "learning_rate": 2.53479125248509e-07,
1917
+ "loss": 2.8441,
1918
+ "step": 260
1919
+ },
1920
+ {
1921
+ "epoch": 0.25688976377952755,
1922
+ "grad_norm": 11.55859661102295,
1923
+ "learning_rate": 2.5447316103379726e-07,
1924
+ "loss": 1.9788,
1925
+ "step": 261
1926
+ },
1927
+ {
1928
+ "epoch": 0.2578740157480315,
1929
+ "grad_norm": 12.343245506286621,
1930
+ "learning_rate": 2.554671968190855e-07,
1931
+ "loss": 2.1442,
1932
+ "step": 262
1933
+ },
1934
+ {
1935
+ "epoch": 0.2588582677165354,
1936
+ "grad_norm": 35.242835998535156,
1937
+ "learning_rate": 2.564612326043738e-07,
1938
+ "loss": 4.9015,
1939
+ "step": 263
1940
+ },
1941
+ {
1942
+ "epoch": 0.25984251968503935,
1943
+ "grad_norm": 13.894003868103027,
1944
+ "learning_rate": 2.5745526838966207e-07,
1945
+ "loss": 2.7866,
1946
+ "step": 264
1947
+ },
1948
+ {
1949
+ "epoch": 0.2608267716535433,
1950
+ "grad_norm": 12.785796165466309,
1951
+ "learning_rate": 2.5844930417495037e-07,
1952
+ "loss": 2.4588,
1953
+ "step": 265
1954
+ },
1955
+ {
1956
+ "epoch": 0.2618110236220472,
1957
+ "grad_norm": 13.115778923034668,
1958
+ "learning_rate": 2.5944333996023857e-07,
1959
+ "loss": 2.3909,
1960
+ "step": 266
1961
+ },
1962
+ {
1963
+ "epoch": 0.26279527559055116,
1964
+ "grad_norm": 38.795326232910156,
1965
+ "learning_rate": 2.604373757455269e-07,
1966
+ "loss": 4.7394,
1967
+ "step": 267
1968
+ },
1969
+ {
1970
+ "epoch": 0.2637795275590551,
1971
+ "grad_norm": 14.078845977783203,
1972
+ "learning_rate": 2.614314115308152e-07,
1973
+ "loss": 3.1581,
1974
+ "step": 268
1975
+ },
1976
+ {
1977
+ "epoch": 0.26476377952755903,
1978
+ "grad_norm": 36.33890151977539,
1979
+ "learning_rate": 2.6242544731610343e-07,
1980
+ "loss": 3.973,
1981
+ "step": 269
1982
+ },
1983
+ {
1984
+ "epoch": 0.265748031496063,
1985
+ "grad_norm": 32.12986755371094,
1986
+ "learning_rate": 2.634194831013917e-07,
1987
+ "loss": 4.1565,
1988
+ "step": 270
1989
+ },
1990
+ {
1991
+ "epoch": 0.26673228346456695,
1992
+ "grad_norm": 13.353660583496094,
1993
+ "learning_rate": 2.6441351888667994e-07,
1994
+ "loss": 2.5183,
1995
+ "step": 271
1996
+ },
1997
+ {
1998
+ "epoch": 0.2677165354330709,
1999
+ "grad_norm": 24.31439208984375,
2000
+ "learning_rate": 2.6540755467196824e-07,
2001
+ "loss": 3.614,
2002
+ "step": 272
2003
+ },
2004
+ {
2005
+ "epoch": 0.2687007874015748,
2006
+ "grad_norm": 11.95603084564209,
2007
+ "learning_rate": 2.664015904572565e-07,
2008
+ "loss": 2.6858,
2009
+ "step": 273
2010
+ },
2011
+ {
2012
+ "epoch": 0.26968503937007876,
2013
+ "grad_norm": 14.08092975616455,
2014
+ "learning_rate": 2.6739562624254475e-07,
2015
+ "loss": 3.1182,
2016
+ "step": 274
2017
+ },
2018
+ {
2019
+ "epoch": 0.2706692913385827,
2020
+ "grad_norm": 11.759414672851562,
2021
+ "learning_rate": 2.6838966202783305e-07,
2022
+ "loss": 2.9628,
2023
+ "step": 275
2024
+ },
2025
+ {
2026
+ "epoch": 0.27165354330708663,
2027
+ "grad_norm": 13.239761352539062,
2028
+ "learning_rate": 2.693836978131213e-07,
2029
+ "loss": 2.8376,
2030
+ "step": 276
2031
+ },
2032
+ {
2033
+ "epoch": 0.27263779527559057,
2034
+ "grad_norm": 13.551993370056152,
2035
+ "learning_rate": 2.703777335984096e-07,
2036
+ "loss": 2.7858,
2037
+ "step": 277
2038
+ },
2039
+ {
2040
+ "epoch": 0.2736220472440945,
2041
+ "grad_norm": 12.110513687133789,
2042
+ "learning_rate": 2.7137176938369786e-07,
2043
+ "loss": 2.1037,
2044
+ "step": 278
2045
+ },
2046
+ {
2047
+ "epoch": 0.27460629921259844,
2048
+ "grad_norm": 14.467287063598633,
2049
+ "learning_rate": 2.723658051689861e-07,
2050
+ "loss": 3.0436,
2051
+ "step": 279
2052
+ },
2053
+ {
2054
+ "epoch": 0.2755905511811024,
2055
+ "grad_norm": 17.148061752319336,
2056
+ "learning_rate": 2.7335984095427437e-07,
2057
+ "loss": 3.4125,
2058
+ "step": 280
2059
+ },
2060
+ {
2061
+ "epoch": 0.2765748031496063,
2062
+ "grad_norm": 12.052011489868164,
2063
+ "learning_rate": 2.743538767395627e-07,
2064
+ "loss": 2.5027,
2065
+ "step": 281
2066
+ },
2067
+ {
2068
+ "epoch": 0.27755905511811024,
2069
+ "grad_norm": 11.502729415893555,
2070
+ "learning_rate": 2.75347912524851e-07,
2071
+ "loss": 2.7922,
2072
+ "step": 282
2073
+ },
2074
+ {
2075
+ "epoch": 0.2785433070866142,
2076
+ "grad_norm": 15.440553665161133,
2077
+ "learning_rate": 2.763419483101392e-07,
2078
+ "loss": 2.9762,
2079
+ "step": 283
2080
+ },
2081
+ {
2082
+ "epoch": 0.2795275590551181,
2083
+ "grad_norm": 12.604522705078125,
2084
+ "learning_rate": 2.773359840954275e-07,
2085
+ "loss": 2.6458,
2086
+ "step": 284
2087
+ },
2088
+ {
2089
+ "epoch": 0.28051181102362205,
2090
+ "grad_norm": 12.404953002929688,
2091
+ "learning_rate": 2.7833001988071574e-07,
2092
+ "loss": 2.962,
2093
+ "step": 285
2094
+ },
2095
+ {
2096
+ "epoch": 0.281496062992126,
2097
+ "grad_norm": 12.527341842651367,
2098
+ "learning_rate": 2.7932405566600404e-07,
2099
+ "loss": 2.5439,
2100
+ "step": 286
2101
+ },
2102
+ {
2103
+ "epoch": 0.2824803149606299,
2104
+ "grad_norm": 12.331761360168457,
2105
+ "learning_rate": 2.803180914512923e-07,
2106
+ "loss": 2.8437,
2107
+ "step": 287
2108
+ },
2109
+ {
2110
+ "epoch": 0.28346456692913385,
2111
+ "grad_norm": 17.64342498779297,
2112
+ "learning_rate": 2.8131212723658055e-07,
2113
+ "loss": 3.2134,
2114
+ "step": 288
2115
+ },
2116
+ {
2117
+ "epoch": 0.2844488188976378,
2118
+ "grad_norm": 12.74220085144043,
2119
+ "learning_rate": 2.8230616302186885e-07,
2120
+ "loss": 2.5655,
2121
+ "step": 289
2122
+ },
2123
+ {
2124
+ "epoch": 0.2854330708661417,
2125
+ "grad_norm": 13.071759223937988,
2126
+ "learning_rate": 2.833001988071571e-07,
2127
+ "loss": 2.9465,
2128
+ "step": 290
2129
+ },
2130
+ {
2131
+ "epoch": 0.28641732283464566,
2132
+ "grad_norm": 12.093460083007812,
2133
+ "learning_rate": 2.842942345924454e-07,
2134
+ "loss": 2.4653,
2135
+ "step": 291
2136
+ },
2137
+ {
2138
+ "epoch": 0.2874015748031496,
2139
+ "grad_norm": 18.303346633911133,
2140
+ "learning_rate": 2.852882703777336e-07,
2141
+ "loss": 3.1467,
2142
+ "step": 292
2143
+ },
2144
+ {
2145
+ "epoch": 0.28838582677165353,
2146
+ "grad_norm": 11.613899230957031,
2147
+ "learning_rate": 2.862823061630219e-07,
2148
+ "loss": 2.6551,
2149
+ "step": 293
2150
+ },
2151
+ {
2152
+ "epoch": 0.28937007874015747,
2153
+ "grad_norm": 12.590253829956055,
2154
+ "learning_rate": 2.8727634194831017e-07,
2155
+ "loss": 2.5098,
2156
+ "step": 294
2157
+ },
2158
+ {
2159
+ "epoch": 0.2903543307086614,
2160
+ "grad_norm": 13.879076957702637,
2161
+ "learning_rate": 2.8827037773359847e-07,
2162
+ "loss": 2.5988,
2163
+ "step": 295
2164
+ },
2165
+ {
2166
+ "epoch": 0.29133858267716534,
2167
+ "grad_norm": 23.504112243652344,
2168
+ "learning_rate": 2.892644135188867e-07,
2169
+ "loss": 3.778,
2170
+ "step": 296
2171
+ },
2172
+ {
2173
+ "epoch": 0.29232283464566927,
2174
+ "grad_norm": 11.794881820678711,
2175
+ "learning_rate": 2.90258449304175e-07,
2176
+ "loss": 2.6257,
2177
+ "step": 297
2178
+ },
2179
+ {
2180
+ "epoch": 0.2933070866141732,
2181
+ "grad_norm": 11.491183280944824,
2182
+ "learning_rate": 2.912524850894633e-07,
2183
+ "loss": 2.5142,
2184
+ "step": 298
2185
+ },
2186
+ {
2187
+ "epoch": 0.29429133858267714,
2188
+ "grad_norm": 11.801483154296875,
2189
+ "learning_rate": 2.9224652087475153e-07,
2190
+ "loss": 2.3182,
2191
+ "step": 299
2192
+ },
2193
+ {
2194
+ "epoch": 0.2952755905511811,
2195
+ "grad_norm": 21.28721046447754,
2196
+ "learning_rate": 2.9324055666003984e-07,
2197
+ "loss": 3.3505,
2198
+ "step": 300
2199
+ },
2200
+ {
2201
+ "epoch": 0.296259842519685,
2202
+ "grad_norm": 14.353005409240723,
2203
+ "learning_rate": 2.9423459244532804e-07,
2204
+ "loss": 2.9615,
2205
+ "step": 301
2206
+ },
2207
+ {
2208
+ "epoch": 0.297244094488189,
2209
+ "grad_norm": 17.378894805908203,
2210
+ "learning_rate": 2.9522862823061634e-07,
2211
+ "loss": 2.9136,
2212
+ "step": 302
2213
+ },
2214
+ {
2215
+ "epoch": 0.29822834645669294,
2216
+ "grad_norm": 12.216327667236328,
2217
+ "learning_rate": 2.9622266401590465e-07,
2218
+ "loss": 2.6192,
2219
+ "step": 303
2220
+ },
2221
+ {
2222
+ "epoch": 0.2992125984251969,
2223
+ "grad_norm": 12.561356544494629,
2224
+ "learning_rate": 2.972166998011929e-07,
2225
+ "loss": 2.3255,
2226
+ "step": 304
2227
+ },
2228
+ {
2229
+ "epoch": 0.3001968503937008,
2230
+ "grad_norm": 15.897427558898926,
2231
+ "learning_rate": 2.9821073558648115e-07,
2232
+ "loss": 2.7168,
2233
+ "step": 305
2234
+ }
2235
+ ],
2236
+ "logging_steps": 1,
2237
+ "max_steps": 3048,
2238
+ "num_input_tokens_seen": 0,
2239
+ "num_train_epochs": 3,
2240
+ "save_steps": 305,
2241
+ "stateful_callbacks": {
2242
+ "TrainerControl": {
2243
+ "args": {
2244
+ "should_epoch_stop": false,
2245
+ "should_evaluate": false,
2246
+ "should_log": false,
2247
+ "should_save": true,
2248
+ "should_training_stop": false
2249
+ },
2250
+ "attributes": {}
2251
+ }
2252
+ },
2253
+ "total_flos": 0.0,
2254
+ "train_batch_size": 32,
2255
+ "trial_name": null,
2256
+ "trial_params": null
2257
+ }
checkpoint-305/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fe192eddc8689ae1ce8dcf2a4147788b9fb7727c69520abe4a56bc54e5c970fa
3
+ size 5688