NickyNicky committed on
Commit 360d8a0 · verified · 1 Parent(s): e372390

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
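
For orientation: the pooling settings above enable only `pooling_mode_cls_token`, so the sentence embedding is taken from the hidden state of the first token, and `modules.json` in this commit lists a `Normalize` module after pooling. The following is a minimal sketch of that idea in plain Python on toy numbers. The helper names `cls_pool` and `l2_normalize` are hypothetical and this is not the library's implementation.

```python
import math


def cls_pool(token_embeddings):
    """Return a copy of the first token's vector (the [CLS] position)."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Toy input: one sequence of 3 tokens with 4-dimensional hidden states
# (the real model uses 768-dimensional hidden states).
tokens = [
    [3.0, 4.0, 0.0, 0.0],  # first token, used as the sentence representation
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
]

sentence_embedding = l2_normalize(cls_pool(tokens))
print(sentence_embedding)
```

Only the first row contributes to the result; the other token vectors are ignored under CLS pooling.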
README.md ADDED
@@ -0,0 +1,809 @@
1
+ ---
2
+ base_model: BAAI/bge-base-en-v1.5
3
+ datasets: []
4
+ language:
5
+ - en
6
+ library_name: sentence-transformers
7
+ license: apache-2.0
8
+ metrics:
9
+ - cosine_accuracy@1
10
+ - cosine_accuracy@3
11
+ - cosine_accuracy@5
12
+ - cosine_accuracy@10
13
+ - cosine_precision@1
14
+ - cosine_precision@3
15
+ - cosine_precision@5
16
+ - cosine_precision@10
17
+ - cosine_recall@1
18
+ - cosine_recall@3
19
+ - cosine_recall@5
20
+ - cosine_recall@10
21
+ - cosine_ndcg@10
22
+ - cosine_mrr@10
23
+ - cosine_map@100
24
+ pipeline_tag: sentence-similarity
25
+ tags:
26
+ - sentence-transformers
27
+ - sentence-similarity
28
+ - feature-extraction
29
+ - generated_from_trainer
30
+ - dataset_size:6300
31
+ - loss:MatryoshkaLoss
32
+ - loss:MultipleNegativesRankingLoss
33
+ widget:
34
+ - source_sentence: Item 8 in IBM's 2023 Annual Report to Stockholders details the
35
+ Financial Statements and Supplementary Data, which are included on pages 44 through
36
+ 121.
37
+ sentences:
38
+ - What was the amount gained from the disposal of assets in 2022?
39
+ - What section of IBM's Annual Report for 2023 contains the Financial Statements
40
+ and Supplementary Data?
41
+ - What were the cash outflows for capital expenditures in 2023 and 2022 respectively?
42
+ - source_sentence: For the fiscal year ended March 31, 2023, Electronic Arts reported
43
+ a gross margin of 75.9 percent, an increase of 2.5 percentage points from the
44
+ previous year.
45
+ sentences:
46
+ - How did investment banking revenues at Goldman Sachs change in 2023 compared to
47
+ 2022, and what factors contributed to this change?
48
+ - What was the gross margin percentage for Electronic Arts in the fiscal year ending
49
+ March 31, 2023?
50
+ - What were the risk-free interest rates for the fiscal years 2021, 2022, and 2023?
51
+ - source_sentence: Cash, cash equivalents, and restricted cash at the beginning of
52
+ the period totaled $7,013 for a company.
53
+ sentences:
54
+ - What was the amount of cash, cash equivalents, and restricted cash at the beginning
55
+ of the period for the company?
56
+ - What is the impact of the new $1.25 price point on Dollar Tree’s sales units and
57
+ profitability?
58
+ - What was the total amount attributed to Goodwill in the acquisition of Nuance
59
+ Communications, Inc. as reported by the company?
60
+ - source_sentence: generate our mall revenue primarily from leases with tenants through
61
+ base minimum rents, overage rents and reimbursements for common area maintenance
62
+ (CAM) and other expenditures.
63
+ sentences:
64
+ - How does Visa facilitate financial inclusion with their prepaid cards?
65
+ - What are the main objectives of the economic sanctions imposed by the United States
66
+ and other international bodies?
67
+ - What revenue sources does Shoppes at Venetian primarily rely on from its tenants?
68
+ - source_sentence: For the fiscal year ended August 26, 2023, we reported net sales
69
+ of $17.5 billion compared with $16.3 billion for the year ended August 27, 2022,
70
+ a 7.4% increase from fiscal 2022. This growth was driven primarily by a domestic
71
+ same store sales increase of 3.4% and net sales of $327.8 million from new domestic
72
+ and international stores.
73
+ sentences:
74
+ - What drove the 7.4% increase in AutoZone's net sales for fiscal 2023 compared
75
+ to fiscal 2022?
76
+ - What percentage of HP's external U.S. hires in fiscal year 2023 were racially
77
+ or ethnically diverse?
78
+ - How much did GameStop Corp's valuation allowances increase during fiscal 2022?
79
+ model-index:
80
+ - name: BGE base Financial Matryoshka
81
+ results:
82
+ - task:
83
+ type: information-retrieval
84
+ name: Information Retrieval
85
+ dataset:
86
+ name: dim 768
87
+ type: dim_768
88
+ metrics:
89
+ - type: cosine_accuracy@1
90
+ value: 0.6985714285714286
91
+ name: Cosine Accuracy@1
92
+ - type: cosine_accuracy@3
93
+ value: 0.8271428571428572
94
+ name: Cosine Accuracy@3
95
+ - type: cosine_accuracy@5
96
+ value: 0.8628571428571429
97
+ name: Cosine Accuracy@5
98
+ - type: cosine_accuracy@10
99
+ value: 0.8985714285714286
100
+ name: Cosine Accuracy@10
101
+ - type: cosine_precision@1
102
+ value: 0.6985714285714286
103
+ name: Cosine Precision@1
104
+ - type: cosine_precision@3
105
+ value: 0.2757142857142857
106
+ name: Cosine Precision@3
107
+ - type: cosine_precision@5
108
+ value: 0.17257142857142854
109
+ name: Cosine Precision@5
110
+ - type: cosine_precision@10
111
+ value: 0.08985714285714284
112
+ name: Cosine Precision@10
113
+ - type: cosine_recall@1
114
+ value: 0.6985714285714286
115
+ name: Cosine Recall@1
116
+ - type: cosine_recall@3
117
+ value: 0.8271428571428572
118
+ name: Cosine Recall@3
119
+ - type: cosine_recall@5
120
+ value: 0.8628571428571429
121
+ name: Cosine Recall@5
122
+ - type: cosine_recall@10
123
+ value: 0.8985714285714286
124
+ name: Cosine Recall@10
125
+ - type: cosine_ndcg@10
126
+ value: 0.8023663256793517
127
+ name: Cosine Ndcg@10
128
+ - type: cosine_mrr@10
129
+ value: 0.7712675736961451
130
+ name: Cosine Mrr@10
131
+ - type: cosine_map@100
132
+ value: 0.7758522351159084
133
+ name: Cosine Map@100
134
+ - task:
135
+ type: information-retrieval
136
+ name: Information Retrieval
137
+ dataset:
138
+ name: dim 512
139
+ type: dim_512
140
+ metrics:
141
+ - type: cosine_accuracy@1
142
+ value: 0.69
143
+ name: Cosine Accuracy@1
144
+ - type: cosine_accuracy@3
145
+ value: 0.8271428571428572
146
+ name: Cosine Accuracy@3
147
+ - type: cosine_accuracy@5
148
+ value: 0.86
149
+ name: Cosine Accuracy@5
150
+ - type: cosine_accuracy@10
151
+ value: 0.9028571428571428
152
+ name: Cosine Accuracy@10
153
+ - type: cosine_precision@1
154
+ value: 0.69
155
+ name: Cosine Precision@1
156
+ - type: cosine_precision@3
157
+ value: 0.2757142857142857
158
+ name: Cosine Precision@3
159
+ - type: cosine_precision@5
160
+ value: 0.17199999999999996
161
+ name: Cosine Precision@5
162
+ - type: cosine_precision@10
163
+ value: 0.09028571428571427
164
+ name: Cosine Precision@10
165
+ - type: cosine_recall@1
166
+ value: 0.69
167
+ name: Cosine Recall@1
168
+ - type: cosine_recall@3
169
+ value: 0.8271428571428572
170
+ name: Cosine Recall@3
171
+ - type: cosine_recall@5
172
+ value: 0.86
173
+ name: Cosine Recall@5
174
+ - type: cosine_recall@10
175
+ value: 0.9028571428571428
176
+ name: Cosine Recall@10
177
+ - type: cosine_ndcg@10
178
+ value: 0.7998655910794988
179
+ name: Cosine Ndcg@10
180
+ - type: cosine_mrr@10
181
+ value: 0.7665912698412698
182
+ name: Cosine Mrr@10
183
+ - type: cosine_map@100
184
+ value: 0.7706925401671437
185
+ name: Cosine Map@100
186
+ - task:
187
+ type: information-retrieval
188
+ name: Information Retrieval
189
+ dataset:
190
+ name: dim 256
191
+ type: dim_256
192
+ metrics:
193
+ - type: cosine_accuracy@1
194
+ value: 0.6957142857142857
195
+ name: Cosine Accuracy@1
196
+ - type: cosine_accuracy@3
197
+ value: 0.8228571428571428
198
+ name: Cosine Accuracy@3
199
+ - type: cosine_accuracy@5
200
+ value: 0.86
201
+ name: Cosine Accuracy@5
202
+ - type: cosine_accuracy@10
203
+ value: 0.8914285714285715
204
+ name: Cosine Accuracy@10
205
+ - type: cosine_precision@1
206
+ value: 0.6957142857142857
207
+ name: Cosine Precision@1
208
+ - type: cosine_precision@3
209
+ value: 0.2742857142857143
210
+ name: Cosine Precision@3
211
+ - type: cosine_precision@5
212
+ value: 0.17199999999999996
213
+ name: Cosine Precision@5
214
+ - type: cosine_precision@10
215
+ value: 0.08914285714285713
216
+ name: Cosine Precision@10
217
+ - type: cosine_recall@1
218
+ value: 0.6957142857142857
219
+ name: Cosine Recall@1
220
+ - type: cosine_recall@3
221
+ value: 0.8228571428571428
222
+ name: Cosine Recall@3
223
+ - type: cosine_recall@5
224
+ value: 0.86
225
+ name: Cosine Recall@5
226
+ - type: cosine_recall@10
227
+ value: 0.8914285714285715
228
+ name: Cosine Recall@10
229
+ - type: cosine_ndcg@10
230
+ value: 0.7974564108711016
231
+ name: Cosine Ndcg@10
232
+ - type: cosine_mrr@10
233
+ value: 0.7669535147392289
234
+ name: Cosine Mrr@10
235
+ - type: cosine_map@100
236
+ value: 0.7718155211819018
237
+ name: Cosine Map@100
238
+ - task:
239
+ type: information-retrieval
240
+ name: Information Retrieval
241
+ dataset:
242
+ name: dim 128
243
+ type: dim_128
244
+ metrics:
245
+ - type: cosine_accuracy@1
246
+ value: 0.6871428571428572
247
+ name: Cosine Accuracy@1
248
+ - type: cosine_accuracy@3
249
+ value: 0.8128571428571428
250
+ name: Cosine Accuracy@3
251
+ - type: cosine_accuracy@5
252
+ value: 0.8457142857142858
253
+ name: Cosine Accuracy@5
254
+ - type: cosine_accuracy@10
255
+ value: 0.8857142857142857
256
+ name: Cosine Accuracy@10
257
+ - type: cosine_precision@1
258
+ value: 0.6871428571428572
259
+ name: Cosine Precision@1
260
+ - type: cosine_precision@3
261
+ value: 0.27095238095238094
262
+ name: Cosine Precision@3
263
+ - type: cosine_precision@5
264
+ value: 0.16914285714285712
265
+ name: Cosine Precision@5
266
+ - type: cosine_precision@10
267
+ value: 0.08857142857142856
268
+ name: Cosine Precision@10
269
+ - type: cosine_recall@1
270
+ value: 0.6871428571428572
271
+ name: Cosine Recall@1
272
+ - type: cosine_recall@3
273
+ value: 0.8128571428571428
274
+ name: Cosine Recall@3
275
+ - type: cosine_recall@5
276
+ value: 0.8457142857142858
277
+ name: Cosine Recall@5
278
+ - type: cosine_recall@10
279
+ value: 0.8857142857142857
280
+ name: Cosine Recall@10
281
+ - type: cosine_ndcg@10
282
+ value: 0.787697533881839
283
+ name: Cosine Ndcg@10
284
+ - type: cosine_mrr@10
285
+ value: 0.756192743764172
286
+ name: Cosine Mrr@10
287
+ - type: cosine_map@100
288
+ value: 0.7610331995977764
289
+ name: Cosine Map@100
290
+ - task:
291
+ type: information-retrieval
292
+ name: Information Retrieval
293
+ dataset:
294
+ name: dim 64
295
+ type: dim_64
296
+ metrics:
297
+ - type: cosine_accuracy@1
298
+ value: 0.6328571428571429
299
+ name: Cosine Accuracy@1
300
+ - type: cosine_accuracy@3
301
+ value: 0.7771428571428571
302
+ name: Cosine Accuracy@3
303
+ - type: cosine_accuracy@5
304
+ value: 0.8171428571428572
305
+ name: Cosine Accuracy@5
306
+ - type: cosine_accuracy@10
307
+ value: 0.8571428571428571
308
+ name: Cosine Accuracy@10
309
+ - type: cosine_precision@1
310
+ value: 0.6328571428571429
311
+ name: Cosine Precision@1
312
+ - type: cosine_precision@3
313
+ value: 0.259047619047619
314
+ name: Cosine Precision@3
315
+ - type: cosine_precision@5
316
+ value: 0.16342857142857142
317
+ name: Cosine Precision@5
318
+ - type: cosine_precision@10
319
+ value: 0.08571428571428569
320
+ name: Cosine Precision@10
321
+ - type: cosine_recall@1
322
+ value: 0.6328571428571429
323
+ name: Cosine Recall@1
324
+ - type: cosine_recall@3
325
+ value: 0.7771428571428571
326
+ name: Cosine Recall@3
327
+ - type: cosine_recall@5
328
+ value: 0.8171428571428572
329
+ name: Cosine Recall@5
330
+ - type: cosine_recall@10
331
+ value: 0.8571428571428571
332
+ name: Cosine Recall@10
333
+ - type: cosine_ndcg@10
334
+ value: 0.7482728321357093
335
+ name: Cosine Ndcg@10
336
+ - type: cosine_mrr@10
337
+ value: 0.7131224489795914
338
+ name: Cosine Mrr@10
339
+ - type: cosine_map@100
340
+ value: 0.7189753431460272
341
+ name: Cosine Map@100
342
+ ---
343
+
344
+ # BGE base Financial Matryoshka
345
+
346
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
347
+
348
+ ## Model Details
349
+
350
+ ### Model Description
351
+ - **Model Type:** Sentence Transformer
352
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
353
+ - **Maximum Sequence Length:** 512 tokens
354
+ - **Output Dimensionality:** 768 dimensions
355
+ - **Similarity Function:** Cosine Similarity
356
+ <!-- - **Training Dataset:** Unknown -->
357
+ - **Language:** en
358
+ - **License:** apache-2.0
359
+
360
+ ### Model Sources
361
+
362
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
363
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
364
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
365
+
366
+ ### Full Model Architecture
367
+
368
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
375
+
376
+ ## Usage
377
+
378
+ ### Direct Usage (Sentence Transformers)
379
+
380
+ First install the Sentence Transformers library:
381
+
382
+ ```bash
+ pip install -U sentence-transformers
+ ```
385
+
386
+ Then you can load this model and run inference.
387
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("NickyNicky/bge-base-financial-matryoshka")
+ # Run inference
+ sentences = [
+     'For the fiscal year ended August 26, 2023, we reported net sales of $17.5 billion compared with $16.3 billion for the year ended August 27, 2022, a 7.4% increase from fiscal 2022. This growth was driven primarily by a domestic same store sales increase of 3.4% and net sales of $327.8 million from new domestic and international stores.',
+     "What drove the 7.4% increase in AutoZone's net sales for fiscal 2023 compared to fiscal 2022?",
+     "What percentage of HP's external U.S. hires in fiscal year 2023 were racially or ethnically diverse?",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
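
Because this model card reports evaluation at 768, 512, 256, 128 and 64 dimensions, a shorter embedding can be obtained by keeping the leading components of the full vector and re-normalizing. The sketch below illustrates that step on toy 4-dimensional vectors in plain Python. The helper names `truncate_and_normalize` and `cosine` are hypothetical, and the vectors are made up for illustration. With real model output, slicing the first columns of `embeddings` and normalizing plays the same role, and recent Sentence Transformers releases also accept a `truncate_dim` argument on the `SentenceTransformer` constructor.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for a query and a passage embedding (the model itself outputs 768 dims).
query = [0.6, 0.8, 0.0, 0.0]
doc = [0.6, 0.0, 0.8, 0.0]

full_score = cosine(query, doc)
short_score = cosine(truncate_and_normalize(query, 2), truncate_and_normalize(doc, 2))
print(round(full_score, 4), round(short_score, 4))
```

Scores at a reduced dimension generally differ from the full-dimension scores, which is why the card reports retrieval metrics separately for each dimension.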
407
+
408
+ <!--
409
+ ### Direct Usage (Transformers)
410
+
411
+ <details><summary>Click to see the direct usage in Transformers</summary>
412
+
413
+ </details>
414
+ -->
415
+
416
+ <!--
417
+ ### Downstream Usage (Sentence Transformers)
418
+
419
+ You can finetune this model on your own dataset.
420
+
421
+ <details><summary>Click to expand</summary>
422
+
423
+ </details>
424
+ -->
425
+
426
+ <!--
427
+ ### Out-of-Scope Use
428
+
429
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
430
+ -->
431
+
432
+ ## Evaluation
433
+
434
+ ### Metrics
435
+
436
+ #### Information Retrieval
437
+ * Dataset: `dim_768`
438
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
439
+
440
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.6986 |
+ | cosine_accuracy@3 | 0.8271 |
+ | cosine_accuracy@5 | 0.8629 |
+ | cosine_accuracy@10 | 0.8986 |
+ | cosine_precision@1 | 0.6986 |
+ | cosine_precision@3 | 0.2757 |
+ | cosine_precision@5 | 0.1726 |
+ | cosine_precision@10 | 0.0899 |
+ | cosine_recall@1 | 0.6986 |
+ | cosine_recall@3 | 0.8271 |
+ | cosine_recall@5 | 0.8629 |
+ | cosine_recall@10 | 0.8986 |
+ | cosine_ndcg@10 | 0.8024 |
+ | cosine_mrr@10 | 0.7713 |
+ | **cosine_map@100** | **0.7759** |
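
A note on reading the table above: accuracy@k and recall@k are identical in every row, and precision@k equals accuracy@k divided by k (for example 0.8271 / 3 ≈ 0.2757 and 0.8629 / 5 ≈ 0.1726). That pattern is what one would expect if each query has exactly one relevant passage, which is an inference from the numbers and not something the card states. A minimal sketch on made-up rankings, with hypothetical helper names, shows the relationship:

```python
def accuracy_at_k(ranked_ids, relevant_id, k):
    """1.0 if the single relevant document appears in the top k, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def precision_at_k(ranked_ids, relevant_id, k):
    """Fraction of the top k results that are relevant."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id == relevant_id)
    return hits / k


def recall_at_k(ranked_ids, relevant_id, k):
    """Fraction of relevant documents found in the top k (only one is relevant here)."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id == relevant_id)
    return hits / 1


# Toy data: (ranked result ids, id of the one relevant document) per query.
queries = [
    (["d1", "d2", "d3"], "d1"),  # hit at rank 1
    (["d5", "d4", "d6"], "d4"),  # hit at rank 2
    (["d7", "d8", "d9"], "d0"),  # miss
]

k = 3
accuracy = sum(accuracy_at_k(r, rel, k) for r, rel in queries) / len(queries)
precision = sum(precision_at_k(r, rel, k) for r, rel in queries) / len(queries)
recall = sum(recall_at_k(r, rel, k) for r, rel in queries) / len(queries)
print(accuracy, precision, recall)
```

With one relevant document per query, at most one of the top k results can be a hit, so precision@k is capped at 1/k.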
457
+
458
+ #### Information Retrieval
459
+ * Dataset: `dim_512`
460
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
461
+
462
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.69 |
+ | cosine_accuracy@3 | 0.8271 |
+ | cosine_accuracy@5 | 0.86 |
+ | cosine_accuracy@10 | 0.9029 |
+ | cosine_precision@1 | 0.69 |
+ | cosine_precision@3 | 0.2757 |
+ | cosine_precision@5 | 0.172 |
+ | cosine_precision@10 | 0.0903 |
+ | cosine_recall@1 | 0.69 |
+ | cosine_recall@3 | 0.8271 |
+ | cosine_recall@5 | 0.86 |
+ | cosine_recall@10 | 0.9029 |
+ | cosine_ndcg@10 | 0.7999 |
+ | cosine_mrr@10 | 0.7666 |
+ | **cosine_map@100** | **0.7707** |
479
+
480
+ #### Information Retrieval
481
+ * Dataset: `dim_256`
482
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
483
+
484
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.6957 |
+ | cosine_accuracy@3 | 0.8229 |
+ | cosine_accuracy@5 | 0.86 |
+ | cosine_accuracy@10 | 0.8914 |
+ | cosine_precision@1 | 0.6957 |
+ | cosine_precision@3 | 0.2743 |
+ | cosine_precision@5 | 0.172 |
+ | cosine_precision@10 | 0.0891 |
+ | cosine_recall@1 | 0.6957 |
+ | cosine_recall@3 | 0.8229 |
+ | cosine_recall@5 | 0.86 |
+ | cosine_recall@10 | 0.8914 |
+ | cosine_ndcg@10 | 0.7975 |
+ | cosine_mrr@10 | 0.767 |
+ | **cosine_map@100** | **0.7718** |
501
+
502
+ #### Information Retrieval
503
+ * Dataset: `dim_128`
504
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
505
+
506
+ | Metric | Value |
+ |:--------------------|:----------|
+ | cosine_accuracy@1 | 0.6871 |
+ | cosine_accuracy@3 | 0.8129 |
+ | cosine_accuracy@5 | 0.8457 |
+ | cosine_accuracy@10 | 0.8857 |
+ | cosine_precision@1 | 0.6871 |
+ | cosine_precision@3 | 0.271 |
+ | cosine_precision@5 | 0.1691 |
+ | cosine_precision@10 | 0.0886 |
+ | cosine_recall@1 | 0.6871 |
+ | cosine_recall@3 | 0.8129 |
+ | cosine_recall@5 | 0.8457 |
+ | cosine_recall@10 | 0.8857 |
+ | cosine_ndcg@10 | 0.7877 |
+ | cosine_mrr@10 | 0.7562 |
+ | **cosine_map@100** | **0.761** |
523
+
524
+ #### Information Retrieval
525
+ * Dataset: `dim_64`
526
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
527
+
528
+ | Metric | Value |
+ |:--------------------|:----------|
+ | cosine_accuracy@1 | 0.6329 |
+ | cosine_accuracy@3 | 0.7771 |
+ | cosine_accuracy@5 | 0.8171 |
+ | cosine_accuracy@10 | 0.8571 |
+ | cosine_precision@1 | 0.6329 |
+ | cosine_precision@3 | 0.259 |
+ | cosine_precision@5 | 0.1634 |
+ | cosine_precision@10 | 0.0857 |
+ | cosine_recall@1 | 0.6329 |
+ | cosine_recall@3 | 0.7771 |
+ | cosine_recall@5 | 0.8171 |
+ | cosine_recall@10 | 0.8571 |
+ | cosine_ndcg@10 | 0.7483 |
+ | cosine_mrr@10 | 0.7131 |
+ | **cosine_map@100** | **0.719** |
545
+
546
+ <!--
547
+ ## Bias, Risks and Limitations
548
+
549
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
550
+ -->
551
+
552
+ <!--
553
+ ### Recommendations
554
+
555
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
556
+ -->
557
+
558
+ ## Training Details
559
+
560
+ ### Training Dataset
561
+
562
+ #### Unnamed Dataset
563
+
564
+
565
+ * Size: 6,300 training samples
566
+ * Columns: <code>positive</code> and <code>anchor</code>
567
+ * Approximate statistics based on the first 1000 samples:
568
+ | | positive | anchor |
569
+ |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
570
+ | type | string | string |
571
+ | details | <ul><li>min: 2 tokens</li><li>mean: 46.19 tokens</li><li>max: 371 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 20.39 tokens</li><li>max: 46 tokens</li></ul> |
572
+ * Samples:
573
+ | positive | anchor |
574
+ |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------|
575
+ | <code>Cash used in financing activities in fiscal 2022 was primarily attributable to settlement of stock-based awards.</code> | <code>Why was there a net outflow of cash in financing activities in fiscal 2022?</code> |
576
+ | <code>Certain vendors have been impacted by volatility in the supply chain financing market.</code> | <code>How have certain vendors been impacted in the supply chain financing market?</code> |
577
+ | <code>In the consolidated financial statements for Visa, the net cash provided by operating activities amounted to 20,755 units in the most recent period, 18,849 units in the previous period, and 15,227 units in the period before that.</code> | <code>How much net cash did Visa's operating activities generate in the most recent period according to the financial statements?</code> |
578
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
579
+ ```json
+ {
+     "loss": "MultipleNegativesRankingLoss",
+     "matryoshka_dims": [
+         768,
+         512,
+         256,
+         128,
+         64
+     ],
+     "matryoshka_weights": [
+         1,
+         1,
+         1,
+         1,
+         1
+     ],
+     "n_dims_per_step": -1
+ }
+ ```
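
To make the parameters above concrete: the Matryoshka objective applies the wrapped loss to the embeddings truncated at each listed dimension and sums the results using the listed weights. The following is a simplified sketch in plain Python, not the library's code. It assumes an in-batch-negatives ranking loss computed as cross-entropy over scaled cosine similarities, with a scale of 20.0 taken as an assumed default. The function names and the toy 4-dimensional data are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def ranking_loss(anchors, positives, scale=20.0):
    """In-batch negatives: for anchor i, positives[i] is the target and all
    other positives in the batch act as negatives (softmax cross-entropy)."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p]
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over embeddings truncated to each dim."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_a = [v[:dim] for v in anchors]
        truncated_p = [v[:dim] for v in positives]
        total += weight * ranking_loss(truncated_a, truncated_p)
    return total


# Toy batch of two (anchor, positive) pairs; dims [4, 2] stand in for
# the card's [768, 512, 256, 128, 64], all with weight 1.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]

loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)
```

Because every truncation contributes to the total, the leading components are pushed to remain useful on their own, which is what allows the shorter embeddings to be evaluated separately.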
599
+
600
+ ### Training Hyperparameters
601
+ #### Non-Default Hyperparameters
602
+
603
+ - `eval_strategy`: epoch
604
+ - `per_device_train_batch_size`: 32
605
+ - `per_device_eval_batch_size`: 16
606
+ - `gradient_accumulation_steps`: 16
607
+ - `learning_rate`: 2e-05
608
+ - `num_train_epochs`: 4
609
+ - `lr_scheduler_type`: cosine
610
+ - `warmup_ratio`: 0.1
611
+ - `bf16`: True
612
+ - `tf32`: True
613
+ - `optim`: adamw_torch_fused
614
+ - `batch_sampler`: no_duplicates
615
+
616
+ #### All Hyperparameters
617
+ <details><summary>Click to expand</summary>
618
+
619
+ - `overwrite_output_dir`: False
620
+ - `do_predict`: False
621
+ - `eval_strategy`: epoch
622
+ - `prediction_loss_only`: True
623
+ - `per_device_train_batch_size`: 32
624
+ - `per_device_eval_batch_size`: 16
625
+ - `per_gpu_train_batch_size`: None
626
+ - `per_gpu_eval_batch_size`: None
627
+ - `gradient_accumulation_steps`: 16
628
+ - `eval_accumulation_steps`: None
629
+ - `learning_rate`: 2e-05
630
+ - `weight_decay`: 0.0
631
+ - `adam_beta1`: 0.9
632
+ - `adam_beta2`: 0.999
633
+ - `adam_epsilon`: 1e-08
634
+ - `max_grad_norm`: 1.0
635
+ - `num_train_epochs`: 4
636
+ - `max_steps`: -1
637
+ - `lr_scheduler_type`: cosine
638
+ - `lr_scheduler_kwargs`: {}
639
+ - `warmup_ratio`: 0.1
640
+ - `warmup_steps`: 0
641
+ - `log_level`: passive
642
+ - `log_level_replica`: warning
643
+ - `log_on_each_node`: True
644
+ - `logging_nan_inf_filter`: True
645
+ - `save_safetensors`: True
646
+ - `save_on_each_node`: False
647
+ - `save_only_model`: False
648
+ - `restore_callback_states_from_checkpoint`: False
649
+ - `no_cuda`: False
650
+ - `use_cpu`: False
651
+ - `use_mps_device`: False
652
+ - `seed`: 42
653
+ - `data_seed`: None
654
+ - `jit_mode_eval`: False
655
+ - `use_ipex`: False
656
+ - `bf16`: True
657
+ - `fp16`: False
658
+ - `fp16_opt_level`: O1
659
+ - `half_precision_backend`: auto
660
+ - `bf16_full_eval`: False
661
+ - `fp16_full_eval`: False
662
+ - `tf32`: True
663
+ - `local_rank`: 0
664
+ - `ddp_backend`: None
665
+ - `tpu_num_cores`: None
666
+ - `tpu_metrics_debug`: False
667
+ - `debug`: []
668
+ - `dataloader_drop_last`: False
669
+ - `dataloader_num_workers`: 0
670
+ - `dataloader_prefetch_factor`: None
671
+ - `past_index`: -1
672
+ - `disable_tqdm`: False
673
+ - `remove_unused_columns`: True
674
+ - `label_names`: None
675
+ - `load_best_model_at_end`: False
676
+ - `ignore_data_skip`: False
677
+ - `fsdp`: []
678
+ - `fsdp_min_num_params`: 0
679
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
680
+ - `fsdp_transformer_layer_cls_to_wrap`: None
681
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
682
+ - `deepspeed`: None
683
+ - `label_smoothing_factor`: 0.0
684
+ - `optim`: adamw_torch_fused
685
+ - `optim_args`: None
686
+ - `adafactor`: False
687
+ - `group_by_length`: False
688
+ - `length_column_name`: length
689
+ - `ddp_find_unused_parameters`: None
690
+ - `ddp_bucket_cap_mb`: None
691
+ - `ddp_broadcast_buffers`: False
692
+ - `dataloader_pin_memory`: True
693
+ - `dataloader_persistent_workers`: False
694
+ - `skip_memory_metrics`: True
695
+ - `use_legacy_prediction_loop`: False
696
+ - `push_to_hub`: False
697
+ - `resume_from_checkpoint`: None
698
+ - `hub_model_id`: None
699
+ - `hub_strategy`: every_save
700
+ - `hub_private_repo`: False
701
+ - `hub_always_push`: False
702
+ - `gradient_checkpointing`: False
703
+ - `gradient_checkpointing_kwargs`: None
704
+ - `include_inputs_for_metrics`: False
705
+ - `eval_do_concat_batches`: True
706
+ - `fp16_backend`: auto
707
+ - `push_to_hub_model_id`: None
708
+ - `push_to_hub_organization`: None
709
+ - `mp_parameters`:
710
+ - `auto_find_batch_size`: False
711
+ - `full_determinism`: False
712
+ - `torchdynamo`: None
713
+ - `ray_scope`: last
714
+ - `ddp_timeout`: 1800
715
+ - `torch_compile`: False
716
+ - `torch_compile_backend`: None
717
+ - `torch_compile_mode`: None
718
+ - `dispatch_batches`: None
719
+ - `split_batches`: None
720
+ - `include_tokens_per_second`: False
721
+ - `include_num_input_tokens_seen`: False
722
+ - `neftune_noise_alpha`: None
723
+ - `optim_target_modules`: None
724
+ - `batch_eval_metrics`: False
725
+ - `batch_sampler`: no_duplicates
726
+ - `multi_dataset_batch_sampler`: proportional
727
+
728
+ </details>
729
+
730
+ ### Training Logs
731
+ | Epoch | Step | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
+ |:------:|:----:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
+ | 0.8122 | 10 | 1.5643 | - | - | - | - | - |
+ | 0.9746 | 12 | - | 0.7349 | 0.7494 | 0.7524 | 0.6987 | 0.7569 |
+ | 1.6244 | 20 | 0.6756 | - | - | - | - | - |
+ | 1.9492 | 24 | - | 0.7555 | 0.7659 | 0.7683 | 0.7190 | 0.7700 |
+ | 2.4365 | 30 | 0.4561 | - | - | - | - | - |
+ | 2.9239 | 36 | - | 0.7592 | 0.7698 | 0.7698 | 0.7184 | 0.7741 |
+ | 3.2487 | 40 | 0.3645 | - | - | - | - | - |
+ | 3.8985 | 48 | - | 0.7610 | 0.7718 | 0.7707 | 0.7190 | 0.7759 |
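
The fractional epoch values in this log can be reproduced from the hyperparameters listed earlier: 6,300 samples at a per-device batch size of 32 give 197 batches per epoch, and with 16 gradient accumulation steps each optimizer step consumes 16 of them (512 samples). The sketch below is a back-of-the-envelope check, assuming a single training device and that the `no_duplicates` batch sampler did not change the number of batches; neither is stated in the card.

```python
import math

# Values taken from the hyperparameter list in this model card.
train_samples = 6300
per_device_train_batch_size = 32
gradient_accumulation_steps = 16

# Assumption: one device, last partial batch kept (dataloader_drop_last is False).
batches_per_epoch = math.ceil(train_samples / per_device_train_batch_size)
samples_per_optimizer_step = per_device_train_batch_size * gradient_accumulation_steps


def epoch_at_step(step):
    """Fraction of epochs completed after `step` optimizer steps."""
    return step * gradient_accumulation_steps / batches_per_epoch


for step in (10, 12, 20, 24, 30, 36, 40, 48):
    print(step, round(epoch_at_step(step), 4))
```

Under these assumptions the printed values line up with the Epoch column of the log, including the final evaluation at step 48.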
741
+
742
+
743
+ ### Framework Versions
744
+ - Python: 3.10.12
745
+ - Sentence Transformers: 3.0.1
746
+ - Transformers: 4.41.2
747
+ - PyTorch: 2.2.0+cu121
748
+ - Accelerate: 0.31.0
749
+ - Datasets: 2.19.1
750
+ - Tokenizers: 0.19.1
751
+
752
+ ## Citation
753
+
754
+ ### BibTeX
755
+
756
+ #### Sentence Transformers
757
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
768
+
769
+ #### MatryoshkaLoss
770
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
780
+
781
+ #### MultipleNegativesRankingLoss
782
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
792
+
793
+ <!--
794
+ ## Glossary
795
+
796
+ *Clearly define terms in order to be accessible across audiences.*
797
+ -->
798
+
799
+ <!--
800
+ ## Model Card Authors
801
+
802
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
803
+ -->
804
+
805
+ <!--
806
+ ## Model Card Contact
807
+
808
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
809
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "BAAI/bge-base-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.41.2",
+     "pytorch": "2.2.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:18adc92675356d350ed51b2bb43686de3c95e72c6bf852ea3f443793dddb7315
+ size 437951328
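As a sanity check on the pointer file above, the reported size can be compared against a parameter count derived from `config.json`. The sketch below assumes the standard `BertModel` layout (embeddings, 12 encoder layers, and a pooler head) and that every tensor is stored as `float32`, as `torch_dtype` suggests; the helper names are hypothetical and introduced only for illustration. The leftover bytes are presumably the safetensors header, though that is an inference rather than something this diff confirms.

```python
def linear(n_in: int, n_out: int) -> int:
    """Parameters in a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def layer_norm(width: int) -> int:
    """Parameters in a LayerNorm: one scale and one shift per feature."""
    return 2 * width


def bert_parameter_count(vocab: int, hidden: int, intermediate: int,
                         layers: int, max_pos: int, type_vocab: int) -> int:
    """Count parameters assuming the standard BertModel layout (an assumption)."""
    # Word, position, and token-type embeddings share one LayerNorm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm(hidden)
    # Query, key, value, and output projections, then a LayerNorm.
    attention = 4 * linear(hidden, hidden) + layer_norm(hidden)
    # Two dense layers around the activation, then a LayerNorm.
    feed_forward = (linear(hidden, intermediate)
                    + linear(intermediate, hidden)
                    + layer_norm(hidden))
    # Dense pooler head applied to the [CLS] position.
    pooler = linear(hidden, hidden)
    return embeddings + layers * (attention + feed_forward) + pooler


# Values copied from config.json and the model.safetensors pointer in this commit.
params = bert_parameter_count(vocab=30522, hidden=768, intermediate=3072,
                              layers=12, max_pos=512, type_vocab=2)
weight_bytes = params * 4            # float32 uses 4 bytes per parameter
remainder = 437951328 - weight_bytes  # bytes not explained by raw weights

print(params, weight_bytes, remainder)
```

Under these assumptions the weights account for all but a few kilobytes of the file, which is consistent with a roughly 109M-parameter BERT-base checkpoint.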
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
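The three modules listed in `modules.json` run in order: the Transformer produces one vector per token, the Pooling module reduces them to a single vector (CLS-token pooling, per `1_Pooling/config.json`), and the Normalize module scales that vector to unit length. A minimal pure-Python sketch of the last two stages follows; the function names and the toy 2-dimensional token vectors are illustrative assumptions, not the library's implementation, and the real model works with 768-dimensional vectors.

```python
import math


def cls_pool(token_embeddings):
    """CLS pooling: keep only the vector at position 0 (the [CLS] token)."""
    return list(token_embeddings[0])


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length, guarding against division by zero."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


def encode_sketch(token_embeddings):
    """Hypothetical stand-in for the Pooling -> Normalize stages of the pipeline."""
    return l2_normalize(cls_pool(token_embeddings))


# Toy per-token outputs standing in for the Transformer module's hidden states.
toy_token_embeddings = [
    [3.0, 4.0],    # [CLS]
    [10.0, -2.0],  # first word piece
    [0.5, 0.5],    # [SEP]
]
embedding = encode_sketch(toy_token_embeddings)
print(embedding)
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.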
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
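The special-token ids in `added_tokens_decoder` and the `model_max_length` of 512 together determine how a single input sequence is framed before it reaches the model. The sketch below illustrates that framing with plain lists; `build_inputs` is a hypothetical helper, the word-piece ids are arbitrary placeholders, and the truncation rule (keep the leading tokens, reserve two slots for `[CLS]` and `[SEP]`) is an assumption about typical single-sequence behavior rather than a transcription of `BertTokenizer`.

```python
# Special-token ids as listed in tokenizer_config.json.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0
MAX_LEN = 512  # model_max_length / max_seq_length


def build_inputs(token_ids, max_len=MAX_LEN, pad_to=None):
    """Frame word-piece ids as [CLS] ... [SEP], truncating and optionally padding.

    Hypothetical helper: returns (input_ids, attention_mask).
    """
    # Reserve two positions for [CLS] and [SEP], dropping tokens past the limit.
    body = list(token_ids[: max_len - 2])
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    if pad_to is not None:
        # Right-pad with [PAD]; padded positions get attention mask 0.
        padding = max(pad_to - len(input_ids), 0)
        input_ids += [PAD_ID] * padding
        attention_mask += [0] * padding
    return input_ids, attention_mask


# Two arbitrary placeholder ids, padded to a batch width of 6.
short_ids, short_mask = build_inputs([2001, 2002], pad_to=6)

# An over-long input is cut down to exactly MAX_LEN positions.
long_ids, long_mask = build_inputs(list(range(1000, 2000)))

print(short_ids, short_mask, len(long_ids))
```

Since the pooling configuration selects the `[CLS]` position, the vector at index 0 of the framed sequence is the one that becomes the sentence embedding.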
vocab.txt ADDED
The diff for this file is too large to render. See raw diff