jebish7 committed on
Commit
b046225
1 Parent(s): 8f46dff

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
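Read as a recipe, this config selects masked mean pooling (`pooling_mode_mean_tokens` is the only mode set to `true`). The sketch below illustrates that operation in plain Python. The `mean_pool` helper and the toy 3-dimensional vectors are assumptions for illustration only; the library's own Pooling module works on batched 1024-dimensional tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]


# Toy input: two real tokens followed by one padding position.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)
```

The padding vector is ignored, so the result is the average of the first two rows only.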
2_Dense/config.json ADDED
@@ -0,0 +1 @@
+ {"in_features": 1024, "out_features": 1024, "bias": true, "activation_function": "torch.nn.modules.linear.Identity"}
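With an `Identity` activation, this Dense module reduces to an affine map, y = Wx + b, from 1024 to 1024 dimensions. A minimal sketch of that computation follows; the `dense_identity` helper and the 2x2 toy weights are hypothetical stand-ins, not the trained parameters.

```python
def dense_identity(x, weight, bias):
    """Compute y = W x + b; the Identity activation leaves y unchanged."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(weight, bias)
    ]


# Toy 2x2 example standing in for the real 1024x1024 weight matrix.
weight = [[1.0, 0.0], [0.5, 0.5]]
bias = [0.0, 1.0]
print(dense_identity([2.0, 4.0], weight, bias))
```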
2_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e0e351c268a2988ff67cf25d603f95963a0b3b2ae58c2a3ba4f1c9a7489f458f
+ size 4198560
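The `size` field of this LFS pointer can be sanity-checked against the Dense config. The arithmetic below assumes the weights are stored as float32 (4 bytes each), which matches `torch_dtype` in `config.json` but is not stated for this file; the leftover bytes would then presumably be the safetensors header.

```python
IN_FEATURES = 1024
OUT_FEATURES = 1024
BYTES_PER_FLOAT32 = 4    # assumption: tensors are stored as float32
POINTER_SIZE = 4198560   # "size" field of the LFS pointer

parameter_count = IN_FEATURES * OUT_FEATURES + OUT_FEATURES  # weight + bias
tensor_bytes = parameter_count * BYTES_PER_FLOAT32
overhead_bytes = POINTER_SIZE - tensor_bytes  # bytes not explained by tensors

print(parameter_count, tensor_bytes, overhead_bytes)
```

Under that assumption the tensors account for all but 160 bytes of the file.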
README.md ADDED
@@ -0,0 +1,532 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - generated_from_trainer
7
+ - dataset_size:29545
8
+ - loss:MultipleNegativesSymmetricRankingLoss
9
+ base_model: dunzhang/stella_en_400M_v5
10
+ widget:
11
+ - source_sentence: In the context of the risk-based assessment of customers and business
12
+ relationships, how should the overlap between customer risk assessment and CDD
13
+ be managed to ensure both are completed effectively and in compliance with ADGM
14
+ regulations?
15
+ sentences:
16
+ - 'DocumentID: 36 | PassageID: D.7. | Passage: Principle 7 – Scenario analysis of
17
+ climate-related financial risks. Where appropriate, relevant financial firms should
18
+ develop and implement climate-related scenario analysis frameworks, including
19
+ stress testing, in a manner commensurate with their size, complexity, risk profile
20
+ and nature of activities.
21
+
22
+ '
23
+ - 'DocumentID: 1 | PassageID: 7.Guidance.4. | Passage: The risk-based assessment
24
+ of the customer and the proposed business relationship, Transaction or product
25
+ required under this Chapter is required to be undertaken prior to the establishment
26
+ of a business relationship with a customer. Because the risk rating assigned to
27
+ a customer resulting from this assessment determines the level of CDD that must
28
+ be undertaken for that customer, this process must be completed before the CDD
29
+ is completed for the customer. The Regulator is aware that in practice there will
30
+ often be some degree of overlap between the customer risk assessment and CDD.
31
+ For example, a Relevant Person may undertake some aspects of CDD, such as identifying
32
+ Beneficial Owners, when it performs a risk assessment of the customer. Conversely,
33
+ a Relevant Person may also obtain relevant information as part of CDD which has
34
+ an impact on its customer risk assessment. Where information obtained as part
35
+ of CDD of a customer affects the risk rating of a customer, the change in risk
36
+ rating should be reflected in the degree of CDD undertaken.'
37
+ - 'DocumentID: 1 | PassageID: 9.1.2.Guidance.4. | Passage: Where the legislative
38
+ framework of a jurisdiction (such as secrecy or data protection legislation) prevents
39
+ a Relevant Person from having access to CDD information upon request without delay
40
+ as referred to in Rule ‎9.1.1(3)(b), the Relevant Person should undertake the
41
+ relevant CDD itself and should not seek to rely on the relevant third party.'
42
+ - source_sentence: Can you clarify the responsibilities of the Governing Body of a
43
+ Relevant Person in establishing and maintaining AML/TFS policies and procedures,
44
+ and how these should be documented and reviewed?
45
+ sentences:
46
+ - 'DocumentID: 28 | PassageID: 193) | Passage: SUPERVISION BY LISTING AUTHORITY
47
+
48
+ Complaints or allegations of non-compliance by Reporting Entities
49
+
50
+ If, as a result of the enquiry, the Listing Authority forms the view that the
51
+ information is accurate, is Inside Information, and is not within exemption from
52
+ Disclosure provided by Rule 7.2.2, the Listing Authority will ask the Reporting
53
+ Entity to make a Disclosure about the matter under Rule 7.2.1. If the information
54
+ should have been Disclosed earlier, the Listing Authority may issue an ‘aware
55
+ letter’ (see paragraphs 187 to 189 above), or take other relevant action.
56
+
57
+
58
+ '
59
+ - "DocumentID: 17 | PassageID: Part 13.165.(2) | Passage: The Regulator shall not\
60
+ \ approve a Non Abu Dhabi Global Market Clearing House unless it is satisfied—\n\
61
+ (a)\tthat the rules and practices of the body, together with the law of the country\
62
+ \ in which the body's head office is situated, provide adequate procedures for\
63
+ \ dealing with the default of persons party to contracts connected with the body;\
64
+ \ and\n(b)\tthat it is otherwise appropriate to approve the body;\ntogether being\
65
+ \ the “Relevant Requirements” for this Part."
66
+ - "DocumentID: 1 | PassageID: 4.3.1 | Passage: A Relevant Person which is part of\
67
+ \ a Group must ensure that it:\n(a)\thas developed and implemented policies and\
68
+ \ procedures for the sharing of information between Group entities, including\
69
+ \ the sharing of information relating to CDD and money laundering risks;\n(b)\t\
70
+ has in place adequate safeguards on the confidentiality and use of information\
71
+ \ exchanged between Group entities, including consideration of relevant data protection\
72
+ \ legislation;\n(c)\tremains aware of the money laundering risks of the Group\
73
+ \ as a whole and of its exposure to the Group and takes active steps to mitigate\
74
+ \ such risks;\n(d)\tcontributes to a Group-wide risk assessment to identify and\
75
+ \ assess money laundering risks for the Group; and\n(e)\tprovides its Group-wide\
76
+ \ compliance, audit and AML/TFS functions with customer account and Transaction\
77
+ \ information from its Branches and Subsidiaries when necessary for AML/TFS purposes."
78
+ - source_sentence: What specific accounting standards and practices are we required
79
+ to follow when valuing positions in our Trading and Non-Trading Books to ensure
80
+ compliance with ADGM regulations?
81
+ sentences:
82
+ - 'DocumentID: 7 | PassageID: 8.10.1.(2).Guidance.3. | Passage: Each Authorised
83
+ Person, Recognised Body and its Auditors is also required under Part 16 and section
84
+ 193 of the FSMR respectively, to disclose to the Regulator any matter which may
85
+ indicate a breach or likely breach of, or a failure or likely failure to comply
86
+ with, Regulations or Rules. Each Authorised Person and Recognised Body is also
87
+ required to establish and implement systems and procedures to enable its compliance
88
+ and compliance by its Auditors with notification requirements.
89
+
90
+ '
91
+ - "DocumentID: 18 | PassageID: 3.2 | Passage: Financial Services Permissions. VC\
92
+ \ Managers operating in ADGM require a Financial Services Permission (“FSP”) to\
93
+ \ undertake any Regulated Activity pertaining to VC Funds and/or co-investments\
94
+ \ by third parties in VC Funds. The Regulated Activities covered by the FSP will\
95
+ \ be dependent on the VC Managers’ investment strategy and business model.\n(a)\t\
96
+ Managing a Collective Investment Fund: this includes carrying out fund management\
97
+ \ activities in respect of a VC Fund.\n(b)\tAdvising on Investments or Credit\
98
+ \ : for VC Managers these activities will be restricted to activities related\
99
+ \ to co-investment alongside a VC Fund which the VC Manager manages, such as recommending\
100
+ \ that a client invest in an investee company alongside the VC Fund and on the\
101
+ \ strategy and structure required to make the investment.\n(c)\tArranging Deals\
102
+ \ in Investments: VC Managers may also wish to make arrangements to facilitate\
103
+ \ co-investments in the investee company.\nAuthorisation fees and supervision\
104
+ \ fees for a VC Manager are capped at USD 10,000 regardless of whether one or\
105
+ \ both of the additional Regulated Activities in b) and c) above in relation to\
106
+ \ co-investments are included in its FSP. The FSP will include restrictions appropriate\
107
+ \ to the business model of a VC Manager."
108
+ - 'DocumentID: 13 | PassageID: APP2.A2.1.1.(4) | Passage: An Authorised Person must
109
+ value every position included in its Trading Book and the Non Trading Book in
110
+ accordance with the relevant accounting standards and practices.
111
+
112
+ '
113
+ - source_sentence: What documentation and information are we required to maintain
114
+ to demonstrate compliance with the rules pertaining to the cooperation with auditors,
115
+ especially in terms of providing access and not interfering with their duties?
116
+ sentences:
117
+ - "DocumentID: 6 | PassageID: PART 5.16.3.5 | Passage: Co-operation with auditors.\
118
+ \ A Fund Manager must take reasonable steps to ensure that it and its Employees:\n\
119
+ (a)\tprovide any information to its auditor that its auditor reasonably requires,\
120
+ \ or is entitled to receive as auditor;\n(b)\tgive the auditor right of access\
121
+ \ at all reasonable times to relevant records and information within its possession;\n\
122
+ (c)\tallow the auditor to make copies of any records or information referred to\
123
+ \ in ‎(b);\n(d)\tdo not interfere with the auditor's ability to discharge its\
124
+ \ duties;\n(e)\treport to the auditor any matter which may significantly affect\
125
+ \ the financial position of the Fund; and\n(f)\tprovide such other assistance\
126
+ \ as the auditor may reasonably request it to provide."
127
+ - "DocumentID: 13 | PassageID: 4.3.1 | Passage: An Authorised Person must implement\
128
+ \ and maintain comprehensive Credit Risk management systems which:\n(a)\tare appropriate\
129
+ \ to the firm's type, scope, complexity and scale of operations;\n(b)\tare appropriate\
130
+ \ to the diversity of its operations, including geographical diversity;\n(c)\t\
131
+ enable the firm to effectively identify, assess, monitor and control Credit Risk\
132
+ \ and to ensure that adequate Capital Resources are available at all times to\
133
+ \ cover the risks assumed; and\n(d)\tensure effective implementation of the Credit\
134
+ \ Risk strategy and policy."
135
+ - 'DocumentID: 3 | PassageID: 3.8.9 | Passage: The Authorised Person acting as the
136
+ Investment Manager of an ADGM Green Portfolio must provide a copy of the attestation
137
+ obtained for the purposes of Rule ‎3.8.6 to each Client with whom it has entered
138
+ into a Discretionary Portfolio Management Agreement in respect of such ADGM Green
139
+ Portfolio at least on an annual basis and upon request by the Client.'
140
+ - source_sentence: Could you provide examples of circumstances that, when changed,
141
+ would necessitate the reevaluation of a customer's risk assessment and the application
142
+ of updated CDD measures?
143
+ sentences:
144
+ - 'DocumentID: 13 | PassageID: 9.2.1.Guidance.1. | Passage: The Regulator expects
145
+ that an Authorised Person''s Liquidity Risk strategy will set out the approach
146
+ that the Authorised Person will take to Liquidity Risk management, including various
147
+ quantitative and qualitative targets. It should be communicated to all relevant
148
+ functions and staff within the organisation and be set out in the Authorised Person''s
149
+ Liquidity Risk policy.'
150
+ - "DocumentID: 1 | PassageID: 8.1.2.(1) | Passage: A Relevant Person must also apply\
151
+ \ CDD measures to each existing customer under Rules ‎8.3.1, ‎8.4.1 or ‎8.5.1\
152
+ \ as applicable:\n(a)\twith a frequency appropriate to the outcome of the risk-based\
153
+ \ approach taken in relation to each customer; and\n(b)\twhen the Relevant Person\
154
+ \ becomes aware that any circumstances relevant to its risk assessment for a customer\
155
+ \ have changed."
156
+ - "DocumentID: 1 | PassageID: 8.1.1.Guidance.2. | Passage: The FIU has issued guides\
157
+ \ that require:\n(a)\ta DNFBP that is a dealer in precious metals or precious\
158
+ \ stones to obtain relevant identification documents, such as passport, emirates\
159
+ \ ID, trade licence, as applicable, and register the information via goAML for\
160
+ \ all cash transactions equal to or exceeding USD15,000 with individuals and all\
161
+ \ cash or wire transfer transactions equal to or exceeding USD15,000 with entities.\
162
+ \ The Regulator expects a dealer in any saleable item or a price equal to or greater\
163
+ \ than USD15,000 to also comply with this requirement;\n(b)\ta DNFBP that is a\
164
+ \ real estate agent to obtain relevant identification documents, such as passport,\
165
+ \ emirates ID, trade licence, as applicable, and register the information via\
166
+ \ goAML for all sales or purchases of Real Property where:\n(i)\tthe payment for\
167
+ \ the sale/purchase includes a total cash payment of USD15,000 or more whether\
168
+ \ in a single cash payment or multiple cash payments;\n(ii)\tthe payment for any\
169
+ \ part or all of the sale/purchase amount includes payment(s) using Virtual Assets;\n\
170
+ (iii)\tthe payment for any part or all of the sale/purchase amount includes funds\
171
+ \ that were converted from or to a Virtual Asset."
172
+ pipeline_tag: sentence-similarity
173
+ library_name: sentence-transformers
174
+ ---
175
+
176
+ # SentenceTransformer based on dunzhang/stella_en_400M_v5
177
+
178
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [dunzhang/stella_en_400M_v5](https://huggingface.co/dunzhang/stella_en_400M_v5) on the csv dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
179
+
180
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [dunzhang/stella_en_400M_v5](https://huggingface.co/dunzhang/stella_en_400M_v5) <!-- at revision 24e2e1ffe95e95d807989938a5f3b8c18ee651f5 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - csv
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
192
+
193
+ ### Model Sources
194
+
195
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
196
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
197
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
198
+
199
+ ### Full Model Architecture
200
+
201
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Dense({'in_features': 1024, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
+ )
+ ```
208
+
209
+ ## Usage
210
+
211
+ ### Direct Usage (Sentence Transformers)
212
+
213
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub. config.json maps AutoModel to custom code in the
+ # base repository, so loading likely requires trust_remote_code=True.
+ model = SentenceTransformer("jebish7/stella-MNSR-3", trust_remote_code=True)
+ # Run inference
+ sentences = [
+     "Could you provide examples of circumstances that, when changed, would necessitate the reevaluation of a customer's risk assessment and the application of updated CDD measures?",
+     'DocumentID: 1 | PassageID: 8.1.2.(1) | Passage: A Relevant Person must also apply CDD measures to each existing customer under Rules \u200e8.3.1, \u200e8.4.1 or \u200e8.5.1 as applicable:\n(a)\twith a frequency appropriate to the outcome of the risk-based approach taken in relation to each customer; and\n(b)\twhen the Relevant Person becomes aware that any circumstances relevant to its risk assessment for a customer have changed.',
+     "DocumentID: 13 | PassageID: 9.2.1.Guidance.1. | Passage: The Regulator expects that an Authorised Person's Liquidity Risk strategy will set out the approach that the Authorised Person will take to Liquidity Risk management, including various quantitative and qualitative targets. It should be communicated to all relevant functions and staff within the organisation and be set out in the Authorised Person's Liquidity Risk policy.",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
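For retrieval, the similarity scores from the usage snippet are typically turned into a ranking of passages for a query. The sketch below shows that step with plain Python lists; `cosine`, `rank_passages`, and the 2-dimensional toy vectors are illustrative assumptions, whereas real embeddings from this model would have 1024 dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs):
    """Return passage indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy vectors: passage 1 points almost the same way as the query.
query = [1.0, 0.0]
passages = [[0.0, 1.0], [2.0, 0.1], [1.0, 1.0]]
print(rank_passages(query, passages))
```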
240
+
241
+ <!--
242
+ ### Direct Usage (Transformers)
243
+
244
+ <details><summary>Click to see the direct usage in Transformers</summary>
245
+
246
+ </details>
247
+ -->
248
+
249
+ <!--
250
+ ### Downstream Usage (Sentence Transformers)
251
+
252
+ You can finetune this model on your own dataset.
253
+
254
+ <details><summary>Click to expand</summary>
255
+
256
+ </details>
257
+ -->
258
+
259
+ <!--
260
+ ### Out-of-Scope Use
261
+
262
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
263
+ -->
264
+
265
+ <!--
266
+ ## Bias, Risks and Limitations
267
+
268
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
269
+ -->
270
+
271
+ <!--
272
+ ### Recommendations
273
+
274
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
275
+ -->
276
+
277
+ ## Training Details
278
+
279
+ ### Training Dataset
280
+
281
+ #### csv
282
+
283
+ * Dataset: csv
284
+ * Size: 29,545 training samples
285
+ * Columns: <code>anchor</code> and <code>positive</code>
286
+ * Approximate statistics based on the first 1000 samples:
287
+ | | anchor | positive |
288
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
289
+ | type | string | string |
290
+ | details | <ul><li>min: 16 tokens</li><li>mean: 35.04 tokens</li><li>max: 64 tokens</li></ul> | <ul><li>min: 27 tokens</li><li>mean: 129.43 tokens</li><li>max: 512 tokens</li></ul> |
291
+ * Samples:
292
+ | anchor | positive |
293
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
294
+ | <code>Could you outline the expected procedures for a Trade Repository to notify relevant authorities of any significant errors or omissions in previously submitted data?</code> | <code>DocumentID: 7 | PassageID: APP2.A2.1.2 | Passage: Processes and procedures. A Trade Repository must have effective processes and procedures to provide data to relevant authorities in a timely and appropriate manner to enable them to meet their respective regulatory mandates and legal responsibilities.</code> |
295
+ | <code>In the context of a non-binding MPO, how are commodities held by an Authorised Person treated for the purpose of determining the Commodities Risk Capital Requirement?</code> | <code>DocumentID: 9 | PassageID: 5.4.13.(a) | Passage: Commodities held by an Authorised Person for selling or leasing when executing a Murabaha, non-binding MPO, Salam or parallel Salam contract must be included in the calculation of its Commodities Risk Capital Requirement.</code> |
296
+ | <code>Can the FSRA provide case studies or examples of best practices for RIEs operating MTFs or OTFs using spot commodities in line with the Spot Commodities Framework?</code> | <code>DocumentID: 34 | PassageID: 77) | Passage: REGULATORY REQUIREMENTS - SPOT COMMODITY ACTIVITIES<br>RIEs operating an MTF or OTF using Accepted Spot Commodities<br>This means that an RIE (in addition to operating markets relating to the trading of Financial Instruments) can, where permitted by the FSRA and subject to MIR Rule 3.4.2, operate a separate MTF or OTF under its Recognition Order. This MTF or OTF may operate using Accepted Spot Commodities.<br></code> |
297
+ * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
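To make the two loss parameters concrete: as I understand the loss, every other positive in the batch acts as a negative, cosine similarities are multiplied by `scale`, and a cross-entropy term is computed in both directions (anchor to positive and positive to anchor) and averaged. The following is a simplified pure-Python sketch under that reading, not the library's implementation; all function names are hypothetical.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cross_entropy(logits, target):
    """Negative log-softmax of the target entry (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target]


def symmetric_ranking_loss(anchors, positives, scale=20.0):
    """Average of anchor->positive and positive->anchor in-batch cross-entropy."""
    n = len(anchors)
    scores = [[scale * cos_sim(a, p) for p in positives] for a in anchors]
    forward = sum(cross_entropy(scores[i], i) for i in range(n)) / n
    columns = [[scores[i][j] for i in range(n)] for j in range(n)]
    backward = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (forward + backward) / 2


aligned = symmetric_ranking_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_ranking_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

Correctly matched pairs give a loss near zero, while mismatched pairs are penalised by roughly the scale value.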
304
+
305
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `per_device_train_batch_size`: 16
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `batch_sampler`: no_duplicates
312
+
313
+ #### All Hyperparameters
314
+ <details><summary>Click to expand</summary>
315
+
316
+ - `overwrite_output_dir`: False
317
+ - `do_predict`: False
318
+ - `eval_strategy`: no
319
+ - `prediction_loss_only`: True
320
+ - `per_device_train_batch_size`: 16
321
+ - `per_device_eval_batch_size`: 8
322
+ - `per_gpu_train_batch_size`: None
323
+ - `per_gpu_eval_batch_size`: None
324
+ - `gradient_accumulation_steps`: 1
325
+ - `eval_accumulation_steps`: None
326
+ - `torch_empty_cache_steps`: None
327
+ - `learning_rate`: 5e-05
328
+ - `weight_decay`: 0.0
329
+ - `adam_beta1`: 0.9
330
+ - `adam_beta2`: 0.999
331
+ - `adam_epsilon`: 1e-08
332
+ - `max_grad_norm`: 1.0
333
+ - `num_train_epochs`: 1
334
+ - `max_steps`: -1
335
+ - `lr_scheduler_type`: linear
336
+ - `lr_scheduler_kwargs`: {}
337
+ - `warmup_ratio`: 0.1
338
+ - `warmup_steps`: 0
339
+ - `log_level`: passive
340
+ - `log_level_replica`: warning
341
+ - `log_on_each_node`: True
342
+ - `logging_nan_inf_filter`: True
343
+ - `save_safetensors`: True
344
+ - `save_on_each_node`: False
345
+ - `save_only_model`: False
346
+ - `restore_callback_states_from_checkpoint`: False
347
+ - `no_cuda`: False
348
+ - `use_cpu`: False
349
+ - `use_mps_device`: False
350
+ - `seed`: 42
351
+ - `data_seed`: None
352
+ - `jit_mode_eval`: False
353
+ - `use_ipex`: False
354
+ - `bf16`: False
355
+ - `fp16`: False
356
+ - `fp16_opt_level`: O1
357
+ - `half_precision_backend`: auto
358
+ - `bf16_full_eval`: False
359
+ - `fp16_full_eval`: False
360
+ - `tf32`: None
361
+ - `local_rank`: 0
362
+ - `ddp_backend`: None
363
+ - `tpu_num_cores`: None
364
+ - `tpu_metrics_debug`: False
365
+ - `debug`: []
366
+ - `dataloader_drop_last`: False
367
+ - `dataloader_num_workers`: 0
368
+ - `dataloader_prefetch_factor`: None
369
+ - `past_index`: -1
370
+ - `disable_tqdm`: False
371
+ - `remove_unused_columns`: True
372
+ - `label_names`: None
373
+ - `load_best_model_at_end`: False
374
+ - `ignore_data_skip`: False
375
+ - `fsdp`: []
376
+ - `fsdp_min_num_params`: 0
377
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
378
+ - `fsdp_transformer_layer_cls_to_wrap`: None
379
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
380
+ - `deepspeed`: None
381
+ - `label_smoothing_factor`: 0.0
382
+ - `optim`: adamw_torch
383
+ - `optim_args`: None
384
+ - `adafactor`: False
385
+ - `group_by_length`: False
386
+ - `length_column_name`: length
387
+ - `ddp_find_unused_parameters`: None
388
+ - `ddp_bucket_cap_mb`: None
389
+ - `ddp_broadcast_buffers`: False
390
+ - `dataloader_pin_memory`: True
391
+ - `dataloader_persistent_workers`: False
392
+ - `skip_memory_metrics`: True
393
+ - `use_legacy_prediction_loop`: False
394
+ - `push_to_hub`: False
395
+ - `resume_from_checkpoint`: None
396
+ - `hub_model_id`: None
397
+ - `hub_strategy`: every_save
398
+ - `hub_private_repo`: False
399
+ - `hub_always_push`: False
400
+ - `gradient_checkpointing`: False
401
+ - `gradient_checkpointing_kwargs`: None
402
+ - `include_inputs_for_metrics`: False
403
+ - `eval_do_concat_batches`: True
404
+ - `fp16_backend`: auto
405
+ - `push_to_hub_model_id`: None
406
+ - `push_to_hub_organization`: None
407
+ - `mp_parameters`:
408
+ - `auto_find_batch_size`: False
409
+ - `full_determinism`: False
410
+ - `torchdynamo`: None
411
+ - `ray_scope`: last
412
+ - `ddp_timeout`: 1800
413
+ - `torch_compile`: False
414
+ - `torch_compile_backend`: None
415
+ - `torch_compile_mode`: None
416
+ - `dispatch_batches`: None
417
+ - `split_batches`: None
418
+ - `include_tokens_per_second`: False
419
+ - `include_num_input_tokens_seen`: False
420
+ - `neftune_noise_alpha`: None
421
+ - `optim_target_modules`: None
422
+ - `batch_eval_metrics`: False
423
+ - `eval_on_start`: False
424
+ - `use_liger_kernel`: False
425
+ - `eval_use_gather_object`: False
426
+ - `batch_sampler`: no_duplicates
427
+ - `multi_dataset_batch_sampler`: proportional
428
+
429
+ </details>
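Taken together, `learning_rate: 5e-05`, `lr_scheduler_type: linear`, and `warmup_ratio: 0.1` describe a schedule that ramps up linearly and then decays linearly to zero. The sketch below illustrates the shape of that schedule; `linear_schedule_lr` is a hypothetical helper, and the total of 1847 steps is an assumption derived from 29,545 samples at batch size 16 on a single device.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Linear warmup to base_lr, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 1847  # assumption: ceil(29545 / 16) optimizer steps in one epoch
for step in (0, 100, 185, 1000, 1847):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```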
430
+
431
+ ### Training Logs
+ | Epoch | Step | Training Loss |
+ |:------:|:----:|:-------------:|
+ | 0.0541 | 100 | 0.4442 |
+ | 0.1083 | 200 | 0.4793 |
+ | 0.1624 | 300 | 0.4395 |
+ | 0.2166 | 400 | 0.4783 |
+ | 0.2707 | 500 | 0.4573 |
+ | 0.3249 | 600 | 0.4235 |
+ | 0.3790 | 700 | 0.4029 |
+ | 0.4331 | 800 | 0.3951 |
+ | 0.4873 | 900 | 0.438 |
+ | 0.5414 | 1000 | 0.364 |
+ | 0.5956 | 1100 | 0.3732 |
+ | 0.6497 | 1200 | 0.3932 |
+ | 0.7038 | 1300 | 0.3387 |
+ | 0.7580 | 1400 | 0.2956 |
+ | 0.8121 | 1500 | 0.3612 |
+ | 0.8663 | 1600 | 0.3333 |
+ | 0.9204 | 1700 | 0.2837 |
+ | 0.9746 | 1800 | 0.2785 |
+ | 0.0541 | 100 | 0.2263 |
+ | 0.1083 | 200 | 0.2085 |
+ | 0.1624 | 300 | 0.1638 |
+ | 0.2166 | 400 | 0.2085 |
+ | 0.2707 | 500 | 0.2442 |
+ | 0.3249 | 600 | 0.1965 |
+ | 0.3790 | 700 | 0.2548 |
+ | 0.4331 | 800 | 0.2504 |
+ | 0.4873 | 900 | 0.2358 |
+ | 0.5414 | 1000 | 0.2083 |
+ | 0.5956 | 1100 | 0.2117 |
+ | 0.6497 | 1200 | 0.248 |
+ | 0.7038 | 1300 | 0.221 |
+ | 0.7580 | 1400 | 0.1886 |
+ | 0.8121 | 1500 | 0.2653 |
+ | 0.8663 | 1600 | 0.2651 |
+ | 0.9204 | 1700 | 0.2349 |
+ | 0.9746 | 1800 | 0.2435 |
+ | 0.0541 | 100 | 0.143 |
+ | 0.1083 | 200 | 0.0701 |
+ | 0.1624 | 300 | 0.0675 |
+ | 0.2166 | 400 | 0.0977 |
+ | 0.2707 | 500 | 0.1157 |
+ | 0.3249 | 600 | 0.0823 |
+ | 0.3790 | 700 | 0.1022 |
+ | 0.4331 | 800 | 0.114 |
+ | 0.4873 | 900 | 0.0955 |
+ | 0.5414 | 1000 | 0.0905 |
+ | 0.5956 | 1100 | 0.0959 |
+ | 0.6497 | 1200 | 0.1308 |
+ | 0.7038 | 1300 | 0.1285 |
+ | 0.7580 | 1400 | 0.1006 |
+ | 0.8121 | 1500 | 0.1553 |
+ | 0.8663 | 1600 | 0.1769 |
+ | 0.9204 | 1700 | 0.1965 |
+ | 0.9746 | 1800 | 0.2271 |
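The Epoch column can be reproduced from the dataset size and batch size reported in the model card. The check below assumes a single device and one optimizer step per batch (`gradient_accumulation_steps` is 1); `epoch_fraction` is a hypothetical helper. Note that the Epoch and Step columns restart twice in the table, which suggests the log concatenates three separate one-epoch runs, although the commit does not say so.

```python
import math

DATASET_SIZE = 29545  # "Size: 29,545 training samples"
BATCH_SIZE = 16       # per_device_train_batch_size

# With dataloader_drop_last=False, the final partial batch still counts as a step.
steps_per_epoch = math.ceil(DATASET_SIZE / BATCH_SIZE)


def epoch_fraction(step):
    """Fraction of an epoch completed after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)
print([epoch_fraction(s) for s in (100, 200, 1800)])
```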
488
+
489
+
490
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.1.1
+ - Transformers: 4.45.2
+ - PyTorch: 2.5.1+cu121
+ - Accelerate: 1.1.1
+ - Datasets: 3.1.0
+ - Tokenizers: 0.20.3
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
515
+
516
+ <!--
517
+ ## Glossary
518
+
519
+ *Clearly define terms in order to be accessible across audiences.*
520
+ -->
521
+
522
+ <!--
523
+ ## Model Card Authors
524
+
525
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
526
+ -->
527
+
528
+ <!--
529
+ ## Model Card Contact
530
+
531
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
532
+ -->
config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "_name_or_path": "dunzhang/stella_en_400M_v5",
+   "architectures": [
+     "NewModel"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "auto_map": {
+     "AutoConfig": "dunzhang/stella_en_400M_v5--configuration.NewConfig",
+     "AutoModel": "dunzhang/stella_en_400M_v5--modeling.NewModel"
+   },
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-12,
+   "layer_norm_type": "layer_norm",
+   "logn_attention_clip1": false,
+   "logn_attention_scale": false,
+   "max_position_embeddings": 8192,
+   "model_type": "new",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pack_qkv": true,
+   "pad_token_id": 0,
+   "position_embedding_type": "rope",
+   "rope_scaling": {
+     "factor": 2.0,
+     "type": "ntk"
+   },
+   "rope_theta": 160000,
+   "torch_dtype": "float32",
+   "transformers_version": "4.45.2",
+   "type_vocab_size": 2,
+   "unpad_inputs": true,
+   "use_memory_efficient_attention": true,
+   "vocab_size": 30528
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.1.1",
+     "transformers": "4.45.2",
+     "pytorch": "2.5.1+cu121"
+   },
+   "prompts": {
+     "s2p_query": "Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: ",
+     "s2s_query": "Instruct: Retrieve semantically similar text.\nQuery: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
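Because `default_prompt_name` is `null`, neither prompt is applied unless the caller asks for one by name. The sketch below mimics that behaviour on raw strings; `apply_prompt` is a hypothetical helper written for illustration, not a library function.

```python
PROMPTS = {
    "s2p_query": "Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: ",
    "s2s_query": "Instruct: Retrieve semantically similar text.\nQuery: ",
}
DEFAULT_PROMPT_NAME = None  # mirrors "default_prompt_name": null


def apply_prompt(text, prompt_name=DEFAULT_PROMPT_NAME):
    """Prepend the named prompt; with no name, return the text unchanged."""
    if prompt_name is None:
        return text
    return PROMPTS[prompt_name] + text


question = "What CDD measures apply to existing customers?"
print(apply_prompt(question))
print(apply_prompt(question, "s2p_query"))
```

With the library itself, the equivalent should be to pass `prompt_name="s2p_query"` to `model.encode(...)` for queries and to encode passages without a prompt. Whether this fine-tune was trained with prompts is not stated in the commit.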
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f4e8e353975c42aa76e1fb3c110a14976f40bdce3a307676aa16c2bb256c8452
+ size 1736585680
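The pointer size gives a rough upper bound on the transformer's parameter count. The estimate below assumes every tensor is float32, as `torch_dtype` in `config.json` indicates, and ignores the small file header, so the true count is slightly lower.

```python
MODEL_FILE_BYTES = 1736585680  # "size" field of the LFS pointer
BYTES_PER_FLOAT32 = 4          # assumption: all tensors are float32

# Upper bound: the file also contains a header that is not subtracted here.
approx_params = MODEL_FILE_BYTES // BYTES_PER_FLOAT32
approx_params_millions = round(approx_params / 1e6)

print(approx_params, approx_params_millions)
```

That is about 434 million parameters, in line with the "400M" in the base model's name.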
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Dense",
+     "type": "sentence_transformers.models.Dense"
+   }
+ ]
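This file defines the order in which modules run and where each module's files live in the repository. A small sketch of reading it follows; `load_pipeline` is a hypothetical helper, and the JSON is embedded as a string so the example is self-contained.

```python
import json

MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"},
  {"idx": 2, "name": "2", "path": "2_Dense", "type": "sentence_transformers.models.Dense"}
]
"""


def load_pipeline(text):
    """Return (class name, directory) pairs in execution order."""
    modules = sorted(json.loads(text), key=lambda m: m["idx"])
    # An empty path means the module's files sit in the repository root.
    return [(m["type"].rsplit(".", 1)[-1], m["path"] or ".") for m in modules]


for class_name, directory in load_pipeline(MODULES_JSON):
    print(class_name, directory)
```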
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 8000,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
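The special-token IDs, `model_max_length`, and the right-side padding and truncation settings together determine how a single sequence is laid out. The sketch below illustrates that layout, assuming the usual BERT-style `[CLS] ... [SEP]` wrapping; `build_inputs` is a hypothetical helper and the word-piece IDs in the example are arbitrary placeholders.

```python
SPECIAL_IDS = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102, "[MASK]": 103}
MODEL_MAX_LENGTH = 512  # "model_max_length" from the config above


def build_inputs(token_ids, pad_to):
    """Wrap ids in [CLS]/[SEP], truncate on the right, then pad on the right."""
    body = token_ids[: MODEL_MAX_LENGTH - 2]  # leave room for [CLS] and [SEP]
    ids = [SPECIAL_IDS["[CLS]"]] + body + [SPECIAL_IDS["[SEP]"]]
    mask = [1] * len(ids)
    padding = max(0, pad_to - len(ids))
    return ids + [SPECIAL_IDS["[PAD]"]] * padding, mask + [0] * padding


# Placeholder word-piece ids; real ids would come from the tokenizer's vocabulary.
input_ids, attention_mask = build_inputs([7592, 2088], pad_to=6)
print(input_ids)
print(attention_mask)
```

The attention mask produced here is what masked mean pooling uses to skip the padding positions.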
vocab.txt ADDED
The diff for this file is too large to render. See raw diff