BelisaDi committed on
Commit
158ad83
1 Parent(s): 3ceb804

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
    "word_embedding_dimension": 1024,
    "pooling_mode_cls_token": false,
    "pooling_mode_mean_tokens": true,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false,
    "include_prompt": true
}
2_Dense/config.json ADDED
@@ -0,0 +1 @@
{"in_features": 1024, "out_features": 1024, "bias": true, "activation_function": "torch.nn.modules.linear.Identity"}
2_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:66c492cd2dbe395b82de0996f9cc782bf3ccdd390916df63edcccb5340332a56
size 4198560
README.md ADDED
@@ -0,0 +1,519 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:29547
- loss:MultipleNegativesRankingLoss
base_model: dunzhang/stella_en_400M_v5
widget:
- source_sentence: When calculating regulatory capital, which guidance note outlines
    the potential for an increased valuation adjustment for less liquid positions
    that may surpass the adjustments made for financial reporting purposes?
  sentences:
  - 'REGULATORY REQUIREMENTS - SPOT COMMODITY ACTIVITIES

    Spot Commodities and Accepted Spot Commodities

    Authorised Persons will need to submit the details of how each Accepted Spot Commodity
    that is proposed to be used meets the requirements for the purposes of COBS Rule
    22.2.2 and paragraphs 25 and 26 above. The use of each Accepted Spot Commodity
    will be approved as part of the formal application process for review and approval
    of an FSP. Though an Authorised Person may, for example, propose to admit to
    trading a commonly traded Spot Commodity, the Authorised Person’s controls relating
    to responsible and sustainable sourcing, and sound delivery mechanisms may not
    yet be fully developed. In such circumstances, the FSRA may require the Authorised
    Person to delay the commencement of trading until such time that suitable controls
    have been developed and implemented.

    '
  - 'Adjustment to the current valuation of less liquid positions for regulatory capital
    purposes. The adjustment to the current valuation of less liquid positions made
    under Guidance note 11 is likely to impact minimum Capital Requirements and may
    exceed those valuation adjustments made under the International Financial Reporting
    Standards and Guidance notes 8 and 9.


    '
  - "REGULATORY REQUIREMENTS FOR AUTHORISED PERSONS ENGAGED IN REGULATED ACTIVITIES\
    \ IN RELATION TO VIRTUAL ASSETS\nAnti-Money Laundering and Countering Financing\
    \ of Terrorism\nIn order to develop a robust and sustainable regulatory framework\
    \ for Virtual Assets, FSRA is of the view that a comprehensive application of\
    \ its AML/CFT framework should be in place, including full compliance with, among\
    \ other things, the:\n\na)\tUAE AML/CFT Federal Laws, including the UAE Cabinet\
    \ Resolution No. (10) of 2019 Concerning the Executive Regulation of the Federal\
    \ Law No. 20 of 2018 concerning Anti-Money Laundering and Combating Terrorism\
    \ Financing;\n\nb)\tUAE Cabinet Resolution 20 of 2019 concerning the procedures\
    \ of dealing with those listed under the UN sanctions list and UAE/local terrorist\
    \ lists issued by the Cabinet, including the FSRA AML and Sanctions Rules and\
    \ Guidance (“AML Rules”) or such other AML rules as may be applicable in ADGM\
    \ from time to time; and\n\nc)\tadoption of international best practices (including\
    \ the FATF Recommendations).\n"
- source_sentence: Are there any ADGM-specific guidelines or best practices for integrating
    anti-money laundering (AML) compliance into our technology and financial systems
    to manage operational risks effectively?
  sentences:
  - 'REGULATORY REQUIREMENTS FOR AUTHORISED PERSONS ENGAGED IN REGULATED ACTIVITIES
    IN RELATION TO VIRTUAL ASSETS

    Security measures and procedures

    IT infrastructures should be strong enough to resist, without significant loss
    to Clients, a number of scenarios, including but not limited to: accidental destruction
    or breach of data, collusion or leakage of information by employees/former employees,
    successful hack of a cryptographic and hardware security module or server, or
    access by hackers of any single set of encryption/decryption keys that could result
    in a complete system breach.

    '
  - A Relevant Person may use a database maintained elsewhere for an up-to-date list
    of resolutions and Sanctions, or to perform checks of customers or transactions
    against that list. For example, it may wish to use a database maintained by its
    head office or a Group member. However, the Relevant Person retains responsibility
    for ensuring that its systems and controls are effective to ensure compliance
    with this Rulebook.
  - 'DIGITAL SECURITIES SETTLEMENT

    Digital Settlement Facilities (DSFs)

    For the purposes of this Guidance and distinct from RCHs, the FSRA will consider
    DSFs suitable for the purposes of settlement (MIR Rule 3.8) and custody (MIR Rule
    2.10) of Digital Securities. A DSF, holding an FSP for Providing Custody, may
    provide custody and settlement services in Digital Securities for RIEs and MTFs
    (as applicable). Therefore, for the purposes of custody and settlement of Digital
    Securities, the arrangements that a RIE or MTF would normally have in place with
    a RCH can be replaced with arrangements provided by a DSF, provided that certain
    requirements, as described in this section, are met.

    '
- source_sentence: In the context of the Risk-Based Approach (RBA), how should a Relevant
    Person prioritize and address the risks once they have been identified and assessed?
  sentences:
  - If the Regulator considers that an auditor or actuary has committed a contravention
    of these Regulations, it may disqualify the auditor or actuary from being the
    auditor of, or (as the case may be), from acting as an actuary for, any Authorised
    Person, Recognised Body or Reporting Entity or any particular class thereof.
  - The Regulator shall have the power to require an Institution in Resolution, or
    any of its Group Entities, to provide any services or facilities (excluding any
    financial support) that are necessary to enable the Recipient to operate the transferred
    business effectively, including where the Institution under Resolution or relevant
    Group Entity has entered into Insolvency Proceedings.
  - In addition to assessing risk arising from money laundering, a business risk assessment
    should assess the potential exposure of a Relevant Person to other Financial Crime,
    such as fraud and the theft of personal data. The business risk assessment should
    also address the Relevant Person’s potential exposure to cyber security risk,
    as this risk may have a material impact on the Relevant Person’s ability to prevent
    Financial Crime.
- source_sentence: Can you provide further clarification on the specific measures
    deemed adequate for handling conflicts of interest related to the provision and
    management of credit within an Authorised Person's organization?
  sentences:
  - An Authorised Person with one or more branches outside the ADGM must implement
    and maintain Credit Risk policies adapted to each local market and its regulatory
    conditions.
  - "In addition, applications for recognition as a Remote Investment Exchange or\
    \ Remote Clearing House must contain:\n(a)\tthe address of the Applicant's head\
    \ office in its home jurisdiction;\n(b)\tthe address of a place in the Abu Dhabi\
    \ Global Market for the service on the Applicant of notices or other documents\
    \ required or authorised to be served on it;\n(c)\tinformation identifying any\
    \ type of activity which the Applicant envisages undertaking in the Abu Dhabi\
    \ Global Market and the extent and nature of usage and membership;\n(d)\ta comparative\
    \ analysis of the Applicant's regulatory requirements in its home jurisdiction\
    \ compared against those under the Rules set out in this Rulebook and those contained\
    \ in the “Principles for Financial Market Infrastructures” issued by IOSCO and\
    \ the Committee on Payment and Settlement Systems (April 2012);\n(e)\tthe information,\
    \ evidence and explanatory material necessary to demonstrate to the Regulator\
    \ that the requirements specified in Rule 7.2.2 are met;\n(f)\tone copy of each\
    \ of the following documents:\n(i)\tits most recent financial statements; and\n\
    (ii)\tthe Applicant’s memorandum and articles of association or any similar documents;\
    \ and\n(g)\tthe date by which the Applicant wishes the Recognition Order to take\
    \ effect."
  - Financial risk . All applicants are required to demonstrate they have a sound
    initial capital base and funding and must be able to meet the relevant prudential
    requirements of ADGM legislation, on an ongoing basis. This includes holding enough
    capital resources to cover expenses even if expected revenue takes time to materialise.
    Start-ups can encounter greater financial risks as they seek to establish and
    grow a new business.
- source_sentence: What are the recommended best practices for ensuring that all disclosures
    are prepared in accordance with the PRMS, and how can we validate that our classification
    and reporting of Petroleum Resources meet the standards set forth?
  sentences:
  - Notwithstanding this Rule, an Authorised Person would generally be expected to
    separate the roles of Compliance Officer and Senior Executive Officer. In addition,
    the roles of Compliance Officer, Finance Officer and Money Laundering Reporting
    Officer would not be expected to be combined with any other Controlled Functions
    unless appropriate monitoring and control arrangements independent of the individual
    concerned will be implemented by the Authorised Person. This may be possible in
    the case of a Branch, where monitoring and controlling of the individual (carrying
    out more than one role in the Branch) is conducted from the Authorised Person's
    home state by an appropriate individual for each of the relevant Controlled Functions
    as applicable. However, it is recognised that, on a case by case basis, there
    may be exceptional circumstances in which this may not always be practical or
    possible.
  - 'DISCLOSURE REQUIREMENTS .

    Material Exploration and drilling results

    Rule 12.5.1 sets out the reporting requirements relevant to disclosures of material
    Exploration and drilling results in relation to Petroleum Resources. Such disclosures
    should be presented in a factual and balanced manner, and contain sufficient information
    to allow investors and their advisers to make an informed judgement of its materiality. Care
    needs to be taken to ensure that a disclosure does not suggest, without reasonable
    grounds, that commercially recoverable or potentially recoverable quantities of
    Petroleum have been discovered, in the absence of determining and disclosing estimates
    of Petroleum Resources in accordance with Chapter 12 and the PRMS.

    '
  - 'REGULATORY REQUIREMENTS FOR AUTHORISED PERSONS ENGAGED IN REGULATED ACTIVITIES
    IN RELATION TO VIRTUAL ASSETS

    Origin and destination of Virtual Asset funds

    Currently, there are technology solutions developed in-house and available from
    third party service providers which enable the tracking of Virtual Assets through
    multiple transactions to more accurately identify the source and destination of
    these Virtual Assets. It is expected that Authorised Persons may need to consider
    the use of such solutions and other systems to adequately meet their anti-money
    laundering, financial crime and know-your-customer obligations under the Virtual
    Asset Framework.

    '
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on dunzhang/stella_en_400M_v5

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [dunzhang/stella_en_400M_v5](https://huggingface.co/dunzhang/stella_en_400M_v5). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [dunzhang/stella_en_400M_v5](https://huggingface.co/dunzhang/stella_en_400M_v5) <!-- at revision 2aa5579fcae1c579de199a3866b6e514bbbf5d10 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 1024, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
```
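
Module (1) above performs mean pooling: the token embeddings produced by the Transformer are averaged, with padding positions (attention mask 0) excluded, to give one fixed-size sentence vector. A toy sketch of that step, using 2-dimensional vectors instead of the model's 1024 (`mean_pool` is an illustrative name, not the library's API):

```python
# Hypothetical sketch of mean pooling over token embeddings.
# Toy dimensions; the real model pools 1024-dimensional token vectors.

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention_mask entry is 1."""
    dim = len(token_embeddings[0])
    pooled = [0.0] * dim
    n = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:
            n += 1
            for i, v in enumerate(vec):
                pooled[i] += v
    return [v / n for v in pooled]

# Two real tokens plus one padding token (mask == 0) that is excluded.
tokens = [[1.0, 3.0], [3.0, 5.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 4.0]
```

The pooled vector then passes through the Dense module (2), a 1024→1024 linear layer with identity activation.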

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("BelisaDi/stella-tuned-rirag")
# Run inference
sentences = [
    'What are the recommended best practices for ensuring that all disclosures are prepared in accordance with the PRMS, and how can we validate that our classification and reporting of Petroleum Resources meet the standards set forth?',
    'DISCLOSURE REQUIREMENTS .\nMaterial Exploration and drilling results\nRule 12.5.1 sets out the reporting requirements relevant to disclosures of material Exploration and drilling results in relation to Petroleum Resources. Such disclosures should be presented in a factual and balanced manner, and contain sufficient information to allow investors and their advisers to make an informed judgement of its materiality. Care needs to be taken to ensure that a disclosure does not suggest, without reasonable grounds, that commercially recoverable or potentially recoverable quantities of Petroleum have been discovered, in the absence of determining and disclosing estimates of Petroleum Resources in accordance with Chapter 12 and the PRMS.\n',
    "Notwithstanding this Rule, an Authorised Person would generally be expected to separate the roles of Compliance Officer and Senior Executive Officer. In addition, the roles of Compliance Officer, Finance Officer and Money Laundering Reporting Officer would not be expected to be combined with any other Controlled Functions unless appropriate monitoring and control arrangements independent of the individual concerned will be implemented by the Authorised Person. This may be possible in the case of a Branch, where monitoring and controlling of the individual (carrying out more than one role in the Branch) is conducted from the Authorised Person's home state by an appropriate individual for each of the relevant Controlled Functions as applicable. However, it is recognised that, on a case by case basis, there may be exceptional circumstances in which this may not always be practical or possible.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 29,547 training samples
* Columns: <code>anchor</code> and <code>positive</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive |
  |:--------|:-------|:---------|
  | type    | string | string |
  | details | <ul><li>min: 15 tokens</li><li>mean: 34.89 tokens</li><li>max: 96 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 115.67 tokens</li><li>max: 512 tokens</li></ul> |
* Samples:
  | anchor | positive |
  |:-------|:---------|
  | <code>Under Rules 7.3.2 and 7.3.3, what are the two specific conditions related to the maturity of a financial instrument that would trigger a disclosure requirement?</code> | <code>Events that trigger a disclosure. For the purposes of Rules 7.3.2 and 7.3.3, a Person is taken to hold Financial Instruments in or relating to a Reporting Entity, if the Person holds a Financial Instrument that on its maturity will confer on him:<br>(1) an unconditional right to acquire the Financial Instrument; or<br>(2) the discretion as to his right to acquire the Financial Instrument.<br></code> |
  | <code>**Best Execution and Transaction Handling**: What constitutes 'Best Execution' under Rule 6.5 in the context of virtual assets, and how should Authorised Persons document and demonstrate this?</code> | <code>The following COBS Rules should be read as applying to all Transactions undertaken by an Authorised Person conducting a Regulated Activity in relation to Virtual Assets, irrespective of any restrictions on application or any exception to these Rules elsewhere in COBS -<br>(a) Rule 3.4 (Suitability);<br>(b) Rule 6.5 (Best Execution);<br>(c) Rule 6.7 (Aggregation and Allocation);<br>(d) Rule 6.10 (Confirmation Notes);<br>(e) Rule 6.11 (Periodic Statements); and<br>(f) Chapter 12 (Key Information and Client Agreement).</code> |
  | <code>How does the FSRA define and evaluate "principal risks and uncertainties" for a Petroleum Reporting Entity, particularly for the remaining six months of the financial year?</code> | <code>A Reporting Entity must:<br>(a) prepare such report:<br>(i) for the first six months of each financial year or period, and if there is a change to the accounting reference date, prepare such report in respect of the period up to the old accounting reference date; and<br>(ii) in accordance with the applicable IFRS standards or other standards acceptable to the Regulator;<br>(b) ensure the financial statements have either been audited or reviewed by auditors, and the audit or review by the auditor is included within the report; and<br>(c) ensure that the report includes:<br>(i) except in the case of a Mining Exploration Reporting Entity or a Petroleum Exploration Reporting Entity, an indication of important events that have occurred during the first six months of the financial year, and their impact on the financial statements;<br>(ii) except in the case of a Mining Exploration Reporting Entity or a Petroleum Exploration Reporting Entity, a description of the principal risks and uncertainties for the remaining six months of the financial year; and<br>(iii) a condensed set of financial statements, an interim management report and associated responsibility statements.</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

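MultipleNegativesRankingLoss scores each anchor against every positive in the batch (the other positives act as in-batch negatives), scales the cosine similarities by `scale` (20.0 here), and applies cross-entropy so the matching pair receives the highest score. A minimal pure-Python sketch of that objective (the names `cos_sim` and `mnrl_loss` are ours, not the library's):

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mnrl_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled similarities; anchor i's label is positive i."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cos_sim(a, p) for p in positives]
        log_z = math.log(sum(math.exp(l) for l in logits))
        total += log_z - logits[i]  # -log softmax probability of the true pair
    return total / len(anchors)

# A correctly paired batch yields a lower loss than a mispaired one.
anchors   = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
print(mnrl_loss(anchors, positives) < mnrl_loss(anchors, positives[::-1]))  # True
```

This is why the loss benefits from larger batch sizes: every extra positive in the batch is one more negative for every other anchor.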
### Training Hyperparameters
#### Non-Default Hyperparameters

- `learning_rate`: 2e-05
- `auto_find_batch_size`: True

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: True
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch  | Step  | Training Loss |
|:------:|:-----:|:-------------:|
| 0.1354 | 500   | 0.3078 |
| 0.2707 | 1000  | 0.3142 |
| 0.4061 | 1500  | 0.2546 |
| 0.5414 | 2000  | 0.2574 |
| 0.6768 | 2500  | 0.247 |
| 0.8121 | 3000  | 0.2532 |
| 0.9475 | 3500  | 0.2321 |
| 1.0828 | 4000  | 0.1794 |
| 1.2182 | 4500  | 0.1588 |
| 1.3535 | 5000  | 0.154 |
| 1.4889 | 5500  | 0.1592 |
| 1.6243 | 6000  | 0.1632 |
| 1.7596 | 6500  | 0.1471 |
| 1.8950 | 7000  | 0.1669 |
| 2.0303 | 7500  | 0.1368 |
| 2.1657 | 8000  | 0.0982 |
| 2.3010 | 8500  | 0.1125 |
| 2.4364 | 9000  | 0.089 |
| 2.5717 | 9500  | 0.0902 |
| 2.7071 | 10000 | 0.0867 |
| 2.8424 | 10500 | 0.1017 |
| 2.9778 | 11000 | 0.0835 |

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.1.1
- Transformers: 4.45.2
- PyTorch: 2.5.0+cu124
- Accelerate: 1.0.1
- Datasets: 3.0.2
- Tokenizers: 0.20.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,39 @@
1
+ {
2
+ "_name_or_path": "finetuned_models/stella-tuned",
3
+ "architectures": [
4
+ "NewModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.0,
7
+ "auto_map": {
8
+ "AutoConfig": "configuration.NewConfig",
9
+ "AutoModel": "dunzhang/stella_en_400M_v5--modeling.NewModel"
10
+ },
11
+ "classifier_dropout": null,
12
+ "hidden_act": "gelu",
13
+ "hidden_dropout_prob": 0.1,
14
+ "hidden_size": 1024,
15
+ "initializer_range": 0.02,
16
+ "intermediate_size": 4096,
17
+ "layer_norm_eps": 1e-12,
18
+ "layer_norm_type": "layer_norm",
19
+ "logn_attention_clip1": false,
20
+ "logn_attention_scale": false,
21
+ "max_position_embeddings": 8192,
22
+ "model_type": "new",
23
+ "num_attention_heads": 16,
24
+ "num_hidden_layers": 24,
25
+ "pack_qkv": true,
26
+ "pad_token_id": 0,
27
+ "position_embedding_type": "rope",
28
+ "rope_scaling": {
29
+ "factor": 2.0,
30
+ "type": "ntk"
31
+ },
32
+ "rope_theta": 160000,
33
+ "torch_dtype": "float32",
34
+ "transformers_version": "4.45.2",
35
+ "type_vocab_size": 2,
36
+ "unpad_inputs": true,
37
+ "use_memory_efficient_attention": true,
38
+ "vocab_size": 30528
39
+ }
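As a sanity check on the sizes in `config.json`, a rough parameter count can be derived from `hidden_size`, `intermediate_size`, `num_hidden_layers`, and `vocab_size`. This is an illustrative back-of-the-envelope estimate only: biases, LayerNorm weights, token-type embeddings, and the `2_Dense` head are ignored, so it undercounts the roughly 434M parameters implied by the 1.7 GB float32 checkpoint below.

```python
# Rough lower-bound parameter estimate from the config.json values above.
hidden_size = 1024
intermediate_size = 4096
num_hidden_layers = 24
vocab_size = 30528

embeddings = vocab_size * hidden_size        # token embedding table
attention = 4 * hidden_size * hidden_size    # packed QKV + output projection
ffn = 2 * hidden_size * intermediate_size    # up- and down-projection
per_layer = attention + ffn

total = embeddings + num_hidden_layers * per_layer
print(f"~{total / 1e6:.0f}M parameters")     # → ~333M parameters
```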
config_sentence_transformers.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.1.1",
+     "transformers": "4.45.2",
+     "pytorch": "2.5.0+cu124"
+   },
+   "prompts": {
+     "s2p_query": "Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: ",
+     "s2s_query": "Instruct: Retrieve semantically similar text.\nQuery: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
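The `prompts` block above is how Sentence Transformers handles instruction-prefixed queries: the named prompt string is simply prepended to the input text before tokenization, while documents are encoded without any prompt. A minimal sketch of that prepending, using the exact strings from the config:

```python
# Named prompts, copied verbatim from config_sentence_transformers.json above.
prompts = {
    "s2p_query": (
        "Instruct: Given a web search query, retrieve relevant passages "
        "that answer the query.\nQuery: "
    ),
    "s2s_query": "Instruct: Retrieve semantically similar text.\nQuery: ",
}

def apply_prompt(text: str, prompt_name: str) -> str:
    """Prepend a named prompt to a query; documents skip this step."""
    return prompts[prompt_name] + text

query = apply_prompt("what is rope scaling?", "s2p_query")
# First line is the instruction, second line is "Query: <text>".
print(query.splitlines()[0])
```

With the actual model, the same substitution happens internally via `model.encode(queries, prompt_name="s2p_query")`.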
configuration.py ADDED
@@ -0,0 +1,145 @@
+ # coding=utf-8
+ # Copyright 2024 The GTE Team Authors and Alibaba Group.
+ # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """NEW model configuration"""
+ from transformers.configuration_utils import PretrainedConfig
+ from transformers.utils import logging
+
+ logger = logging.get_logger(__name__)
+
+
+ class NewConfig(PretrainedConfig):
+     r"""
+     This is the configuration class to store the configuration of a [`NewModel`] or a [`TFNewModel`]. It is used to
+     instantiate a NEW model according to the specified arguments, defining the model architecture. Instantiating a
+     configuration with the defaults will yield a similar configuration to that of the NEW
+     [izhx/new-base-en](https://huggingface.co/izhx/new-base-en) architecture.
+
+     Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+     documentation from [`PretrainedConfig`] for more information.
+
+     Args:
+         vocab_size (`int`, *optional*, defaults to 30528):
+             Vocabulary size of the NEW model. Defines the number of different tokens that can be represented by the
+             `inputs_ids` passed when calling [`NewModel`] or [`TFNewModel`].
+         hidden_size (`int`, *optional*, defaults to 768):
+             Dimensionality of the encoder layers and the pooler layer.
+         num_hidden_layers (`int`, *optional*, defaults to 12):
+             Number of hidden layers in the Transformer encoder.
+         num_attention_heads (`int`, *optional*, defaults to 12):
+             Number of attention heads for each attention layer in the Transformer encoder.
+         intermediate_size (`int`, *optional*, defaults to 3072):
+             Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
+         hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
+             The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
+             `"relu"`, `"silu"` and `"gelu_new"` are supported.
+         hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
+             The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
+         attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
+             The dropout ratio for the attention probabilities.
+         max_position_embeddings (`int`, *optional*, defaults to 2048):
+             The maximum sequence length that this model might ever be used with. Typically set this to something large
+             just in case (e.g., 512 or 1024 or 2048).
+         type_vocab_size (`int`, *optional*, defaults to 1):
+             The vocabulary size of the `token_type_ids` passed when calling [`NewModel`] or [`TFNewModel`].
+         initializer_range (`float`, *optional*, defaults to 0.02):
+             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+         layer_norm_eps (`float`, *optional*, defaults to 1e-12):
+             The epsilon used by the layer normalization layers.
+         position_embedding_type (`str`, *optional*, defaults to `"rope"`):
+             Type of position embedding. Choose one of `"absolute"`, `"rope"`.
+         rope_theta (`float`, *optional*, defaults to 10000.0):
+             The base period of the RoPE embeddings.
+         rope_scaling (`Dict`, *optional*):
+             Dictionary containing the scaling configuration for the RoPE embeddings. Currently supports two scaling
+             strategies: linear and dynamic. Their scaling factor must be a float greater than 1. The expected format is
+             `{"type": strategy name, "factor": scaling factor}`. When using this flag, don't update
+             `max_position_embeddings` to the expected new maximum. See the following thread for more information on how
+             these scaling strategies behave:
+             https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases/. This is an
+             experimental feature, subject to breaking API changes in future versions.
+         classifier_dropout (`float`, *optional*):
+             The dropout ratio for the classification head.
+
+     Examples:
+
+     ```python
+     >>> from transformers import NewConfig, NewModel
+
+     >>> # Initializing a NEW izhx/new-base-en style configuration
+     >>> configuration = NewConfig()
+
+     >>> # Initializing a model (with random weights) from the izhx/new-base-en style configuration
+     >>> model = NewModel(configuration)
+
+     >>> # Accessing the model configuration
+     >>> configuration = model.config
+     ```"""
+
+     model_type = "new"
+
+     def __init__(
+         self,
+         vocab_size=30528,
+         hidden_size=768,
+         num_hidden_layers=12,
+         num_attention_heads=12,
+         intermediate_size=3072,
+         hidden_act="gelu",
+         hidden_dropout_prob=0.1,
+         attention_probs_dropout_prob=0.0,
+         max_position_embeddings=2048,
+         type_vocab_size=1,
+         initializer_range=0.02,
+         layer_norm_type='layer_norm',
+         layer_norm_eps=1e-12,
+         # pad_token_id=0,
+         position_embedding_type="rope",
+         rope_theta=10000.0,
+         rope_scaling=None,
+         classifier_dropout=None,
+         pack_qkv=True,
+         unpad_inputs=False,
+         use_memory_efficient_attention=False,
+         logn_attention_scale=False,
+         logn_attention_clip1=False,
+         **kwargs,
+     ):
+         super().__init__(**kwargs)
+
+         self.vocab_size = vocab_size
+         self.hidden_size = hidden_size
+         self.num_hidden_layers = num_hidden_layers
+         self.num_attention_heads = num_attention_heads
+         self.hidden_act = hidden_act
+         self.intermediate_size = intermediate_size
+         self.hidden_dropout_prob = hidden_dropout_prob
+         self.attention_probs_dropout_prob = attention_probs_dropout_prob
+         self.max_position_embeddings = max_position_embeddings
+         self.type_vocab_size = type_vocab_size
+         self.initializer_range = initializer_range
+         self.layer_norm_type = layer_norm_type
+         self.layer_norm_eps = layer_norm_eps
+         self.position_embedding_type = position_embedding_type
+         self.rope_theta = rope_theta
+         self.rope_scaling = rope_scaling
+         self.classifier_dropout = classifier_dropout
+
+         self.pack_qkv = pack_qkv
+         self.unpad_inputs = unpad_inputs
+         self.use_memory_efficient_attention = use_memory_efficient_attention
+         self.logn_attention_scale = logn_attention_scale
+         self.logn_attention_clip1 = logn_attention_clip1
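For intuition about the `rope_theta` and `rope_scaling` settings this config carries, here is the textbook RoPE inverse-frequency formula. This is a generic sketch, not the `NewModel` implementation (which lives in the remote `modeling.py`); the "ntk" scaling strategy effectively enlarges the base so that existing positions rotate more slowly, stretching the usable context.

```python
# Standard RoPE inverse frequencies for one attention head: each consecutive
# pair of head dimensions rotates at rate theta**(-2i/head_dim).
def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    return [1.0 / (theta ** (2 * i / head_dim)) for i in range(head_dim // 2)]

head_dim = 1024 // 16                 # hidden_size / num_attention_heads = 64
inv_freq = rope_inv_freq(head_dim, 160000.0)   # rope_theta from config.json

assert inv_freq[0] == 1.0                                   # fastest pair
assert all(a > b for a, b in zip(inv_freq, inv_freq[1:]))   # strictly decreasing
```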
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8fab7a1d00723bbbfda6b8c13335d3a4d82d2c337f18cd479f05dd90e2054c7
+ size 1736585680
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Dense",
+     "type": "sentence_transformers.models.Dense"
+   }
+ ]
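`modules.json` defines the inference pipeline: Transformer encoder, then pooling (masked mean, per `1_Pooling/config.json`), then a 1024-to-1024 Dense layer with identity activation. The pooling step can be sketched with toy numbers; real embeddings are 1024-dimensional, and padding positions are excluded via the attention mask.

```python
# Masked mean pooling, as configured by pooling_mode_mean_tokens = true:
# average token embeddings over non-padding positions only.
def mean_pool(token_embeddings, attention_mask):
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    count = 0
    for emb, mask in zip(token_embeddings, attention_mask):
        if mask:                       # skip padding tokens
            count += 1
            for j in range(dim):
                sums[j] += emb[j]
    return [s / count for s in sums]

tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]   # last position is padding
pooled = mean_pool(tokens, [1, 1, 0])
print(pooled)  # → [2.0, 3.0]
```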
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 8000,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
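Per the tokenizer settings above, over-long inputs are truncated on the right to `model_max_length` tokens, so only the first 512 tokens of a text reach the encoder (the `max_length` of 8000 appears to be a leftover setting; `sentence_bert_config.json` also caps sequences at 512). A sketch of right-side truncation with a toy limit:

```python
# "truncation_side": "right" keeps the beginning of the sequence and drops
# the tail once the limit is reached. Toy limit of 4 for readability.
def truncate_right(token_ids, model_max_length):
    return token_ids[:model_max_length]

ids = list(range(10))
print(truncate_right(ids, 4))  # → [0, 1, 2, 3]
```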
vocab.txt ADDED
The diff for this file is too large to render. See raw diff