BelisaDi committed on
Commit
158ad83
1 Parent(s): 3ceb804

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
    "word_embedding_dimension": 1024,
    "pooling_mode_cls_token": false,
    "pooling_mode_mean_tokens": true,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false,
    "include_prompt": true
}
2_Dense/config.json ADDED
@@ -0,0 +1 @@
{"in_features": 1024, "out_features": 1024, "bias": true, "activation_function": "torch.nn.modules.linear.Identity"}
2_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:66c492cd2dbe395b82de0996f9cc782bf3ccdd390916df63edcccb5340332a56
size 4198560
README.md ADDED
@@ -0,0 +1,519 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:29547
- loss:MultipleNegativesRankingLoss
base_model: dunzhang/stella_en_400M_v5
widget:
- source_sentence: When calculating regulatory capital, which guidance note outlines
    the potential for an increased valuation adjustment for less liquid positions
    that may surpass the adjustments made for financial reporting purposes?
  sentences:
  - 'REGULATORY REQUIREMENTS - SPOT COMMODITY ACTIVITIES

    Spot Commodities and Accepted Spot Commodities

    Authorised Persons will need to submit the details of how each Accepted Spot Commodity
    that is proposed to be used meets the requirements for the purposes of COBS Rule
    22.2.2 and paragraphs 25 and 26 above. The use of each Accepted Spot Commodity
    will be approved as part of the formal application process for review and approval
    of an FSP. Though an Authorised Person may, for example, propose to admit to
    trading a commonly traded Spot Commodity, the Authorised Person’s controls relating
    to responsible and sustainable sourcing, and sound delivery mechanisms may not
    yet be fully developed. In such circumstances, the FSRA may require the Authorised
    Person to delay the commencement of trading until such time that suitable controls
    have been developed and implemented.

    '
  - 'Adjustment to the current valuation of less liquid positions for regulatory capital
    purposes. The adjustment to the current valuation of less liquid positions made
    under Guidance note 11 is likely to impact minimum Capital Requirements and may
    exceed those valuation adjustments made under the International Financial Reporting
    Standards and Guidance notes 8 and 9.


    '
  - "REGULATORY REQUIREMENTS FOR AUTHORISED PERSONS ENGAGED IN REGULATED ACTIVITIES\
    \ IN RELATION TO VIRTUAL ASSETS\nAnti-Money Laundering and Countering Financing\
    \ of Terrorism\nIn order to develop a robust and sustainable regulatory framework\
    \ for Virtual Assets, FSRA is of the view that a comprehensive application of\
    \ its AML/CFT framework should be in place, including full compliance with, among\
    \ other things, the:\n\na)\tUAE AML/CFT Federal Laws, including the UAE Cabinet\
    \ Resolution No. (10) of 2019 Concerning the Executive Regulation of the Federal\
    \ Law No. 20 of 2018 concerning Anti-Money Laundering and Combating Terrorism\
    \ Financing;\n\nb)\tUAE Cabinet Resolution 20 of 2019 concerning the procedures\
    \ of dealing with those listed under the UN sanctions list and UAE/local terrorist\
    \ lists issued by the Cabinet, including the FSRA AML and Sanctions Rules and\
    \ Guidance (“AML Rules”) or such other AML rules as may be applicable in ADGM\
    \ from time to time; and\n\nc)\tadoption of international best practices (including\
    \ the FATF Recommendations).\n"
- source_sentence: Are there any ADGM-specific guidelines or best practices for integrating
    anti-money laundering (AML) compliance into our technology and financial systems
    to manage operational risks effectively?
  sentences:
  - 'REGULATORY REQUIREMENTS FOR AUTHORISED PERSONS ENGAGED IN REGULATED ACTIVITIES
    IN RELATION TO VIRTUAL ASSETS

    Security measures and procedures

    IT infrastructures should be strong enough to resist, without significant loss
    to Clients, a number of scenarios, including but not limited to: accidental destruction
    or breach of data, collusion or leakage of information by employees/former employees,
    successful hack of a cryptographic and hardware security module or server, or
    access by hackers of any single set of encryption/decryption keys that could result
    in a complete system breach.

    '
  - A Relevant Person may use a database maintained elsewhere for an up-to-date list
    of resolutions and Sanctions, or to perform checks of customers or transactions
    against that list. For example, it may wish to use a database maintained by its
    head office or a Group member. However, the Relevant Person retains responsibility
    for ensuring that its systems and controls are effective to ensure compliance
    with this Rulebook.
  - 'DIGITAL SECURITIES SETTLEMENT

    Digital Settlement Facilities (DSFs)

    For the purposes of this Guidance and distinct from RCHs, the FSRA will consider
    DSFs suitable for the purposes of settlement (MIR Rule 3.8) and custody (MIR Rule
    2.10) of Digital Securities. A DSF, holding an FSP for Providing Custody, may
    provide custody and settlement services in Digital Securities for RIEs and MTFs
    (as applicable). Therefore, for the purposes of custody and settlement of Digital
    Securities, the arrangements that a RIE or MTF would normally have in place with
    a RCH can be replaced with arrangements provided by a DSF, provided that certain
    requirements, as described in this section, are met.

    '
- source_sentence: In the context of the Risk-Based Approach (RBA), how should a Relevant
    Person prioritize and address the risks once they have been identified and assessed?
  sentences:
  - If the Regulator considers that an auditor or actuary has committed a contravention
    of these Regulations, it may disqualify the auditor or actuary from being the
    auditor of, or (as the case may be), from acting as an actuary for, any Authorised
    Person, Recognised Body or Reporting Entity or any particular class thereof.
  - The Regulator shall have the power to require an Institution in Resolution, or
    any of its Group Entities, to provide any services or facilities (excluding any
    financial support) that are necessary to enable the Recipient to operate the transferred
    business effectively, including where the Institution under Resolution or relevant
    Group Entity has entered into Insolvency Proceedings.
  - In addition to assessing risk arising from money laundering, a business risk assessment
    should assess the potential exposure of a Relevant Person to other Financial Crime,
    such as fraud and the theft of personal data. The business risk assessment should
    also address the Relevant Person’s potential exposure to cyber security risk,
    as this risk may have a material impact on the Relevant Person’s ability to prevent
    Financial Crime.
- source_sentence: Can you provide further clarification on the specific measures
    deemed adequate for handling conflicts of interest related to the provision and
    management of credit within an Authorised Person's organization?
  sentences:
  - An Authorised Person with one or more branches outside the ADGM must implement
    and maintain Credit Risk policies adapted to each local market and its regulatory
    conditions.
  - "In addition, applications for recognition as a Remote Investment Exchange or\
    \ Remote Clearing House must contain:\n(a)\tthe address of the Applicant's head\
    \ office in its home jurisdiction;\n(b)\tthe address of a place in the Abu Dhabi\
    \ Global Market for the service on the Applicant of notices or other documents\
    \ required or authorised to be served on it;\n(c)\tinformation identifying any\
    \ type of activity which the Applicant envisages undertaking in the Abu Dhabi\
    \ Global Market and the extent and nature of usage and membership;\n(d)\ta comparative\
    \ analysis of the Applicant's regulatory requirements in its home jurisdiction\
    \ compared against those under the Rules set out in this Rulebook and those contained\
    \ in the “Principles for Financial Market Infrastructures” issued by IOSCO and\
    \ the Committee on Payment and Settlement Systems (April 2012);\n(e)\tthe information,\
    \ evidence and explanatory material necessary to demonstrate to the Regulator\
    \ that the requirements specified in Rule 7.2.2 are met;\n(f)\tone copy of each\
    \ of the following documents:\n(i)\tits most recent financial statements; and\n\
    (ii)\tthe Applicant’s memorandum and articles of association or any similar documents;\
    \ and\n(g)\tthe date by which the Applicant wishes the Recognition Order to take\
    \ effect."
  - Financial risk . All applicants are required to demonstrate they have a sound
    initial capital base and funding and must be able to meet the relevant prudential
    requirements of ADGM legislation, on an ongoing basis. This includes holding enough
    capital resources to cover expenses even if expected revenue takes time to materialise.
    Start-ups can encounter greater financial risks as they seek to establish and
    grow a new business.
- source_sentence: What are the recommended best practices for ensuring that all disclosures
    are prepared in accordance with the PRMS, and how can we validate that our classification
    and reporting of Petroleum Resources meet the standards set forth?
  sentences:
  - Notwithstanding this Rule, an Authorised Person would generally be expected to
    separate the roles of Compliance Officer and Senior Executive Officer. In addition,
    the roles of Compliance Officer, Finance Officer and Money Laundering Reporting
    Officer would not be expected to be combined with any other Controlled Functions
    unless appropriate monitoring and control arrangements independent of the individual
    concerned will be implemented by the Authorised Person. This may be possible in
    the case of a Branch, where monitoring and controlling of the individual (carrying
    out more than one role in the Branch) is conducted from the Authorised Person's
    home state by an appropriate individual for each of the relevant Controlled Functions
    as applicable. However, it is recognised that, on a case by case basis, there
    may be exceptional circumstances in which this may not always be practical or
    possible.
  - 'DISCLOSURE REQUIREMENTS .

    Material Exploration and drilling results

    Rule 12.5.1 sets out the reporting requirements relevant to disclosures of material
    Exploration and drilling results in relation to Petroleum Resources. Such disclosures
    should be presented in a factual and balanced manner, and contain sufficient information
    to allow investors and their advisers to make an informed judgement of its materiality. Care
    needs to be taken to ensure that a disclosure does not suggest, without reasonable
    grounds, that commercially recoverable or potentially recoverable quantities of
    Petroleum have been discovered, in the absence of determining and disclosing estimates
    of Petroleum Resources in accordance with Chapter 12 and the PRMS.

    '
  - 'REGULATORY REQUIREMENTS FOR AUTHORISED PERSONS ENGAGED IN REGULATED ACTIVITIES
    IN RELATION TO VIRTUAL ASSETS

    Origin and destination of Virtual Asset funds

    Currently, there are technology solutions developed in-house and available from
    third party service providers which enable the tracking of Virtual Assets through
    multiple transactions to more accurately identify the source and destination of
    these Virtual Assets. It is expected that Authorised Persons may need to consider
    the use of such solutions and other systems to adequately meet their anti-money
    laundering, financial crime and know-your-customer obligations under the Virtual
    Asset Framework.

    '
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on dunzhang/stella_en_400M_v5

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [dunzhang/stella_en_400M_v5](https://huggingface.co/dunzhang/stella_en_400M_v5). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [dunzhang/stella_en_400M_v5](https://huggingface.co/dunzhang/stella_en_400M_v5) <!-- at revision 2aa5579fcae1c579de199a3866b6e514bbbf5d10 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 1024, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
```
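
Module (1) above performs mean pooling: the token embeddings produced by the Transformer are averaged, with padding positions (attention mask 0) excluded, to give one fixed-size sentence vector. A toy sketch of that step, using 2-dimensional vectors instead of the model's 1024 (`mean_pool` is an illustrative name, not the library's API):

```python
# Hypothetical sketch of mean pooling over token embeddings.
# Toy dimensions; the real model pools 1024-dimensional token vectors.

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention_mask entry is 1."""
    dim = len(token_embeddings[0])
    pooled = [0.0] * dim
    n = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:
            n += 1
            for i, v in enumerate(vec):
                pooled[i] += v
    return [v / n for v in pooled]

# Two real tokens plus one padding token (mask == 0) that is excluded.
tokens = [[1.0, 3.0], [3.0, 5.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 4.0]
```

The pooled vector then passes through the Dense module (2), a 1024→1024 linear layer with identity activation.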

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("BelisaDi/stella-tuned-rirag")
# Run inference
sentences = [
    'What are the recommended best practices for ensuring that all disclosures are prepared in accordance with the PRMS, and how can we validate that our classification and reporting of Petroleum Resources meet the standards set forth?',
    'DISCLOSURE REQUIREMENTS .\nMaterial Exploration and drilling results\nRule 12.5.1 sets out the reporting requirements relevant to disclosures of material Exploration and drilling results in relation to Petroleum Resources. Such disclosures should be presented in a factual and balanced manner, and contain sufficient information to allow investors and their advisers to make an informed judgement of its materiality. Care needs to be taken to ensure that a disclosure does not suggest, without reasonable grounds, that commercially recoverable or potentially recoverable quantities of Petroleum have been discovered, in the absence of determining and disclosing estimates of Petroleum Resources in accordance with Chapter 12 and the PRMS.\n',
    "Notwithstanding this Rule, an Authorised Person would generally be expected to separate the roles of Compliance Officer and Senior Executive Officer. In addition, the roles of Compliance Officer, Finance Officer and Money Laundering Reporting Officer would not be expected to be combined with any other Controlled Functions unless appropriate monitoring and control arrangements independent of the individual concerned will be implemented by the Authorised Person. This may be possible in the case of a Branch, where monitoring and controlling of the individual (carrying out more than one role in the Branch) is conducted from the Authorised Person's home state by an appropriate individual for each of the relevant Controlled Functions as applicable. However, it is recognised that, on a case by case basis, there may be exceptional circumstances in which this may not always be practical or possible.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 29,547 training samples
* Columns: <code>anchor</code> and <code>positive</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive |
  |:--------|:-------|:---------|
  | type    | string | string |
  | details | <ul><li>min: 15 tokens</li><li>mean: 34.89 tokens</li><li>max: 96 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 115.67 tokens</li><li>max: 512 tokens</li></ul> |
* Samples:
  | anchor | positive |
  |:-------|:---------|
  | <code>Under Rules 7.3.2 and 7.3.3, what are the two specific conditions related to the maturity of a financial instrument that would trigger a disclosure requirement?</code> | <code>Events that trigger a disclosure. For the purposes of Rules 7.3.2 and 7.3.3, a Person is taken to hold Financial Instruments in or relating to a Reporting Entity, if the Person holds a Financial Instrument that on its maturity will confer on him:<br>(1) an unconditional right to acquire the Financial Instrument; or<br>(2) the discretion as to his right to acquire the Financial Instrument.<br></code> |
  | <code>**Best Execution and Transaction Handling**: What constitutes 'Best Execution' under Rule 6.5 in the context of virtual assets, and how should Authorised Persons document and demonstrate this?</code> | <code>The following COBS Rules should be read as applying to all Transactions undertaken by an Authorised Person conducting a Regulated Activity in relation to Virtual Assets, irrespective of any restrictions on application or any exception to these Rules elsewhere in COBS -<br>(a) Rule 3.4 (Suitability);<br>(b) Rule 6.5 (Best Execution);<br>(c) Rule 6.7 (Aggregation and Allocation);<br>(d) Rule 6.10 (Confirmation Notes);<br>(e) Rule 6.11 (Periodic Statements); and<br>(f) Chapter 12 (Key Information and Client Agreement).</code> |
  | <code>How does the FSRA define and evaluate "principal risks and uncertainties" for a Petroleum Reporting Entity, particularly for the remaining six months of the financial year?</code> | <code>A Reporting Entity must:<br>(a) prepare such report:<br>(i) for the first six months of each financial year or period, and if there is a change to the accounting reference date, prepare such report in respect of the period up to the old accounting reference date; and<br>(ii) in accordance with the applicable IFRS standards or other standards acceptable to the Regulator;<br>(b) ensure the financial statements have either been audited or reviewed by auditors, and the audit or review by the auditor is included within the report; and<br>(c) ensure that the report includes:<br>(i) except in the case of a Mining Exploration Reporting Entity or a Petroleum Exploration Reporting Entity, an indication of important events that have occurred during the first six months of the financial year, and their impact on the financial statements;<br>(ii) except in the case of a Mining Exploration Reporting Entity or a Petroleum Exploration Reporting Entity, a description of the principal risks and uncertainties for the remaining six months of the financial year; and<br>(iii) a condensed set of financial statements, an interim management report and associated responsibility statements.</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

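MultipleNegativesRankingLoss scores each anchor against every positive in the batch (the other positives act as in-batch negatives), scales the cosine similarities by `scale` (20.0 here), and applies cross-entropy so the matching pair receives the highest score. A minimal pure-Python sketch of that objective (the names `cos_sim` and `mnrl_loss` are ours, not the library's):

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mnrl_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled similarities; anchor i's label is positive i."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cos_sim(a, p) for p in positives]
        log_z = math.log(sum(math.exp(l) for l in logits))
        total += log_z - logits[i]  # -log softmax probability of the true pair
    return total / len(anchors)

# A correctly paired batch yields a lower loss than a mispaired one.
anchors   = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
print(mnrl_loss(anchors, positives) < mnrl_loss(anchors, positives[::-1]))  # True
```

This is why the loss benefits from larger batch sizes: every extra positive in the batch is one more negative for every other anchor.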
### Training Hyperparameters
#### Non-Default Hyperparameters

- `learning_rate`: 2e-05
- `auto_find_batch_size`: True

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: True
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch  | Step  | Training Loss |
|:------:|:-----:|:-------------:|
| 0.1354 | 500   | 0.3078 |
| 0.2707 | 1000  | 0.3142 |
| 0.4061 | 1500  | 0.2546 |
| 0.5414 | 2000  | 0.2574 |
| 0.6768 | 2500  | 0.247 |
| 0.8121 | 3000  | 0.2532 |
| 0.9475 | 3500  | 0.2321 |
| 1.0828 | 4000  | 0.1794 |
| 1.2182 | 4500  | 0.1588 |
| 1.3535 | 5000  | 0.154 |
| 1.4889 | 5500  | 0.1592 |
| 1.6243 | 6000  | 0.1632 |
| 1.7596 | 6500  | 0.1471 |
| 1.8950 | 7000  | 0.1669 |
| 2.0303 | 7500  | 0.1368 |
| 2.1657 | 8000  | 0.0982 |
| 2.3010 | 8500  | 0.1125 |
| 2.4364 | 9000  | 0.089 |
| 2.5717 | 9500  | 0.0902 |
| 2.7071 | 10000 | 0.0867 |
| 2.8424 | 10500 | 0.1017 |
| 2.9778 | 11000 | 0.0835 |

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.1.1
- Transformers: 4.45.2
- PyTorch: 2.5.0+cu124
- Accelerate: 1.0.1
- Datasets: 3.0.2
- Tokenizers: 0.20.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,39 @@
1
+ {
2
+ "_name_or_path": "finetuned_models/stella-tuned",
3
+ "architectures": [
4
+ "NewModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.0,
7
+ "auto_map": {
8
+ "AutoConfig": "configuration.NewConfig",
9
+ "AutoModel": "dunzhang/stella_en_400M_v5--modeling.NewModel"
10
+ },
11
+ "classifier_dropout": null,
12
+ "hidden_act": "gelu",
13
+ "hidden_dropout_prob": 0.1,
14
+ "hidden_size": 1024,
15
+ "initializer_range": 0.02,
16
+ "intermediate_size": 4096,
17
+ "layer_norm_eps": 1e-12,
18
+ "layer_norm_type": "layer_norm",
19
+ "logn_attention_clip1": false,
20
+ "logn_attention_scale": false,
21
+ "max_position_embeddings": 8192,
22
+ "model_type": "new",
23
+ "num_attention_heads": 16,
24
+ "num_hidden_layers": 24,
25
+ "pack_qkv": true,
26
+ "pad_token_id": 0,
27
+ "position_embedding_type": "rope",
28
+ "rope_scaling": {
29
+ "factor": 2.0,
30
+ "type": "ntk"
31
+ },
32
+ "rope_theta": 160000,
33
+ "torch_dtype": "float32",
34
+ "transformers_version": "4.45.2",
35
+ "type_vocab_size": 2,
36
+ "unpad_inputs": true,
37
+ "use_memory_efficient_attention": true,
38
+ "vocab_size": 30528
39
+ }
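As a sanity check on the sizes in `config.json`, a rough parameter count can be derived from `hidden_size`, `intermediate_size`, `num_hidden_layers`, and `vocab_size`. This is an illustrative back-of-the-envelope estimate only: biases, LayerNorm weights, token-type embeddings, and the `2_Dense` head are ignored, so it undercounts the roughly 434M parameters implied by the 1.7 GB float32 checkpoint below.

```python
# Rough lower-bound parameter estimate from the config.json values above.
hidden_size = 1024
intermediate_size = 4096
num_hidden_layers = 24
vocab_size = 30528

embeddings = vocab_size * hidden_size        # token embedding table
attention = 4 * hidden_size * hidden_size    # packed QKV + output projection
ffn = 2 * hidden_size * intermediate_size    # up- and down-projection
per_layer = attention + ffn

total = embeddings + num_hidden_layers * per_layer
print(f"~{total / 1e6:.0f}M parameters")     # → ~333M parameters
```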
config_sentence_transformers.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.1.1",
+     "transformers": "4.45.2",
+     "pytorch": "2.5.0+cu124"
+   },
+   "prompts": {
+     "s2p_query": "Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: ",
+     "s2s_query": "Instruct: Retrieve semantically similar text.\nQuery: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
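The `prompts` block above is how Sentence Transformers handles instruction-prefixed queries: the named prompt string is simply prepended to the input text before tokenization, while documents are encoded without any prompt. A minimal sketch of that prepending, using the exact strings from the config:

```python
# Named prompts, copied verbatim from config_sentence_transformers.json above.
prompts = {
    "s2p_query": (
        "Instruct: Given a web search query, retrieve relevant passages "
        "that answer the query.\nQuery: "
    ),
    "s2s_query": "Instruct: Retrieve semantically similar text.\nQuery: ",
}

def apply_prompt(text: str, prompt_name: str) -> str:
    """Prepend a named prompt to a query; documents skip this step."""
    return prompts[prompt_name] + text

query = apply_prompt("what is rope scaling?", "s2p_query")
# First line is the instruction, second line is "Query: <text>".
print(query.splitlines()[0])
```

With the actual model, the same substitution happens internally via `model.encode(queries, prompt_name="s2p_query")`.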
configuration.py ADDED
@@ -0,0 +1,145 @@
+ # coding=utf-8
+ # Copyright 2024 The GTE Team Authors and Alibaba Group.
+ # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """NEW model configuration"""
+ from transformers.configuration_utils import PretrainedConfig
+ from transformers.utils import logging
+
+ logger = logging.get_logger(__name__)
+
+
+ class NewConfig(PretrainedConfig):
+     r"""
+     This is the configuration class to store the configuration of a [`NewModel`] or a [`TFNewModel`]. It is used to
+     instantiate a NEW model according to the specified arguments, defining the model architecture. Instantiating a
+     configuration with the defaults will yield a similar configuration to that of the NEW
+     [izhx/new-base-en](https://huggingface.co/izhx/new-base-en) architecture.
+
+     Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+     documentation from [`PretrainedConfig`] for more information.
+
+     Args:
+         vocab_size (`int`, *optional*, defaults to 30528):
+             Vocabulary size of the NEW model. Defines the number of different tokens that can be represented by the
+             `inputs_ids` passed when calling [`NewModel`] or [`TFNewModel`].
+         hidden_size (`int`, *optional*, defaults to 768):
+             Dimensionality of the encoder layers and the pooler layer.
+         num_hidden_layers (`int`, *optional*, defaults to 12):
+             Number of hidden layers in the Transformer encoder.
+         num_attention_heads (`int`, *optional*, defaults to 12):
+             Number of attention heads for each attention layer in the Transformer encoder.
+         intermediate_size (`int`, *optional*, defaults to 3072):
+             Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
+         hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
+             The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
+             `"relu"`, `"silu"` and `"gelu_new"` are supported.
+         hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
+             The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
+         attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
+             The dropout ratio for the attention probabilities.
+         max_position_embeddings (`int`, *optional*, defaults to 2048):
+             The maximum sequence length that this model might ever be used with. Typically set this to something large
+             just in case (e.g., 512 or 1024 or 2048).
+         type_vocab_size (`int`, *optional*, defaults to 1):
+             The vocabulary size of the `token_type_ids` passed when calling [`NewModel`] or [`TFNewModel`].
+         initializer_range (`float`, *optional*, defaults to 0.02):
+             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+         layer_norm_eps (`float`, *optional*, defaults to 1e-12):
+             The epsilon used by the layer normalization layers.
+         position_embedding_type (`str`, *optional*, defaults to `"rope"`):
+             Type of position embedding. Choose one of `"absolute"`, `"rope"`.
+         rope_theta (`float`, *optional*, defaults to 10000.0):
+             The base period of the RoPE embeddings.
+         rope_scaling (`Dict`, *optional*):
+             Dictionary containing the scaling configuration for the RoPE embeddings. Currently supports two scaling
+             strategies: linear and dynamic. Their scaling factor must be a float greater than 1. The expected format is
+             `{"type": strategy name, "factor": scaling factor}`. When using this flag, don't update
+             `max_position_embeddings` to the expected new maximum. See the following thread for more information on how
+             these scaling strategies behave:
+             https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases/. This is an
+             experimental feature, subject to breaking API changes in future versions.
+         classifier_dropout (`float`, *optional*):
+             The dropout ratio for the classification head.
+
+     Examples:
+
+     ```python
+     >>> from transformers import NewConfig, NewModel
+
+     >>> # Initializing a NEW izhx/new-base-en style configuration
+     >>> configuration = NewConfig()
+
+     >>> # Initializing a model (with random weights) from the izhx/new-base-en style configuration
+     >>> model = NewModel(configuration)
+
+     >>> # Accessing the model configuration
+     >>> configuration = model.config
+     ```"""
+
+     model_type = "new"
+
+     def __init__(
+         self,
+         vocab_size=30528,
+         hidden_size=768,
+         num_hidden_layers=12,
+         num_attention_heads=12,
+         intermediate_size=3072,
+         hidden_act="gelu",
+         hidden_dropout_prob=0.1,
+         attention_probs_dropout_prob=0.0,
+         max_position_embeddings=2048,
+         type_vocab_size=1,
+         initializer_range=0.02,
+         layer_norm_type='layer_norm',
+         layer_norm_eps=1e-12,
+         # pad_token_id=0,
+         position_embedding_type="rope",
+         rope_theta=10000.0,
+         rope_scaling=None,
+         classifier_dropout=None,
+         pack_qkv=True,
+         unpad_inputs=False,
+         use_memory_efficient_attention=False,
+         logn_attention_scale=False,
+         logn_attention_clip1=False,
+         **kwargs,
+     ):
+         super().__init__(**kwargs)
+
+         self.vocab_size = vocab_size
+         self.hidden_size = hidden_size
+         self.num_hidden_layers = num_hidden_layers
+         self.num_attention_heads = num_attention_heads
+         self.hidden_act = hidden_act
+         self.intermediate_size = intermediate_size
+         self.hidden_dropout_prob = hidden_dropout_prob
+         self.attention_probs_dropout_prob = attention_probs_dropout_prob
+         self.max_position_embeddings = max_position_embeddings
+         self.type_vocab_size = type_vocab_size
+         self.initializer_range = initializer_range
+         self.layer_norm_type = layer_norm_type
+         self.layer_norm_eps = layer_norm_eps
+         self.position_embedding_type = position_embedding_type
+         self.rope_theta = rope_theta
+         self.rope_scaling = rope_scaling
+         self.classifier_dropout = classifier_dropout
+
+         self.pack_qkv = pack_qkv
+         self.unpad_inputs = unpad_inputs
+         self.use_memory_efficient_attention = use_memory_efficient_attention
+         self.logn_attention_scale = logn_attention_scale
+         self.logn_attention_clip1 = logn_attention_clip1
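For intuition about the `rope_theta` and `rope_scaling` settings this config carries, here is the textbook RoPE inverse-frequency formula. This is a generic sketch, not the `NewModel` implementation (which lives in the remote `modeling.py`); the "ntk" scaling strategy effectively enlarges the base so that existing positions rotate more slowly, stretching the usable context.

```python
# Standard RoPE inverse frequencies for one attention head: each consecutive
# pair of head dimensions rotates at rate theta**(-2i/head_dim).
def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    return [1.0 / (theta ** (2 * i / head_dim)) for i in range(head_dim // 2)]

head_dim = 1024 // 16                 # hidden_size / num_attention_heads = 64
inv_freq = rope_inv_freq(head_dim, 160000.0)   # rope_theta from config.json

assert inv_freq[0] == 1.0                                   # fastest pair
assert all(a > b for a, b in zip(inv_freq, inv_freq[1:]))   # strictly decreasing
```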
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8fab7a1d00723bbbfda6b8c13335d3a4d82d2c337f18cd479f05dd90e2054c7
+ size 1736585680
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Dense",
+     "type": "sentence_transformers.models.Dense"
+   }
+ ]
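`modules.json` defines the inference pipeline: Transformer encoder, then pooling (masked mean, per `1_Pooling/config.json`), then a 1024-to-1024 Dense layer with identity activation. The pooling step can be sketched with toy numbers; real embeddings are 1024-dimensional, and padding positions are excluded via the attention mask.

```python
# Masked mean pooling, as configured by pooling_mode_mean_tokens = true:
# average token embeddings over non-padding positions only.
def mean_pool(token_embeddings, attention_mask):
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    count = 0
    for emb, mask in zip(token_embeddings, attention_mask):
        if mask:                       # skip padding tokens
            count += 1
            for j in range(dim):
                sums[j] += emb[j]
    return [s / count for s in sums]

tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]   # last position is padding
pooled = mean_pool(tokens, [1, 1, 0])
print(pooled)  # → [2.0, 3.0]
```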
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 8000,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
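Per the tokenizer settings above, over-long inputs are truncated on the right to `model_max_length` tokens, so only the first 512 tokens of a text reach the encoder (the `max_length` of 8000 appears to be a leftover setting; `sentence_bert_config.json` also caps sequences at 512). A sketch of right-side truncation with a toy limit:

```python
# "truncation_side": "right" keeps the beginning of the sequence and drops
# the tail once the limit is reached. Toy limit of 4 for readability.
def truncate_right(token_ids, model_max_length):
    return token_ids[:model_max_length]

ids = list(range(10))
print(truncate_right(ids, 4))  # → [0, 1, 2, 3]
```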
vocab.txt ADDED
The diff for this file is too large to render. See raw diff