
SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
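
As a quick illustration of the paraphrase-mining use case mentioned above, the short sketch below scores all sentence pairs in a small list with sentence_transformers.util.paraphrase_mining; the example sentences are illustrative placeholders, not taken from the training data.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("danicafisher/dfisher-sentence-transformer-fine-tuned")

# Illustrative sentences; replace with your own corpus.
sentences = [
    "The committee gathered structured feedback from impacted communities.",
    "Feedback from affected communities was collected in a structured way.",
    "Adversarial testing was conducted on a regular cadence.",
]

# Returns [score, i, j] triples for the most similar sentence pairs, highest score first.
for score, i, j in util.paraphrase_mining(model, sentences)[:3]:
    print(f"{score:.3f}  {sentences[i]}  |  {sentences[j]}")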

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
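
These properties can also be read directly off the loaded model; a minimal check, assuming the Hub model id used below:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("danicafisher/dfisher-sentence-transformer-fine-tuned")
print(model.max_seq_length)                      # expected: 256
print(model.get_sentence_embedding_dimension())  # expected: 384
print(model.similarity_fn_name)                  # expected: cosine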

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
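
The module stack above (a BERT encoder, mean pooling over token embeddings, then L2 normalization) can be reproduced with plain transformers. The following is a minimal sketch of that equivalent pipeline, assuming the repository loads with AutoModel; it is not a replacement for the Sentence Transformers usage shown below.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

repo = "danicafisher/dfisher-sentence-transformer-fine-tuned"
tokenizer = AutoTokenizer.from_pretrained(repo)
encoder = AutoModel.from_pretrained(repo)  # module (0): BertModel

def encode(texts):
    batch = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = encoder(**batch).last_hidden_state
    # module (1): mean pooling over non-padding tokens
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    # module (2): L2 normalization
    return F.normalize(pooled, p=2, dim=1)

print(encode(["An example sentence."]).shape)  # torch.Size([1, 384])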

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("danicafisher/dfisher-sentence-transformer-fine-tuned")
# Run inference
sentences = [
    'What methods are suggested for recording and integrating structured feedback about content provenance from various stakeholders in the context of GAI systems?',
    "39 \nMS-3.3-004 \nProvide input for training materials about the capabilities and limitations of GAI \nsystems related to digital content transparency for AI Actors, other \nprofessionals, and the public about the societal impacts of AI and the role of \ndiverse and inclusive content generation. \nHuman-AI Configuration; \nInformation Integrity; Harmful Bias \nand Homogenization \nMS-3.3-005 \nRecord and integrate structured feedback about content provenance from \noperators, users, and potentially impacted communities through the use of \nmethods such as user research studies, focus groups, or community forums. \nActively seek feedback on generated content quality and potential biases. \nAssess the general awareness among end users and impacted communities \nabout the availability of these feedback channels. \nHuman-AI Configuration; \nInformation Integrity; Harmful Bias \nand Homogenization \nAI Actor Tasks: AI Deployment, Affected Individuals and Communities, End-Users, Operation and Monitoring, TEVV \n \nMEASURE 4.2: Measurement results regarding AI system trustworthiness in deployment context(s) and across the AI lifecycle are \ninformed by input from domain experts and relevant AI Actors to validate whether the system is performing consistently as \nintended. Results are documented. \nAction ID \nSuggested Action \nGAI Risks \nMS-4.2-001 \nConduct adversarial testing at a regular cadence to map and measure GAI risks, \nincluding tests to address attempts to deceive or manipulate the application of \nprovenance techniques or other misuses. Identify vulnerabilities and \nunderstand potential misuse scenarios and unintended outputs. \nInformation Integrity; Information \nSecurity \nMS-4.2-002 \nEvaluate GAI system performance in real-world scenarios to observe its \nbehavior in practical environments and reveal issues that might not surface in \ncontrolled and optimized testing environments. \nHuman-AI Configuration; \nConfabulation; Information \nSecurity \nMS-4.2-003 \nImplement interpretability and explainability methods to evaluate GAI system \ndecisions and verify alignment with intended purpose. \nInformation Integrity; Harmful Bias \nand Homogenization \nMS-4.2-004 \nMonitor and document instances where human operators or other systems \noverride the GAI's decisions. Evaluate these cases to understand if the overrides \nare linked to issues related to content provenance. \nInformation Integrity \nMS-4.2-005 \nVerify and document the incorporation of results of structured public feedback \nexercises into design, implementation, deployment approval (“go”/“no-go” \ndecisions), monitoring, and decommission decisions. \nHuman-AI Configuration; \nInformation Security \nAI Actor Tasks: AI Deployment, Domain Experts, End-Users, Operation and Monitoring, TEVV",
    '46 \nMG-4.3-003 \nReport GAI incidents in compliance with legal and regulatory requirements (e.g., \nHIPAA breach reporting, e.g., OCR (2023) or NHTSA (2022) autonomous vehicle \ncrash reporting requirements. \nInformation Security; Data Privacy \nAI Actor Tasks: AI Deployment, Affected Individuals and Communities, Domain Experts, End-Users, Human Factors, Operation and \nMonitoring',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
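
The same embeddings also support semantic search; below is a short sketch using sentence_transformers.util.semantic_search over an illustrative mini-corpus (the corpus and query are placeholders paraphrasing the card's example passages).

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("danicafisher/dfisher-sentence-transformer-fine-tuned")

corpus = [
    "Record and integrate structured feedback about content provenance from operators and users.",
    "Conduct adversarial testing at a regular cadence to map and measure GAI risks.",
    "Report GAI incidents in compliance with legal and regulatory requirements.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode(
    "How should feedback on content provenance be collected?", convert_to_tensor=True
)

# One result list per query, each entry a dict with 'corpus_id' and 'score'.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")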

Training Details

Training Dataset

Unnamed Dataset

  • Size: 274 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 274 samples:
    • sentence_0: string (min: 12 tokens, mean: 22.67 tokens, max: 38 tokens)
    • sentence_1: string (min: 21 tokens, mean: 245.27 tokens, max: 256 tokens)
  • Samples:
    Sample 1:
    sentence_0: How does the Executive Order on Advancing Racial Equity define 'equity' and 'underserved communities'?
    sentence_1: ENDNOTES
    47. Darshali A. Vyas et al., Hidden in Plain Sight – Reconsidering the Use of Race Correction in Clinical
    Algorithms, 383 N. Engl. J. Med.874, 876-78 (Aug. 27, 2020), https://www.nejm.org/doi/full/10.1056/
    NEJMms2004740.
    48. The definitions of 'equity' and 'underserved communities' can be found in the Definitions section of
    this framework as well as in Section 2 of The Executive Order On Advancing Racial Equity and Support
    for Underserved Communities Through the Federal Government. https://www.whitehouse.gov/
    briefing-room/presidential-actions/2021/01/20/executive-order-advancing-racial-equity-and-support­
    for-underserved-communities-through-the-federal-government/
    49. Id.
    50. Various organizations have offered proposals for how such assessments might be designed. See, e.g.,
    Emanuel Moss, Elizabeth Anne Watkins, Ranjit Singh, Madeleine Clare Elish, and Jacob Metcalf.
    Assembling Accountability: Algorithmic Impact Assessment for the Public Interest. Data & Society
    Research Institute Report. June 29, 2021. https://datasociety.net/library/assembling-accountability­
    algorithmic-impact-assessment-for-the-public-interest/; Nicol Turner Lee, Paul Resnick, and Genie
    Barton. Algorithmic bias detection and mitigation: Best practices and policies to reduce consumer harms.
    Brookings Report. May 22, 2019.
    https://www.brookings.edu/research/algorithmic-bias-detection-and-mitigation-best-practices-and­
    policies-to-reduce-consumer-harms/; Andrew D. Selbst. An Institutional View Of Algorithmic Impact
    Assessments. Harvard Journal of Law & Technology. June 15, 2021. https://ssrn.com/abstract=3867634;
    Dillon Reisman, Jason Schultz, Kate Crawford, and Meredith Whittaker. Algorithmic Impact
    Assessments: A Practical Framework for Public Agency Accountability. AI Now Institute Report. April
    2018. https://ainowinstitute.org/aiareport2018.pdf
    51. Department of Justice. Justice Department Announces New Initiative to Combat Redlining. Oct. 22,
    2021. https://www.justice.gov/opa/pr/justice-department-announces-new-initiative-combat-redlining
    52. PAVE Interagency Task Force on Property Appraisal and Valuation Equity. Action Plan to Advance
    Property Appraisal and Valuation Equity: Closing the Racial Wealth Gap by Addressing Mis-valuations for
    Families and Communities of Color. March 2022. https://pave.hud.gov/sites/pave.hud.gov/files/
    documents/PAVEActionPlan.pdf
    53. U.S. Equal Employment Opportunity Commission. The Americans with Disabilities Act and the Use of
    Software, Algorithms, and Artificial Intelligence to Assess Job Applicants and Employees. EEOC­
    NVTA-2022-2. May 12, 2022. https://www.eeoc.gov/laws/guidance/americans-disabilities-act-and-use­
    software-algorithms-and-artificial-intelligence; U.S. Department of Justice. Algorithms, Artificial
    Intelligence, and Disability Discrimination in Hiring. May 12, 2022. https://beta.ada.gov/resources/ai­
    guidance/
    54. Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. Dissecting racial bias in
    an algorithm used to manage the health of populations. Science. Vol. 366, No. 6464. Oct. 25, 2019. https://
    www.science.org/doi/10.1126/science.aax2342
    55. Data & Trust Alliance. Algorithmic Bias Safeguards for Workforce: Overview. Jan. 2022. https://
    dataandtrustalliance.org/Algorithmic_Bias_Safeguards_for_Workforce_Overview.pdf
    56. Section 508.gov. IT Accessibility Laws and Policies. Access Board. https://www.section508.gov/
    manage/laws-and-policies/
    67

    Sample 2:
    sentence_0: What are the key expectations for automated systems as outlined in the context?
    sentence_1: HUMAN ALTERNATIVES,
    CONSIDERATION, AND
    FALLBACK
    WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS
    The expectations for automated systems are meant to serve as a blueprint for the development of additional
    technical standards and practices that are tailored for particular sectors and contexts.
    Equitable. Consideration should be given to ensuring outcomes of the fallback and escalation system are
    equitable when compared to those of the automated system and such that the fallback and escalation
    system provides equitable access to underserved communities.105
    Timely. Human consideration and fallback are only useful if they are conducted and concluded in a
    timely manner. The determination of what is timely should be made relative to the specific automated
    system, and the review system should be staffed and regularly assessed to ensure it is providing timely
    consideration and fallback. In time-critical systems, this mechanism should be immediately available or,
    where possible, available before the harm occurs. Time-critical systems include, but are not limited to,
    voting-related systems, automated building access and other access systems, systems that form a critical
    component of healthcare, and systems that have the ability to withhold wages or otherwise cause
    immediate financial penalties.
    Effective. The organizational structure surrounding processes for consideration and fallback should
    be designed so that if the human decision-maker charged with reassessing a decision determines that it
    should be overruled, the new decision will be effectively enacted. This includes ensuring that the new
    decision is entered into the automated system throughout its components, any previous repercussions from
    the old decision are also overturned, and safeguards are put in place to help ensure that future decisions do
    not result in the same errors.
    Maintained. The human consideration and fallback process and any associated automated processes
    should be maintained and supported as long as the relevant automated system continues to be in use.
    Institute training, assessment, and oversight to combat automation bias and ensure any
    human-based components of a system are effective.
    Training and assessment. Anyone administering, interacting with, or interpreting the outputs of an auto­
    mated system should receive training in that system, including how to properly interpret outputs of a system
    in light of its intended purpose and in how to mitigate the effects of automation bias. The training should reoc­
    cur regularly to ensure it is up to date with the system and to ensure the system is used appropriately. Assess­
    ment should be ongoing to ensure that the use of the system with human involvement provides for appropri­
    ate results, i.e., that the involvement of people does not invalidate the system's assessment as safe and effective
    or lead to algorithmic discrimination.
    Oversight. Human-based systems have the potential for bias, including automation bias, as well as other
    concerns that may limit their effectiveness. The results of assessments of the efficacy and potential bias of
    such human-based systems should be overseen by governance structures that have the potential to update the
    operation of the human-based system in order to mitigate these effects.
    50

    Sample 3:
    sentence_0: What is the focus of the report titled "Assembling Accountability: Algorithmic Impact Assessment for the Public Interest" by Emanuel Moss and others?
    sentence_1: ENDNOTES (the same passage shown in Sample 1 above)
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • multi_dataset_batch_sampler: round_robin
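
For reference, here is a hedged sketch of how a comparable fine-tuning run could be set up with these non-default hyperparameters and the MultipleNegativesRankingLoss configuration listed above; the training pairs and output directory are placeholders, not the actual dataset.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder (sentence_0, sentence_1) pairs in the layout described under Training Dataset.
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "What are the key expectations for automated systems as outlined in the context?",
        "What methods are suggested for recording structured feedback about content provenance?",
    ],
    "sentence_1": [
        "The expectations for automated systems are meant to serve as a blueprint ...",
        "Record and integrate structured feedback about content provenance from operators, users, ...",
    ],
})

loss = MultipleNegativesRankingLoss(model)  # scale=20.0 and cosine similarity are the defaults

args = SentenceTransformerTrainingArguments(
    output_dir="dfisher-sentence-transformer-fine-tuned",  # placeholder output directory
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    multi_dataset_batch_sampler="round_robin",  # only matters when training on multiple datasets
)

trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
trainer.train()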

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Framework Versions

  • Python: 3.11.9
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}