---
base_model: microsoft/deberta-v3-small
datasets:
  - tals/vitaminc
language:
  - en
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - dot_accuracy
  - dot_accuracy_threshold
  - dot_f1
  - dot_f1_threshold
  - dot_precision
  - dot_recall
  - dot_ap
  - manhattan_accuracy
  - manhattan_accuracy_threshold
  - manhattan_f1
  - manhattan_f1_threshold
  - manhattan_precision
  - manhattan_recall
  - manhattan_ap
  - euclidean_accuracy
  - euclidean_accuracy_threshold
  - euclidean_f1
  - euclidean_f1_threshold
  - euclidean_precision
  - euclidean_recall
  - euclidean_ap
  - max_accuracy
  - max_accuracy_threshold
  - max_f1
  - max_f1_threshold
  - max_precision
  - max_recall
  - max_ap
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:225247
  - loss:CachedGISTEmbedLoss
widget:
  - source_sentence: how long to grill boneless skinless chicken breasts in oven
    sentences:
      - "[ syll. a-ka-hi, ak-ahi ] The baby boy name Akahi is also used as a girl name. Its pronunciation is AA K AA HHiy â\x80\_. Akahi's origin, as well as its use, is in the Hawaiian language. The name's meaning is never before. Akahi is infrequently used as a baby name for boys."
      - >-
        October consists of 31 days. November has 30 days. When you add both
        together they have 61 days.
      - >-
        Heat a grill or grill pan. When the grill is hot, place the chicken on
        the grill and cook for about 4 minutes per side, or until cooked
        through. You can also bake the thawed chicken in a 375 degree F oven for
        15 minutes, or until cooked through.
  - source_sentence: >-
      More than 273 people have died from the 2019-20 coronavirus outside
      mainland China .
    sentences:
      - >-
        More than 3,700 people have died : around 3,100 in mainland China and
        around 550 in all other countries combined .
      - >-
        More than 3,200 people have died : almost 3,000 in mainland China and
        around 275 in other countries .
      - more than 4,900 deaths have been attributed to COVID-19 .
  - source_sentence: Most red algae species live in oceans.
    sentences:
      - Where do most red algae species live?
      - Which layer of the earth is molten?
      - >-
        As a diver descends, the increase in pressure causes the body’s air
        pockets in the ears and lungs to do what?
  - source_sentence: >-
      Binary compounds of carbon with less electronegative elements are called
      carbides.
    sentences:
      - What are four children born at one birth called?
      - >-
        Binary compounds of carbon with less electronegative elements are called
        what?
      - The water cycle involves movement of water between air and what?
  - source_sentence: What is the basic monetary unit of Iceland?
    sentences:
      - >-
        Ao dai - Vietnamese traditional dress - YouTube Ao dai - Vietnamese
        traditional dress Want to watch this again later? Sign in to add this
        video to a playlist. Need to report the video? Sign in to report
        inappropriate content. Rating is available when the video has been
        rented. This feature is not available right now. Please try again later.
        Uploaded on Jul 8, 2009 Simple, yet charming, graceful and elegant, áo
        dài was designed to praise the slender beauty of Vietnamese women. The
        dress is a genius combination of ancient and modern. It shows every
        curve on the girl's body, creating sexiness for the wearer, yet it still
        preserves the traditional feminine grace of Vietnamese women with its
        charming flowing flaps. The simplicity of áo dài makes it convenient and
        practical, something that other Asian traditional clothes lack. The
        waist-length slits of the flaps allow every movement of the legs:
        walking, running, riding a bicycle, climbing a tree, doing high kicks.
        The looseness of the pants allows comfortability. As a girl walks in áo
        dài, the movements of the flaps make it seem like she's not walking but
        floating in the air. This breath-taking beautiful image of a Vietnamese
        girl walking in áo dài has been an inspiration for generations of
        Vietnamese poets, novelists, artists and has left a deep impression for
        every foreigner who has visited the country. Category
      - >-
        Icelandic monetary unit - definition of Icelandic monetary unit by The
        Free Dictionary Icelandic monetary unit - definition of Icelandic
        monetary unit by The Free Dictionary
        http://www.thefreedictionary.com/Icelandic+monetary+unit Related to
        Icelandic monetary unit: Icelandic Old Krona ThesaurusAntonymsRelated
        WordsSynonymsLegend: monetary unit - a unit of money Icelandic krona ,
        krona - the basic unit of money in Iceland eyrir - 100 aurar equal 1
        krona in Iceland Want to thank TFD for its existence? Tell a friend
        about us , add a link to this page, or visit the webmaster's page for
        free fun content . Link to this page: Copyright © 2003-2017 Farlex, Inc
        Disclaimer All content on this website, including dictionary, thesaurus,
        literature, geography, and other reference data is for informational
        purposes only. This information should not be considered complete, up to
        date, and is not intended to be used in place of a visit, consultation,
        or advice of a legal, medical, or any other professional.
      - >-
        Food-Info.net : E-numbers : E140: Chlorophyll CI 75810, Natural Green 3,
        Chlorophyll A, Magnesium chlorophyll Origin: Natural green colour,
        present in all plants and algae. Commercially extracted from nettles,
        grass and alfalfa. Function & characteristics:
model-index:
  - name: SentenceTransformer based on microsoft/deberta-v3-small
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.3977846210139704
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.44299644096637864
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.43174431600737306
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.4553695033739603
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.42060129087924125
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.44300328790921845
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.3974381713503513
            name: Pearson Dot
          - type: spearman_dot
            value: 0.4426330607320026
            name: Spearman Dot
          - type: pearson_max
            value: 0.43174431600737306
            name: Pearson Max
          - type: spearman_max
            value: 0.4553695033739603
            name: Spearman Max
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: allNLI dev
          type: allNLI-dev
        metrics:
          - type: cosine_accuracy
            value: 0.66796875
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.9727417230606079
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.5338983050847458
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.8509687781333923
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.4214046822742475
            name: Cosine Precision
          - type: cosine_recall
            value: 0.7283236994219653
            name: Cosine Recall
          - type: cosine_ap
            value: 0.4443750308487611
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.66796875
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 747.4664916992188
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.5347368421052632
            name: Dot F1
          - type: dot_f1_threshold
            value: 652.6121826171875
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.4205298013245033
            name: Dot Precision
          - type: dot_recall
            value: 0.7341040462427746
            name: Dot Recall
          - type: dot_ap
            value: 0.4447331164315086
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.673828125
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 185.35494995117188
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.5340909090909091
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 316.48419189453125
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.3971830985915493
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.815028901734104
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.45330636568192945
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.66796875
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 6.472302436828613
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.5338983050847458
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 15.134000778198242
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.4214046822742475
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.7283236994219653
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.44436910603457025
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.673828125
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 747.4664916992188
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.5347368421052632
            name: Max F1
          - type: max_f1_threshold
            value: 652.6121826171875
            name: Max F1 Threshold
          - type: max_precision
            value: 0.4214046822742475
            name: Max Precision
          - type: max_recall
            value: 0.815028901734104
            name: Max Recall
          - type: max_ap
            value: 0.45330636568192945
            name: Max Ap
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: Qnli dev
          type: Qnli-dev
        metrics:
          - type: cosine_accuracy
            value: 0.66015625
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.8744948506355286
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.6646433990895295
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.753309965133667
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.5177304964539007
            name: Cosine Precision
          - type: cosine_recall
            value: 0.9279661016949152
            name: Cosine Recall
          - type: cosine_ap
            value: 0.6610633478265061
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.66015625
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 670.719970703125
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.6646433990895295
            name: Dot F1
          - type: dot_f1_threshold
            value: 578.874755859375
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.5177304964539007
            name: Dot Precision
          - type: dot_recall
            value: 0.9279661016949152
            name: Dot Recall
          - type: dot_ap
            value: 0.6607472505349153
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.666015625
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 281.9825134277344
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.6678899082568808
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 328.83447265625
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.5889967637540453
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.7711864406779662
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.6664006509577655
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.66015625
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 13.881525039672852
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.6646433990895295
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 19.471359252929688
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.5177304964539007
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.9279661016949152
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.6611053426809266
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.666015625
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 670.719970703125
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.6678899082568808
            name: Max F1
          - type: max_f1_threshold
            value: 578.874755859375
            name: Max F1 Threshold
          - type: max_precision
            value: 0.5889967637540453
            name: Max Precision
          - type: max_recall
            value: 0.9279661016949152
            name: Max Recall
          - type: max_ap
            value: 0.6664006509577655
            name: Max Ap
---

SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model finetuned from microsoft/deberta-v3-small. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/deberta-v3-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
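
Once loaded (see Usage below), the description above can be sanity-checked programmatically; a minimal sketch:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp")
print(model.max_seq_length)        # 512
embedding = model.encode("a test sentence")
print(embedding.shape)             # (768,)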

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): AdvancedWeightedPooling(
    (linear_cls): Linear(in_features=768, out_features=768, bias=True)
    (mha): MultiheadAttention(
      (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
    )
    (layernorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
)
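
AdvancedWeightedPooling is a custom module whose forward pass is not included in this card; only its printed submodules are known. The sketch below is a plausible reconstruction under the assumption that the projected CLS token attends over all token embeddings via the multi-head attention, with LayerNorms around a residual connection; the number of heads and the exact wiring are guesses.

import torch
from torch import nn

class AdvancedWeightedPoolingSketch(nn.Module):
    """Hypothetical reconstruction; the real AdvancedWeightedPooling may differ."""

    def __init__(self, dim: int = 768, num_heads: int = 8):  # num_heads is a guess
        super().__init__()
        self.linear_cls = nn.Linear(dim, dim)                 # CLS projection
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.layernorm = nn.LayerNorm(dim)
        self.layernorm2 = nn.LayerNorm(dim)

    def forward(self, token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Assumption: the projected CLS token queries all token embeddings.
        cls_query = self.layernorm(self.linear_cls(token_embeddings[:, :1]))  # [B, 1, D]
        pad_mask = attention_mask == 0                                        # True = positions to ignore
        attended, _ = self.mha(cls_query, token_embeddings, token_embeddings,
                               key_padding_mask=pad_mask)
        pooled = self.layernorm2(attended + cls_query)                        # residual + norm
        return pooled.squeeze(1)                                              # [B, D] sentence embedding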

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp")
# Run inference
sentences = [
    'What is the basic monetary unit of Iceland?',
    "Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary http://www.thefreedictionary.com/Icelandic+monetary+unit Related to Icelandic monetary unit: Icelandic Old Krona ThesaurusAntonymsRelated WordsSynonymsLegend: monetary unit - a unit of money Icelandic krona , krona - the basic unit of money in Iceland eyrir - 100 aurar equal 1 krona in Iceland Want to thank TFD for its existence? Tell a friend about us , add a link to this page, or visit the webmaster's page for free fun content . Link to this page: Copyright © 2003-2017 Farlex, Inc Disclaimer All content on this website, including dictionary, thesaurus, literature, geography, and other reference data is for informational purposes only. This information should not be considered complete, up to date, and is not intended to be used in place of a visit, consultation, or advice of a legal, medical, or any other professional.",
    'Food-Info.net : E-numbers : E140: Chlorophyll CI 75810, Natural Green 3, Chlorophyll A, Magnesium chlorophyll Origin: Natural green colour, present in all plants and algae. Commercially extracted from nettles, grass and alfalfa. Function & characteristics:',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-test

Metric Value
pearson_cosine 0.3978
spearman_cosine 0.443
pearson_manhattan 0.4317
spearman_manhattan 0.4554
pearson_euclidean 0.4206
spearman_euclidean 0.443
pearson_dot 0.3974
spearman_dot 0.4426
pearson_max 0.4317
spearman_max 0.4554
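
These correlations can be reproduced with the library's EmbeddingSimilarityEvaluator; a sketch using placeholder pairs and gold scores rather than the actual sts-test data:

from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["A plane is taking off.", "A man is playing a flute.", "A cat sits on a mat."],
    sentences2=["An air plane is taking off.", "A man plays the guitar.", "The stock market fell."],
    scores=[0.95, 0.55, 0.05],   # gold similarities, normalized to [0, 1]
    name="sts-test",
)
print(evaluator(model))          # pearson/spearman for cosine, dot, manhattan, euclidean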

Binary Classification

  • Dataset: allNLI-dev

Metric Value
cosine_accuracy 0.668
cosine_accuracy_threshold 0.9727
cosine_f1 0.5339
cosine_f1_threshold 0.851
cosine_precision 0.4214
cosine_recall 0.7283
cosine_ap 0.4444
dot_accuracy 0.668
dot_accuracy_threshold 747.4665
dot_f1 0.5347
dot_f1_threshold 652.6122
dot_precision 0.4205
dot_recall 0.7341
dot_ap 0.4447
manhattan_accuracy 0.6738
manhattan_accuracy_threshold 185.3549
manhattan_f1 0.5341
manhattan_f1_threshold 316.4842
manhattan_precision 0.3972
manhattan_recall 0.815
manhattan_ap 0.4533
euclidean_accuracy 0.668
euclidean_accuracy_threshold 6.4723
euclidean_f1 0.5339
euclidean_f1_threshold 15.134
euclidean_precision 0.4214
euclidean_recall 0.7283
euclidean_ap 0.4444
max_accuracy 0.6738
max_accuracy_threshold 747.4665
max_f1 0.5347
max_f1_threshold 652.6122
max_precision 0.4214
max_recall 0.815
max_ap 0.4533
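
The thresholds above are operating points chosen on allNLI-dev, not fixed properties of the model; to use one at inference time, compare a pair's similarity against it. A minimal sketch with the cosine F1 threshold from this table (the sentence pair is illustrative):

embeddings = model.encode([
    "A man is eating food.",
    "A man eats something.",
])
score = model.similarity(embeddings[0:1], embeddings[1:2]).item()
print(score, score >= 0.8510)   # 0.8510 = cosine_f1_threshold above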

Binary Classification

  • Dataset: Qnli-dev

Metric Value
cosine_accuracy 0.6602
cosine_accuracy_threshold 0.8745
cosine_f1 0.6646
cosine_f1_threshold 0.7533
cosine_precision 0.5177
cosine_recall 0.928
cosine_ap 0.6611
dot_accuracy 0.6602
dot_accuracy_threshold 670.72
dot_f1 0.6646
dot_f1_threshold 578.8748
dot_precision 0.5177
dot_recall 0.928
dot_ap 0.6607
manhattan_accuracy 0.666
manhattan_accuracy_threshold 281.9825
manhattan_f1 0.6679
manhattan_f1_threshold 328.8345
manhattan_precision 0.589
manhattan_recall 0.7712
manhattan_ap 0.6664
euclidean_accuracy 0.6602
euclidean_accuracy_threshold 13.8815
euclidean_f1 0.6646
euclidean_f1_threshold 19.4714
euclidean_precision 0.5177
euclidean_recall 0.928
euclidean_ap 0.6611
max_accuracy 0.666
max_accuracy_threshold 670.72
max_f1 0.6679
max_f1_threshold 578.8748
max_precision 0.589
max_recall 0.928
max_ap 0.6664
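
Both binary-classification tables match the output format of the library's BinaryClassificationEvaluator, which sweeps the thresholds itself; a sketch with placeholder pairs and labels:

from sentence_transformers.evaluation import BinaryClassificationEvaluator

evaluator = BinaryClassificationEvaluator(
    sentences1=["What is the basic monetary unit of Iceland?",
                "Where do most red algae species live?"],
    sentences2=["The krona is the basic unit of money in Iceland.",
                "The moon orbits the earth."],
    labels=[1, 0],               # 1 = matching pair, 0 = non-matching
    name="Qnli-dev",
)
print(evaluator(model))          # accuracy, F1, precision, recall, AP per similarity function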

Training Details

Evaluation Dataset

vitaminc-pairs

  • Dataset: vitaminc-pairs at be6febb
  • Size: 128 evaluation samples
  • Columns: claim and evidence
  • Approximate statistics based on the first 128 samples:
      claim: string (min: 9 tokens, mean: 21.42 tokens, max: 41 tokens)
      evidence: string (min: 11 tokens, mean: 35.55 tokens, max: 79 tokens)
  • Samples:
      claim: Dragon Con had over 5000 guests .
      evidence: Among the more than 6000 guests and musical performers at the 2009 convention were such notables as Patrick Stewart , William Shatner , Leonard Nimoy , Terry Gilliam , Bruce Boxleitner , James Marsters , and Mary McDonnell .

      claim: COVID-19 has reached more than 185 countries .
      evidence: As of , more than cases of COVID-19 have been reported in more than 190 countries and 200 territories , resulting in more than deaths .

      claim: In March , Italy had 3.6x times more cases of coronavirus than China .
      evidence: As of 12 March , among nations with at least one million citizens , Italy has the world 's highest per capita rate of positive coronavirus cases at 206.1 cases per million people ( 3.6x times the rate of China ) and is the country with the second-highest number of positive cases as well as of deaths in the world , after China .
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
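
A sketch of constructing this loss with the sentence-transformers API. The guide checkpoint is not named in this card; the one below is a placeholder chosen only because it matches the printed guide architecture (a BERT encoder with CLS pooling and normalization):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedGISTEmbedLoss

student = SentenceTransformer("microsoft/deberta-v3-small")   # base model per this card
guide = SentenceTransformer("BAAI/bge-base-en-v1.5")          # placeholder guide model
loss = CachedGISTEmbedLoss(student, guide=guide, temperature=0.025)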
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 42
  • per_device_eval_batch_size: 128
  • gradient_accumulation_steps: 2
  • learning_rate: 3e-05
  • weight_decay: 0.001
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 1e-05}
  • warmup_ratio: 0.25
  • save_safetensors: False
  • fp16: True
  • push_to_hub: True
  • hub_model_id: bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • batch_sampler: no_duplicates
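
A hedged reconstruction of these settings with SentenceTransformerTrainingArguments; the output_dir is a placeholder, everything else mirrors the list above:

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="checkpoints-tmp",            # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=42,
    per_device_eval_batch_size=128,
    gradient_accumulation_steps=2,
    learning_rate=3e-5,
    weight_decay=0.001,
    num_train_epochs=3,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"num_cycles": 0.5, "min_lr": 1e-5},
    warmup_ratio=0.25,
    save_safetensors=False,
    fp16=True,
    push_to_hub=True,
    hub_model_id="bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp",
    hub_strategy="all_checkpoints",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)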

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 42
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.001
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 1e-05}
  • warmup_ratio: 0.25
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
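
Putting the pieces together, a minimal sketch of how such a run could be launched; the one-row dataset is just the first vitaminc-pairs sample from the Evaluation Dataset section, for illustration (student, args, and loss come from the sketches above):

from datasets import Dataset
from sentence_transformers import SentenceTransformerTrainer

train_dataset = Dataset.from_dict({
    "claim": ["Dragon Con had over 5000 guests ."],
    "evidence": ["Among the more than 6000 guests and musical performers at the 2009 "
                 "convention were such notables as Patrick Stewart , William Shatner , "
                 "Leonard Nimoy , Terry Gilliam , Bruce Boxleitner , James Marsters , "
                 "and Mary McDonnell ."],
})

trainer = SentenceTransformerTrainer(
    model=student,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()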

Training Logs

Click to expand
Epoch | Step | Training Loss | vitaminc-pairs loss | negation-triplets loss | scitail-pairs-pos loss | scitail-pairs-qa loss | xsum-pairs loss | sciq pairs loss | qasc pairs loss | openbookqa pairs loss | msmarco pairs loss | nq pairs loss | trivia pairs loss | gooaq pairs loss | paws-pos loss | global dataset loss | sts-test_spearman_cosine | allNLI-dev_max_ap | Qnli-dev_max_ap
(A dash means the metric was not computed at that step; the full evaluation suite runs every 40 steps.)
0.0009 1 5.8564 - - - - - - - - - - - - - - - - -
0.0018 2 7.1716 - - - - - - - - - - - - - - - - -
0.0027 3 5.9095 - - - - - - - - - - - - - - - - -
0.0035 4 5.0841 - - - - - - - - - - - - - - - - -
0.0044 5 4.0184 - - - - - - - - - - - - - - - - -
0.0053 6 6.2191 - - - - - - - - - - - - - - - - -
0.0062 7 5.6124 - - - - - - - - - - - - - - - - -
0.0071 8 3.9544 - - - - - - - - - - - - - - - - -
0.0080 9 4.7149 - - - - - - - - - - - - - - - - -
0.0088 10 4.9616 - - - - - - - - - - - - - - - - -
0.0097 11 5.2794 - - - - - - - - - - - - - - - - -
0.0106 12 8.8704 - - - - - - - - - - - - - - - - -
0.0115 13 6.0707 - - - - - - - - - - - - - - - - -
0.0124 14 5.4071 - - - - - - - - - - - - - - - - -
0.0133 15 6.9104 - - - - - - - - - - - - - - - - -
0.0142 16 6.0276 - - - - - - - - - - - - - - - - -
0.0150 17 6.737 - - - - - - - - - - - - - - - - -
0.0159 18 6.5354 - - - - - - - - - - - - - - - - -
0.0168 19 5.206 - - - - - - - - - - - - - - - - -
0.0177 20 5.2469 - - - - - - - - - - - - - - - - -
0.0186 21 5.3771 - - - - - - - - - - - - - - - - -
0.0195 22 4.979 - - - - - - - - - - - - - - - - -
0.0204 23 4.7909 - - - - - - - - - - - - - - - - -
0.0212 24 4.9086 - - - - - - - - - - - - - - - - -
0.0221 25 4.8826 - - - - - - - - - - - - - - - - -
0.0230 26 8.2266 - - - - - - - - - - - - - - - - -
0.0239 27 8.3024 - - - - - - - - - - - - - - - - -
0.0248 28 5.8745 - - - - - - - - - - - - - - - - -
0.0257 29 4.7298 - - - - - - - - - - - - - - - - -
0.0265 30 5.4614 - - - - - - - - - - - - - - - - -
0.0274 31 5.8594 - - - - - - - - - - - - - - - - -
0.0283 32 5.2401 - - - - - - - - - - - - - - - - -
0.0292 33 5.1579 - - - - - - - - - - - - - - - - -
0.0301 34 5.2181 - - - - - - - - - - - - - - - - -
0.0310 35 4.6328 - - - - - - - - - - - - - - - - -
0.0319 36 2.121 - - - - - - - - - - - - - - - - -
0.0327 37 5.9026 - - - - - - - - - - - - - - - - -
0.0336 38 7.3796 - - - - - - - - - - - - - - - - -
0.0345 39 5.5361 - - - - - - - - - - - - - - - - -
0.0354 40 4.0243 2.9018 5.6903 2.1136 2.8052 6.5831 0.8882 4.1148 5.0966 10.3911 10.9032 7.1904 8.1935 1.3943 5.6716 0.1879 0.3385 0.5781
0.0363 41 4.9072 - - - - - - - - - - - - - - - - -
0.0372 42 3.4439 - - - - - - - - - - - - - - - - -
0.0381 43 4.9787 - - - - - - - - - - - - - - - - -
0.0389 44 5.8318 - - - - - - - - - - - - - - - - -
0.0398 45 5.3226 - - - - - - - - - - - - - - - - -
0.0407 46 5.1181 - - - - - - - - - - - - - - - - -
0.0416 47 4.7834 - - - - - - - - - - - - - - - - -
0.0425 48 6.6303 - - - - - - - - - - - - - - - - -
0.0434 49 5.8171 - - - - - - - - - - - - - - - - -
0.0442 50 5.1962 - - - - - - - - - - - - - - - - -
0.0451 51 5.2096 - - - - - - - - - - - - - - - - -
0.0460 52 5.0943 - - - - - - - - - - - - - - - - -
0.0469 53 4.9038 - - - - - - - - - - - - - - - - -
0.0478 54 4.6479 - - - - - - - - - - - - - - - - -
0.0487 55 5.5098 - - - - - - - - - - - - - - - - -
0.0496 56 4.6979 - - - - - - - - - - - - - - - - -
0.0504 57 3.1969 - - - - - - - - - - - - - - - - -
0.0513 58 4.4127 - - - - - - - - - - - - - - - - -
0.0522 59 3.7746 - - - - - - - - - - - - - - - - -
0.0531 60 4.5378 - - - - - - - - - - - - - - - - -
0.0540 61 5.0209 - - - - - - - - - - - - - - - - -
0.0549 62 6.5936 - - - - - - - - - - - - - - - - -
0.0558 63 4.2315 - - - - - - - - - - - - - - - - -
0.0566 64 6.4269 - - - - - - - - - - - - - - - - -
0.0575 65 4.2644 - - - - - - - - - - - - - - - - -
0.0584 66 5.1388 - - - - - - - - - - - - - - - - -
0.0593 67 5.1852 - - - - - - - - - - - - - - - - -
0.0602 68 4.8057 - - - - - - - - - - - - - - - - -
0.0611 69 3.1725 - - - - - - - - - - - - - - - - -
0.0619 70 3.3322 - - - - - - - - - - - - - - - - -
0.0628 71 5.139 - - - - - - - - - - - - - - - - -
0.0637 72 4.307 - - - - - - - - - - - - - - - - -
0.0646 73 5.0133 - - - - - - - - - - - - - - - - -
0.0655 74 4.0507 - - - - - - - - - - - - - - - - -
0.0664 75 3.3895 - - - - - - - - - - - - - - - - -
0.0673 76 5.6736 - - - - - - - - - - - - - - - - -
0.0681 77 4.2572 - - - - - - - - - - - - - - - - -
0.0690 78 3.0796 - - - - - - - - - - - - - - - - -
0.0699 79 5.0199 - - - - - - - - - - - - - - - - -
0.0708 80 4.1414 2.7794 4.8890 1.8997 2.6761 6.2096 0.7622 3.3129 4.5498 7.2056 7.6809 6.3792 6.6567 1.3848 5.0030 0.2480 0.3513 0.5898
0.0717 81 5.8604 - - - - - - - - - - - - - - - - -
0.0726 82 4.3003 - - - - - - - - - - - - - - - - -
0.0735 83 4.4568 - - - - - - - - - - - - - - - - -
0.0743 84 4.2747 - - - - - - - - - - - - - - - - -
0.0752 85 5.52 - - - - - - - - - - - - - - - - -
0.0761 86 2.7767 - - - - - - - - - - - - - - - - -
0.0770 87 4.397 - - - - - - - - - - - - - - - - -
0.0779 88 5.4449 - - - - - - - - - - - - - - - - -
0.0788 89 4.2706 - - - - - - - - - - - - - - - - -
0.0796 90 6.4759 - - - - - - - - - - - - - - - - -
0.0805 91 4.1951 - - - - - - - - - - - - - - - - -
0.0814 92 4.6328 - - - - - - - - - - - - - - - - -
0.0823 93 4.1278 - - - - - - - - - - - - - - - - -
0.0832 94 4.1787 - - - - - - - - - - - - - - - - -
0.0841 95 5.2156 - - - - - - - - - - - - - - - - -
0.0850 96 3.1403 - - - - - - - - - - - - - - - - -
0.0858 97 4.0273 - - - - - - - - - - - - - - - - -
0.0867 98 3.0624 - - - - - - - - - - - - - - - - -
0.0876 99 4.6786 - - - - - - - - - - - - - - - - -
0.0885 100 4.1505 - - - - - - - - - - - - - - - - -
0.0894 101 2.9529 - - - - - - - - - - - - - - - - -
0.0903 102 4.7048 - - - - - - - - - - - - - - - - -
0.0912 103 4.7388 - - - - - - - - - - - - - - - - -
0.0920 104 3.7879 - - - - - - - - - - - - - - - - -
0.0929 105 4.0311 - - - - - - - - - - - - - - - - -
0.0938 106 4.1314 - - - - - - - - - - - - - - - - -
0.0947 107 4.9411 - - - - - - - - - - - - - - - - -
0.0956 108 4.1118 - - - - - - - - - - - - - - - - -
0.0965 109 3.6971 - - - - - - - - - - - - - - - - -
0.0973 110 5.605 - - - - - - - - - - - - - - - - -
0.0982 111 3.4563 - - - - - - - - - - - - - - - - -
0.0991 112 3.7422 - - - - - - - - - - - - - - - - -
0.1 113 3.8055 - - - - - - - - - - - - - - - - -
0.1009 114 5.2369 - - - - - - - - - - - - - - - - -
0.1018 115 5.6518 - - - - - - - - - - - - - - - - -
0.1027 116 3.2906 - - - - - - - - - - - - - - - - -
0.1035 117 3.4996 - - - - - - - - - - - - - - - - -
0.1044 118 3.6283 - - - - - - - - - - - - - - - - -
0.1053 119 4.1487 - - - - - - - - - - - - - - - - -
0.1062 120 4.3996 2.7279 4.3946 1.4130 2.1150 6.0486 0.7172 2.9669 4.4180 6.3022 6.8412 6.2013 6.0982 0.9474 4.3852 0.3149 0.3693 0.5975
0.1071 121 3.5291 - - - - - - - - - - - - - - - - -
0.1080 122 3.8232 - - - - - - - - - - - - - - - - -
0.1088 123 4.6035 - - - - - - - - - - - - - - - - -
0.1097 124 3.7607 - - - - - - - - - - - - - - - - -
0.1106 125 3.8461 - - - - - - - - - - - - - - - - -
0.1115 126 3.3413 - - - - - - - - - - - - - - - - -
0.1124 127 4.2777 - - - - - - - - - - - - - - - - -
0.1133 128 4.3597 - - - - - - - - - - - - - - - - -
0.1142 129 3.9046 - - - - - - - - - - - - - - - - -
0.1150 130 4.0527 - - - - - - - - - - - - - - - - -
0.1159 131 5.0883 - - - - - - - - - - - - - - - - -
0.1168 132 3.8308 - - - - - - - - - - - - - - - - -
0.1177 133 3.572 - - - - - - - - - - - - - - - - -
0.1186 134 3.4299 - - - - - - - - - - - - - - - - -
0.1195 135 4.1541 - - - - - - - - - - - - - - - - -
0.1204 136 3.584 - - - - - - - - - - - - - - - - -
0.1212 137 5.0977 - - - - - - - - - - - - - - - - -
0.1221 138 4.6769 - - - - - - - - - - - - - - - - -
0.1230 139 3.8396 - - - - - - - - - - - - - - - - -
0.1239 140 3.2875 - - - - - - - - - - - - - - - - -
0.1248 141 4.1946 - - - - - - - - - - - - - - - - -
0.1257 142 4.9602 - - - - - - - - - - - - - - - - -
0.1265 143 4.1531 - - - - - - - - - - - - - - - - -
0.1274 144 3.8351 - - - - - - - - - - - - - - - - -
0.1283 145 3.112 - - - - - - - - - - - - - - - - -
0.1292 146 2.3145 - - - - - - - - - - - - - - - - -
0.1301 147 4.0989 - - - - - - - - - - - - - - - - -
0.1310 148 3.2173 - - - - - - - - - - - - - - - - -
0.1319 149 2.7913 - - - - - - - - - - - - - - - - -
0.1327 150 3.7627 - - - - - - - - - - - - - - - - -
0.1336 151 3.3669 - - - - - - - - - - - - - - - - -
0.1345 152 2.6775 - - - - - - - - - - - - - - - - -
0.1354 153 3.2804 - - - - - - - - - - - - - - - - -
0.1363 154 3.0676 - - - - - - - - - - - - - - - - -
0.1372 155 3.1559 - - - - - - - - - - - - - - - - -
0.1381 156 2.6638 - - - - - - - - - - - - - - - - -
0.1389 157 2.8045 - - - - - - - - - - - - - - - - -
0.1398 158 4.0568 - - - - - - - - - - - - - - - - -
0.1407 159 2.7554 - - - - - - - - - - - - - - - - -
0.1416 160 3.7407 2.7439 4.6364 1.0089 1.1229 5.4870 0.6284 2.5933 4.3943 5.6565 5.9870 5.6944 5.3857 0.3622 3.4011 0.3141 0.3898 0.6417
0.1425 161 3.4324 - - - - - - - - - - - - - - - - -
0.1434 162 3.6658 - - - - - - - - - - - - - - - - -
0.1442 163 3.96 - - - - - - - - - - - - - - - - -
0.1451 164 2.3167 - - - - - - - - - - - - - - - - -
0.1460 165 3.6345 - - - - - - - - - - - - - - - - -
0.1469 166 2.462 - - - - - - - - - - - - - - - - -
0.1478 167 1.4742 - - - - - - - - - - - - - - - - -
0.1487 168 4.7312 - - - - - - - - - - - - - - - - -
0.1496 169 2.6785 - - - - - - - - - - - - - - - - -
0.1504 170 3.449 - - - - - - - - - - - - - - - - -
0.1513 171 2.437 - - - - - - - - - - - - - - - - -
0.1522 172 4.2431 - - - - - - - - - - - - - - - - -
0.1531 173 4.4848 - - - - - - - - - - - - - - - - -
0.1540 174 2.5575 - - - - - - - - - - - - - - - - -
0.1549 175 2.3798 - - - - - - - - - - - - - - - - -
0.1558 176 4.4939 - - - - - - - - - - - - - - - - -
0.1566 177 4.1285 - - - - - - - - - - - - - - - - -
0.1575 178 3.0096 - - - - - - - - - - - - - - - - -
0.1584 179 4.4431 - - - - - - - - - - - - - - - - -
0.1593 180 3.1172 - - - - - - - - - - - - - - - - -
0.1602 181 2.3576 - - - - - - - - - - - - - - - - -
0.1611 182 3.7849 - - - - - - - - - - - - - - - - -
0.1619 183 3.679 - - - - - - - - - - - - - - - - -
0.1628 184 3.1949 - - - - - - - - - - - - - - - - -
0.1637 185 3.2422 - - - - - - - - - - - - - - - - -
0.1646 186 2.9905 - - - - - - - - - - - - - - - - -
0.1655 187 2.2697 - - - - - - - - - - - - - - - - -
0.1664 188 1.7685 - - - - - - - - - - - - - - - - -
0.1673 189 2.0971 - - - - - - - - - - - - - - - - -
0.1681 190 3.4689 - - - - - - - - - - - - - - - - -
0.1690 191 1.6614 - - - - - - - - - - - - - - - - -
0.1699 192 1.9574 - - - - - - - - - - - - - - - - -
0.1708 193 1.9313 - - - - - - - - - - - - - - - - -
0.1717 194 2.2316 - - - - - - - - - - - - - - - - -
0.1726 195 1.9854 - - - - - - - - - - - - - - - - -
0.1735 196 2.8428 - - - - - - - - - - - - - - - - -
0.1743 197 2.6916 - - - - - - - - - - - - - - - - -
0.1752 198 3.5193 - - - - - - - - - - - - - - - - -
0.1761 199 3.1681 - - - - - - - - - - - - - - - - -
0.1770 200 2.7377 2.7042 4.8735 0.6428 0.6248 4.3639 0.4776 1.8950 3.3982 4.1048 4.7591 4.4568 4.1613 0.1802 2.4959 0.3521 0.4227 0.6702
0.1779 201 1.6408 - - - - - - - - - - - - - - - - -
0.1788 202 2.3864 - - - - - - - - - - - - - - - - -
0.1796 203 2.0848 - - - - - - - - - - - - - - - - -
0.1805 204 2.9074 - - - - - - - - - - - - - - - - -
0.1814 205 2.542 - - - - - - - - - - - - - - - - -
0.1823 206 1.7312 - - - - - - - - - - - - - - - - -
0.1832 207 1.6768 - - - - - - - - - - - - - - - - -
0.1841 208 2.531 - - - - - - - - - - - - - - - - -
0.1850 209 2.9222 - - - - - - - - - - - - - - - - -
0.1858 210 2.4152 - - - - - - - - - - - - - - - - -
0.1867 211 1.4345 - - - - - - - - - - - - - - - - -
0.1876 212 1.5864 - - - - - - - - - - - - - - - - -
0.1885 213 1.272 - - - - - - - - - - - - - - - - -
0.1894 214 1.7011 - - - - - - - - - - - - - - - - -
0.1903 215 3.0076 - - - - - - - - - - - - - - - - -
0.1912 216 2.468 - - - - - - - - - - - - - - - - -
0.1920 217 2.0796 - - - - - - - - - - - - - - - - -
0.1929 218 2.9735 - - - - - - - - - - - - - - - - -
0.1938 219 2.5506 - - - - - - - - - - - - - - - - -
0.1947 220 1.7307 - - - - - - - - - - - - - - - - -
0.1956 221 1.4519 - - - - - - - - - - - - - - - - -
0.1965 222 1.7292 - - - - - - - - - - - - - - - - -
0.1973 223 1.4664 - - - - - - - - - - - - - - - - -
0.1982 224 1.6201 - - - - - - - - - - - - - - - - -
0.1991 225 2.3483 - - - - - - - - - - - - - - - - -
0.2 226 2.1311 - - - - - - - - - - - - - - - - -
0.2009 227 2.3272 - - - - - - - - - - - - - - - - -
0.2018 228 2.6164 - - - - - - - - - - - - - - - - -
0.2027 229 1.6261 - - - - - - - - - - - - - - - - -
0.2035 230 2.5293 - - - - - - - - - - - - - - - - -
0.2044 231 1.2885 - - - - - - - - - - - - - - - - -
0.2053 232 2.0039 - - - - - - - - - - - - - - - - -
0.2062 233 3.0003 - - - - - - - - - - - - - - - - -
0.2071 234 2.0491 - - - - - - - - - - - - - - - - -
0.2080 235 2.0178 - - - - - - - - - - - - - - - - -
0.2088 236 1.8532 - - - - - - - - - - - - - - - - -
0.2097 237 2.3614 - - - - - - - - - - - - - - - - -
0.2106 238 1.1889 - - - - - - - - - - - - - - - - -
0.2115 239 1.4833 - - - - - - - - - - - - - - - - -
0.2124 240 2.8687 2.7215 4.1544 0.4166 0.3876 3.3157 0.3711 1.4818 2.6939 3.2454 3.9798 3.5949 3.2266 0.1275 1.8867 0.4430 0.4533 0.6664

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.2.0
  • Transformers: 4.45.1
  • PyTorch: 2.4.0
  • Accelerate: 0.34.2
  • Datasets: 3.0.1
  • Tokenizers: 0.20.0
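
To reproduce this environment, the listed versions can be pinned directly (assuming a Python 3.10 interpreter is already available):

pip install sentence-transformers==3.2.0 transformers==4.45.1 torch==2.4.0 accelerate==0.34.2 datasets==3.0.1 tokenizers==0.20.0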

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}