metadata
base_model: microsoft/deberta-v3-small
datasets:
- tals/vitaminc
language:
- en
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
- cosine_accuracy
- cosine_accuracy_threshold
- cosine_f1
- cosine_f1_threshold
- cosine_precision
- cosine_recall
- cosine_ap
- dot_accuracy
- dot_accuracy_threshold
- dot_f1
- dot_f1_threshold
- dot_precision
- dot_recall
- dot_ap
- manhattan_accuracy
- manhattan_accuracy_threshold
- manhattan_f1
- manhattan_f1_threshold
- manhattan_precision
- manhattan_recall
- manhattan_ap
- euclidean_accuracy
- euclidean_accuracy_threshold
- euclidean_f1
- euclidean_f1_threshold
- euclidean_precision
- euclidean_recall
- euclidean_ap
- max_accuracy
- max_accuracy_threshold
- max_f1
- max_f1_threshold
- max_precision
- max_recall
- max_ap
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:225247
- loss:CachedGISTEmbedLoss
widget:
- source_sentence: how long to grill boneless skinless chicken breasts in oven
sentences:
- "[ syll. a-ka-hi, ak-ahi ] The baby boy name Akahi is also used as a girl name. Its pronunciation is AA K AA HHiy â\x80\_. Akahi's origin, as well as its use, is in the Hawaiian language. The name's meaning is never before. Akahi is infrequently used as a baby name for boys."
- >-
October consists of 31 days. November has 30 days. When you add both
together they have 61 days.
- >-
Heat a grill or grill pan. When the grill is hot, place the chicken on
the grill and cook for about 4 minutes per side, or until cooked
through. You can also bake the thawed chicken in a 375 degree F oven for
15 minutes, or until cooked through.
- source_sentence: >-
More than 273 people have died from the 2019-20 coronavirus outside
mainland China .
sentences:
- >-
More than 3,700 people have died : around 3,100 in mainland China and
around 550 in all other countries combined .
- >-
More than 3,200 people have died : almost 3,000 in mainland China and
around 275 in other countries .
- more than 4,900 deaths have been attributed to COVID-19 .
- source_sentence: Most red algae species live in oceans.
sentences:
- Where do most red algae species live?
- Which layer of the earth is molten?
- >-
As a diver descends, the increase in pressure causes the body’s air
pockets in the ears and lungs to do what?
- source_sentence: >-
Binary compounds of carbon with less electronegative elements are called
carbides.
sentences:
- What are four children born at one birth called?
- >-
Binary compounds of carbon with less electronegative elements are called
what?
- The water cycle involves movement of water between air and what?
- source_sentence: What is the basic monetary unit of Iceland?
sentences:
- >-
Ao dai - Vietnamese traditional dress - YouTube Ao dai - Vietnamese
traditional dress Want to watch this again later? Sign in to add this
video to a playlist. Need to report the video? Sign in to report
inappropriate content. Rating is available when the video has been
rented. This feature is not available right now. Please try again later.
Uploaded on Jul 8, 2009 Simple, yet charming, graceful and elegant, áo
dài was designed to praise the slender beauty of Vietnamese women. The
dress is a genius combination of ancient and modern. It shows every
curve on the girl's body, creating sexiness for the wearer, yet it still
preserves the traditional feminine grace of Vietnamese women with its
charming flowing flaps. The simplicity of áo dài makes it convenient and
practical, something that other Asian traditional clothes lack. The
waist-length slits of the flaps allow every movement of the legs:
walking, running, riding a bicycle, climbing a tree, doing high kicks.
The looseness of the pants allows comfortability. As a girl walks in áo
dài, the movements of the flaps make it seem like she's not walking but
floating in the air. This breath-taking beautiful image of a Vietnamese
girl walking in áo dài has been an inspiration for generations of
Vietnamese poets, novelists, artists and has left a deep impression for
every foreigner who has visited the country. Category
- >-
Icelandic monetary unit - definition of Icelandic monetary unit by The
Free Dictionary Icelandic monetary unit - definition of Icelandic
monetary unit by The Free Dictionary
http://www.thefreedictionary.com/Icelandic+monetary+unit Related to
Icelandic monetary unit: Icelandic Old Krona ThesaurusAntonymsRelated
WordsSynonymsLegend: monetary unit - a unit of money Icelandic krona ,
krona - the basic unit of money in Iceland eyrir - 100 aurar equal 1
krona in Iceland Want to thank TFD for its existence? Tell a friend
about us , add a link to this page, or visit the webmaster's page for
free fun content . Link to this page: Copyright © 2003-2017 Farlex, Inc
Disclaimer All content on this website, including dictionary, thesaurus,
literature, geography, and other reference data is for informational
purposes only. This information should not be considered complete, up to
date, and is not intended to be used in place of a visit, consultation,
or advice of a legal, medical, or any other professional.
- >-
Food-Info.net : E-numbers : E140: Chlorophyll CI 75810, Natural Green 3,
Chlorophyll A, Magnesium chlorophyll Origin: Natural green colour,
present in all plants and algae. Commercially extracted from nettles,
grass and alfalfa. Function & characteristics:
model-index:
- name: SentenceTransformer based on microsoft/deberta-v3-small
results:
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts test
type: sts-test
metrics:
- type: pearson_cosine
value: 0.22248205020578934
name: Pearson Cosine
- type: spearman_cosine
value: 0.24802235964390085
name: Spearman Cosine
- type: pearson_manhattan
value: 0.26632593273308647
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.2843623073856928
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.2323160413842197
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.24799036249272113
name: Spearman Euclidean
- type: pearson_dot
value: 0.22239084967931927
name: Pearson Dot
- type: spearman_dot
value: 0.24791612015173234
name: Spearman Dot
- type: pearson_max
value: 0.26632593273308647
name: Pearson Max
- type: spearman_max
value: 0.2843623073856928
name: Spearman Max
- task:
type: binary-classification
name: Binary Classification
dataset:
name: allNLI dev
type: allNLI-dev
metrics:
- type: cosine_accuracy
value: 0.666015625
name: Cosine Accuracy
- type: cosine_accuracy_threshold
value: 0.983686089515686
name: Cosine Accuracy Threshold
- type: cosine_f1
value: 0.5065885797950219
name: Cosine F1
- type: cosine_f1_threshold
value: 0.7642872333526611
name: Cosine F1 Threshold
- type: cosine_precision
value: 0.3392156862745098
name: Cosine Precision
- type: cosine_recall
value: 1
name: Cosine Recall
- type: cosine_ap
value: 0.34411819659341086
name: Cosine Ap
- type: dot_accuracy
value: 0.666015625
name: Dot Accuracy
- type: dot_accuracy_threshold
value: 755.60302734375
name: Dot Accuracy Threshold
- type: dot_f1
value: 0.5065885797950219
name: Dot F1
- type: dot_f1_threshold
value: 587.0625
name: Dot F1 Threshold
- type: dot_precision
value: 0.3392156862745098
name: Dot Precision
- type: dot_recall
value: 1
name: Dot Recall
- type: dot_ap
value: 0.344109544232086
name: Dot Ap
- type: manhattan_accuracy
value: 0.6640625
name: Manhattan Accuracy
- type: manhattan_accuracy_threshold
value: 62.69102096557617
name: Manhattan Accuracy Threshold
- type: manhattan_f1
value: 0.5058479532163743
name: Manhattan F1
- type: manhattan_f1_threshold
value: 337.6861877441406
name: Manhattan F1 Threshold
- type: manhattan_precision
value: 0.3385518590998043
name: Manhattan Precision
- type: manhattan_recall
value: 1
name: Manhattan Recall
- type: manhattan_ap
value: 0.35131239981425566
name: Manhattan Ap
- type: euclidean_accuracy
value: 0.666015625
name: Euclidean Accuracy
- type: euclidean_accuracy_threshold
value: 5.00581693649292
name: Euclidean Accuracy Threshold
- type: euclidean_f1
value: 0.5065885797950219
name: Euclidean F1
- type: euclidean_f1_threshold
value: 19.022436141967773
name: Euclidean F1 Threshold
- type: euclidean_precision
value: 0.3392156862745098
name: Euclidean Precision
- type: euclidean_recall
value: 1
name: Euclidean Recall
- type: euclidean_ap
value: 0.3441246898925644
name: Euclidean Ap
- type: max_accuracy
value: 0.666015625
name: Max Accuracy
- type: max_accuracy_threshold
value: 755.60302734375
name: Max Accuracy Threshold
- type: max_f1
value: 0.5065885797950219
name: Max F1
- type: max_f1_threshold
value: 587.0625
name: Max F1 Threshold
- type: max_precision
value: 0.3392156862745098
name: Max Precision
- type: max_recall
value: 1
name: Max Recall
- type: max_ap
value: 0.35131239981425566
name: Max Ap
- task:
type: binary-classification
name: Binary Classification
dataset:
name: Qnli dev
type: Qnli-dev
metrics:
- type: cosine_accuracy
value: 0.591796875
name: Cosine Accuracy
- type: cosine_accuracy_threshold
value: 0.9258557558059692
name: Cosine Accuracy Threshold
- type: cosine_f1
value: 0.6291834002677376
name: Cosine F1
- type: cosine_f1_threshold
value: 0.750666618347168
name: Cosine F1 Threshold
- type: cosine_precision
value: 0.4598825831702544
name: Cosine Precision
- type: cosine_recall
value: 0.9957627118644068
name: Cosine Recall
- type: cosine_ap
value: 0.5585355274462735
name: Cosine Ap
- type: dot_accuracy
value: 0.591796875
name: Dot Accuracy
- type: dot_accuracy_threshold
value: 711.18359375
name: Dot Accuracy Threshold
- type: dot_f1
value: 0.6291834002677376
name: Dot F1
- type: dot_f1_threshold
value: 576.5970458984375
name: Dot F1 Threshold
- type: dot_precision
value: 0.4598825831702544
name: Dot Precision
- type: dot_recall
value: 0.9957627118644068
name: Dot Recall
- type: dot_ap
value: 0.5585297234749824
name: Dot Ap
- type: manhattan_accuracy
value: 0.619140625
name: Manhattan Accuracy
- type: manhattan_accuracy_threshold
value: 188.09068298339844
name: Manhattan Accuracy Threshold
- type: manhattan_f1
value: 0.6301775147928994
name: Manhattan F1
- type: manhattan_f1_threshold
value: 237.80462646484375
name: Manhattan F1 Threshold
- type: manhattan_precision
value: 0.48409090909090907
name: Manhattan Precision
- type: manhattan_recall
value: 0.902542372881356
name: Manhattan Recall
- type: manhattan_ap
value: 0.5898283705050701
name: Manhattan Ap
- type: euclidean_accuracy
value: 0.591796875
name: Euclidean Accuracy
- type: euclidean_accuracy_threshold
value: 10.672666549682617
name: Euclidean Accuracy Threshold
- type: euclidean_f1
value: 0.6291834002677376
name: Euclidean F1
- type: euclidean_f1_threshold
value: 19.553747177124023
name: Euclidean F1 Threshold
- type: euclidean_precision
value: 0.4598825831702544
name: Euclidean Precision
- type: euclidean_recall
value: 0.9957627118644068
name: Euclidean Recall
- type: euclidean_ap
value: 0.5585355274462735
name: Euclidean Ap
- type: max_accuracy
value: 0.619140625
name: Max Accuracy
- type: max_accuracy_threshold
value: 711.18359375
name: Max Accuracy Threshold
- type: max_f1
value: 0.6301775147928994
name: Max F1
- type: max_f1_threshold
value: 576.5970458984375
name: Max F1 Threshold
- type: max_precision
value: 0.48409090909090907
name: Max Precision
- type: max_recall
value: 0.9957627118644068
name: Max Recall
- type: max_ap
value: 0.5898283705050701
name: Max Ap
SentenceTransformer based on microsoft/deberta-v3-small
This is a sentence-transformers model finetuned from microsoft/deberta-v3-small. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: microsoft/deberta-v3-small
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Language: en
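The similarity function listed above can be illustrated with a minimal sketch. It uses plain Python and toy 3-dimensional vectors in place of the 768-dimensional embeddings; the vector values and the helper name `cosine_similarity` are made up for illustration and are not part of the model or library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real sentence embeddings (hypothetical values).
query = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]  # scaled copy of query
orthogonal = [3.0, 0.0, -1.0]     # dot product with query is zero

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

Because the score depends only on direction, scaling a vector leaves it unchanged, which is why cosine and dot-product scores can rank pairs differently when embeddings are not normalized.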
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
  (1): AdvancedWeightedPooling(
    (linear_cls): Linear(in_features=768, out_features=768, bias=True)
    (mha): MultiheadAttention(
      (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
    )
    (layernorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
)
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp")
# Run inference
sentences = [
'What is the basic monetary unit of Iceland?',
"Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary http://www.thefreedictionary.com/Icelandic+monetary+unit Related to Icelandic monetary unit: Icelandic Old Krona ThesaurusAntonymsRelated WordsSynonymsLegend: monetary unit - a unit of money Icelandic krona , krona - the basic unit of money in Iceland eyrir - 100 aurar equal 1 krona in Iceland Want to thank TFD for its existence? Tell a friend about us , add a link to this page, or visit the webmaster's page for free fun content . Link to this page: Copyright © 2003-2017 Farlex, Inc Disclaimer All content on this website, including dictionary, thesaurus, literature, geography, and other reference data is for informational purposes only. This information should not be considered complete, up to date, and is not intended to be used in place of a visit, consultation, or advice of a legal, medical, or any other professional.",
'Food-Info.net : E-numbers : E140: Chlorophyll CI 75810, Natural Green 3, Chlorophyll A, Magnesium chlorophyll Origin: Natural green colour, present in all plants and algae. Commercially extracted from nettles, grass and alfalfa. Function & characteristics:',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
Evaluation
Metrics
Semantic Similarity
- Dataset: sts-test
- Evaluated with EmbeddingSimilarityEvaluator
Metric | Value |
---|---|
pearson_cosine | 0.2225 |
spearman_cosine | 0.248 |
pearson_manhattan | 0.2663 |
spearman_manhattan | 0.2844 |
pearson_euclidean | 0.2323 |
spearman_euclidean | 0.248 |
pearson_dot | 0.2224 |
spearman_dot | 0.2479 |
pearson_max | 0.2663 |
spearman_max | 0.2844 |
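The Pearson and Spearman rows above are correlations between the model's similarity scores and gold similarity labels. The following is a minimal sketch of how the two correlations relate, written in plain Python; it is not the evaluator's own code, and the `gold` and `predicted` values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical gold labels and predicted similarity scores.
gold = [0.0, 1.0, 2.5, 4.0, 5.0]
predicted = [0.10, 0.20, 0.15, 0.60, 0.90]

print(pearson(gold, predicted))
print(spearman(gold, predicted))
```

Spearman only looks at the ordering of the scores, so it is the more forgiving of the two when the similarity scale is not linear in the gold labels.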
Binary Classification
- Dataset: allNLI-dev
- Evaluated with BinaryClassificationEvaluator
Metric | Value |
---|---|
cosine_accuracy | 0.666 |
cosine_accuracy_threshold | 0.9837 |
cosine_f1 | 0.5066 |
cosine_f1_threshold | 0.7643 |
cosine_precision | 0.3392 |
cosine_recall | 1.0 |
cosine_ap | 0.3441 |
dot_accuracy | 0.666 |
dot_accuracy_threshold | 755.603 |
dot_f1 | 0.5066 |
dot_f1_threshold | 587.0625 |
dot_precision | 0.3392 |
dot_recall | 1.0 |
dot_ap | 0.3441 |
manhattan_accuracy | 0.6641 |
manhattan_accuracy_threshold | 62.691 |
manhattan_f1 | 0.5058 |
manhattan_f1_threshold | 337.6862 |
manhattan_precision | 0.3386 |
manhattan_recall | 1.0 |
manhattan_ap | 0.3513 |
euclidean_accuracy | 0.666 |
euclidean_accuracy_threshold | 5.0058 |
euclidean_f1 | 0.5066 |
euclidean_f1_threshold | 19.0224 |
euclidean_precision | 0.3392 |
euclidean_recall | 1.0 |
euclidean_ap | 0.3441 |
max_accuracy | 0.666 |
max_accuracy_threshold | 755.603 |
max_f1 | 0.5066 |
max_f1_threshold | 587.0625 |
max_precision | 0.3392 |
max_recall | 1.0 |
max_ap | 0.3513 |
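The threshold rows above come from sweeping a decision threshold over the similarity scores and keeping the value that maximizes accuracy or F1. The sketch below shows the F1 variant in plain Python under simplifying assumptions: it tries each observed score as a candidate threshold, which may differ from how the evaluator picks cut points, and the `scores` and `labels` are hypothetical.

```python
def precision_recall_f1(scores, labels, threshold):
    """Treat score >= threshold as a positive prediction."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def best_f1_threshold(scores, labels):
    """Return (threshold, f1) for the candidate with the highest F1."""
    best_threshold, best_f1 = None, -1.0
    for candidate in sorted(set(scores)):
        _, _, f1 = precision_recall_f1(scores, labels, candidate)
        if f1 > best_f1:
            best_threshold, best_f1 = candidate, f1
    return best_threshold, best_f1

# Hypothetical cosine scores and binary labels (1 = positive pair).
scores = [0.95, 0.90, 0.85, 0.80, 0.78, 0.77]
labels = [1, 0, 1, 0, 0, 0]

print(best_f1_threshold(scores, labels))
# A threshold below every score labels all pairs positive:
print(precision_recall_f1(scores, labels, min(scores)))
```

With a threshold low enough to accept every pair, recall is 1.0 and precision equals the share of positive pairs, the same pattern as the cosine_recall of 1.0 and cosine_precision of 0.3392 reported above.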
Binary Classification
- Dataset: Qnli-dev
- Evaluated with BinaryClassificationEvaluator
Metric | Value |
---|---|
cosine_accuracy | 0.5918 |
cosine_accuracy_threshold | 0.9259 |
cosine_f1 | 0.6292 |
cosine_f1_threshold | 0.7507 |
cosine_precision | 0.4599 |
cosine_recall | 0.9958 |
cosine_ap | 0.5585 |
dot_accuracy | 0.5918 |
dot_accuracy_threshold | 711.1836 |
dot_f1 | 0.6292 |
dot_f1_threshold | 576.597 |
dot_precision | 0.4599 |
dot_recall | 0.9958 |
dot_ap | 0.5585 |
manhattan_accuracy | 0.6191 |
manhattan_accuracy_threshold | 188.0907 |
manhattan_f1 | 0.6302 |
manhattan_f1_threshold | 237.8046 |
manhattan_precision | 0.4841 |
manhattan_recall | 0.9025 |
manhattan_ap | 0.5898 |
euclidean_accuracy | 0.5918 |
euclidean_accuracy_threshold | 10.6727 |
euclidean_f1 | 0.6292 |
euclidean_f1_threshold | 19.5537 |
euclidean_precision | 0.4599 |
euclidean_recall | 0.9958 |
euclidean_ap | 0.5585 |
max_accuracy | 0.6191 |
max_accuracy_threshold | 711.1836 |
max_f1 | 0.6302 |
max_f1_threshold | 576.597 |
max_precision | 0.4841 |
max_recall | 0.9958 |
max_ap | 0.5898 |
Training Details
Evaluation Dataset
vitaminc-pairs
- Dataset: vitaminc-pairs at be6febb
- Size: 128 evaluation samples
- Columns: claim and evidence
- Approximate statistics based on the first 128 samples:
 | claim | evidence |
---|---|---|
type | string | string |
details | min: 9 tokens, mean: 21.42 tokens, max: 41 tokens | min: 11 tokens, mean: 35.55 tokens, max: 79 tokens |
- Samples:
claim | evidence |
---|---|
Dragon Con had over 5000 guests . | Among the more than 6000 guests and musical performers at the 2009 convention were such notables as Patrick Stewart , William Shatner , Leonard Nimoy , Terry Gilliam , Bruce Boxleitner , James Marsters , and Mary McDonnell . |
COVID-19 has reached more than 185 countries . | As of , more than cases of COVID-19 have been reported in more than 190 countries and 200 territories , resulting in more than deaths . |
In March , Italy had 3.6x times more cases of coronavirus than China . | As of 12 March , among nations with at least one million citizens , Italy has the world 's highest per capita rate of positive coronavirus cases at 206.1 cases per million people ( 3.6x times the rate of China ) and is the country with the second-highest number of positive cases as well as of deaths in the world , after China . |
- Loss: CachedGISTEmbedLoss with these parameters:
{'guide': SentenceTransformer( (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) (2): Normalize() ), 'temperature': 0.025}
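The loss parameters above name a guide model and a temperature of 0.025. The following is a rough sketch of the guided in-batch contrastive idea behind this loss family, not the library's implementation: it assumes precomputed similarity matrices, considers only anchor-to-positive similarities, and omits the gradient caching that the "Cached" variant adds. The function name and the matrices are hypothetical.

```python
import math

def guided_contrastive_loss(model_sim, guide_sim, temperature=0.025):
    """
    model_sim[i][j]: trained model's similarity of anchor i to positive j.
    guide_sim[i][j]: guide model's similarity for the same pair.
    Row i treats column i as the true positive and other columns as negatives,
    except negatives the guide scores above the true pair, which are dropped.
    """
    n = len(model_sim)
    total = 0.0
    for i in range(n):
        guide_threshold = guide_sim[i][i]
        logits = []
        for j in range(n):
            if j != i and guide_sim[i][j] > guide_threshold:
                continue  # likely false negative according to the guide
            logits.append(model_sim[i][j] / temperature)
        # Numerically stable log-sum-exp over the kept candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_denominator - model_sim[i][i] / temperature
    return total / n

# Hypothetical similarities for a batch of two (anchor, positive) pairs.
model_sim = [[0.9, 0.8], [0.1, 0.9]]
guide_flags_false_negative = [[0.7, 0.95], [0.2, 0.8]]  # guide prefers column 1 for row 0
guide_keeps_all = [[1.0, 0.0], [0.0, 1.0]]

masked = guided_contrastive_loss(model_sim, guide_flags_false_negative)
unmasked = guided_contrastive_loss(model_sim, guide_keeps_all)
print(masked, unmasked)
```

Dropping a suspected false negative removes its penalty from the denominator, so the masked loss is lower than the unmasked one for the same model similarities. The small temperature sharpens the softmax, making the loss sensitive to small similarity gaps.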
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 42
- per_device_eval_batch_size: 128
- gradient_accumulation_steps: 2
- learning_rate: 3e-05
- weight_decay: 0.001
- lr_scheduler_type: cosine_with_min_lr
- lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 1e-05}
- warmup_ratio: 0.25
- save_safetensors: False
- fp16: True
- push_to_hub: True
- hub_model_id: bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp
- hub_strategy: all_checkpoints
- batch_sampler: no_duplicates
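To make the scheduler settings above concrete, here is a small sketch of a cosine schedule with linear warmup and a learning-rate floor, using the listed values (learning_rate 3e-05, min_lr 1e-05, warmup_ratio 0.25, num_cycles 0.5). It follows my understanding of the cosine_with_min_lr behaviour rather than copying the Transformers source, and `total_steps = 1000` is an assumed figure, since the card does not state the total step count.

```python
import math

def lr_at_step(step, total_steps, base_lr=3e-05, min_lr=1e-05,
               warmup_ratio=0.25, num_cycles=0.5):
    """Linear warmup to base_lr, then cosine decay that bottoms out at min_lr."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress))
    min_ratio = min_lr / base_lr
    return base_lr * (cosine * (1.0 - min_ratio) + min_ratio)

total_steps = 1000  # assumption for illustration only

for step in (0, 125, 250, 625, 1000):
    print(step, lr_at_step(step, total_steps))

# Samples consumed per optimizer update on one device:
per_device_train_batch_size = 42
gradient_accumulation_steps = 2
print(per_device_train_batch_size * gradient_accumulation_steps)
```

Under these assumptions the rate peaks at 3e-05 once warmup ends a quarter of the way through training and decays to 1e-05 by the final step. The per-device effective batch is 84 samples; the number of devices is not stated in the card.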
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 42
- per_device_eval_batch_size: 128
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 2
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 3e-05
- weight_decay: 0.001
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: cosine_with_min_lr
- lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 1e-05}
- warmup_ratio: 0.25
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: False
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: True
- resume_from_checkpoint: None
- hub_model_id: bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp
- hub_strategy: all_checkpoints
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | vitaminc-pairs loss | negation-triplets loss | scitail-pairs-pos loss | scitail-pairs-qa loss | xsum-pairs loss | sciq pairs loss | qasc pairs loss | openbookqa pairs loss | msmarco pairs loss | nq pairs loss | trivia pairs loss | gooaq pairs loss | paws-pos loss | global dataset loss | sts-test_spearman_cosine | allNLI-dev_max_ap | Qnli-dev_max_ap |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.0009 | 1 | 5.8564 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0018 | 2 | 7.1716 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0027 | 3 | 5.9095 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0035 | 4 | 5.0841 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0044 | 5 | 4.0184 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0053 | 6 | 6.2191 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0062 | 7 | 5.6124 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0071 | 8 | 3.9544 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0080 | 9 | 4.7149 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0088 | 10 | 4.9616 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0097 | 11 | 5.2794 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0106 | 12 | 8.8704 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0115 | 13 | 6.0707 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0124 | 14 | 5.4071 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0133 | 15 | 6.9104 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0142 | 16 | 6.0276 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0150 | 17 | 6.737 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0159 | 18 | 6.5354 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0168 | 19 | 5.206 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0177 | 20 | 5.2469 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0186 | 21 | 5.3771 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0195 | 22 | 4.979 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0204 | 23 | 4.7909 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0212 | 24 | 4.9086 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0221 | 25 | 4.8826 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0230 | 26 | 8.2266 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0239 | 27 | 8.3024 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0248 | 28 | 5.8745 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0257 | 29 | 4.7298 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0265 | 30 | 5.4614 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0274 | 31 | 5.8594 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0283 | 32 | 5.2401 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0292 | 33 | 5.1579 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0301 | 34 | 5.2181 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0310 | 35 | 4.6328 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0319 | 36 | 2.121 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0327 | 37 | 5.9026 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0336 | 38 | 7.3796 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0345 | 39 | 5.5361 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0354 | 40 | 4.0243 | 2.9018 | 5.6903 | 2.1136 | 2.8052 | 6.5831 | 0.8882 | 4.1148 | 5.0966 | 10.3911 | 10.9032 | 7.1904 | 8.1935 | 1.3943 | 5.6716 | 0.1879 | 0.3385 | 0.5781 |
0.0363 | 41 | 4.9072 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0372 | 42 | 3.4439 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0381 | 43 | 4.9787 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0389 | 44 | 5.8318 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0398 | 45 | 5.3226 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0407 | 46 | 5.1181 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0416 | 47 | 4.7834 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0425 | 48 | 6.6303 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0434 | 49 | 5.8171 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0442 | 50 | 5.1962 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0451 | 51 | 5.2096 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0460 | 52 | 5.0943 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0469 | 53 | 4.9038 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0478 | 54 | 4.6479 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0487 | 55 | 5.5098 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0496 | 56 | 4.6979 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0504 | 57 | 3.1969 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0513 | 58 | 4.4127 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0522 | 59 | 3.7746 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0531 | 60 | 4.5378 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0540 | 61 | 5.0209 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0549 | 62 | 6.5936 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0558 | 63 | 4.2315 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0566 | 64 | 6.4269 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0575 | 65 | 4.2644 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0584 | 66 | 5.1388 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0593 | 67 | 5.1852 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0602 | 68 | 4.8057 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0611 | 69 | 3.1725 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0619 | 70 | 3.3322 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0628 | 71 | 5.139 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0637 | 72 | 4.307 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0646 | 73 | 5.0133 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0655 | 74 | 4.0507 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0664 | 75 | 3.3895 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0673 | 76 | 5.6736 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0681 | 77 | 4.2572 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0690 | 78 | 3.0796 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0699 | 79 | 5.0199 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0708 | 80 | 4.1414 | 2.7794 | 4.8890 | 1.8997 | 2.6761 | 6.2096 | 0.7622 | 3.3129 | 4.5498 | 7.2056 | 7.6809 | 6.3792 | 6.6567 | 1.3848 | 5.0030 | 0.2480 | 0.3513 | 0.5898 |
Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.2.0
- Transformers: 4.45.1
- PyTorch: 2.4.0
- Accelerate: 0.34.2
- Datasets: 3.0.1
- Tokenizers: 0.20.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}