---
base_model: microsoft/deberta-v3-small
datasets:
- tals/vitaminc
language:
- en
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
- cosine_accuracy
- cosine_accuracy_threshold
- cosine_f1
- cosine_f1_threshold
- cosine_precision
- cosine_recall
- cosine_ap
- dot_accuracy
- dot_accuracy_threshold
- dot_f1
- dot_f1_threshold
- dot_precision
- dot_recall
- dot_ap
- manhattan_accuracy
- manhattan_accuracy_threshold
- manhattan_f1
- manhattan_f1_threshold
- manhattan_precision
- manhattan_recall
- manhattan_ap
- euclidean_accuracy
- euclidean_accuracy_threshold
- euclidean_f1
- euclidean_f1_threshold
- euclidean_precision
- euclidean_recall
- euclidean_ap
- max_accuracy
- max_accuracy_threshold
- max_f1
- max_f1_threshold
- max_precision
- max_recall
- max_ap
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:225247
- loss:CachedGISTEmbedLoss
widget:
- source_sentence: what is exfo toolbox
  sentences:
  - Eye dilation from eye drops used for examination of the eye usually lasts from 4 to 24 hours, depending upon the strength of the drop and upon the individual patient.
  - Garden Grove is a city in northern Orange County in the U.S. state of California, 34 miles (55 km) south of Los Angeles. The population was 170,883 at the 2010 United States Census. State Route 22, also known as the Garden Grove Freeway, passes through the city in an east-west direction.
  - EXFO ToolBox Office is a product that offers you a collection of viewers and analyzers. It enables you to manage and analyze results acquired from fiber optic test modules and instruments.
- source_sentence: More than 273 people have died from the 2019-20 coronavirus outside mainland China .
  sentences:
  - 'More than 3,700 people have died : around 3,100 in mainland China and around 550 in all other countries combined .'
  - 'More than 3,200 people have died : almost 3,000 in mainland China and around 275 in other countries .'
  - more than 4,900 deaths have been attributed to COVID-19 .
- source_sentence: Ultrasound, a diagnostic technology, uses high-frequency vibrations transmitted into any tissue in contact with the transducer.
  sentences:
  - What diagnostic technology uses high-frequency vibrations transmitted into any tissue in contact with the transducer?
  - The abnormal cells cannot carry oxygen properly and can get stuck where?
  - What type of organism is a bacteria?
- source_sentence: When you add moles of gas to a baloon by blowing it up, the volume increases.
  sentences:
  - What shape is the lens of the eye?
  - What happens to the volume of a balloon when you add moles of gas to it by blowing up?
  - Most turtle bodies are covered by a special bony or cartilaginous shell developed from their what?
- source_sentence: What was the name of eleven rulers of the 19th and 20th Egyptian dynasties?
  sentences:
  - 'Airlines Yugoslavia 1968 - 1968 Renamed ^ Comments : Aviogenex was formed on 21May1968 as Genex Airlines. Restarted under current name on 30Apr1969 & liquidated in Feb2015 ^ Genealogy : Genex Airlines >Aviogenex 1968 - 1986 Renamed ^ Comments : Adria Airways was formed on 14Mar1961 & operations started on 30Jun1961 as Adria Airways, renamed to Inex in 1968 and back to Adria again in 1986. National airline of Slovenia ^ Genealogy : Adria Airways >Inex Adria Airways >Adria Airways JAT (Jugoslovenski Aerotransport) 1947 - 2003 Renamed ^ Comments : Air Serbia was founded as Aeroput on 17Jun1927, renamed to JAT on 01Apr1947. Started ops on 15Apr1947, Renamed again on 08Aug2003 to JAT Airways & reformed as Air Serbia on 26Oct2013 ^ Genealogy : Aeroput >JAT (Jugoslovenski Aerotransport) >JAT Airways >Air Serbia Jugoslovenski Aerotransport'
  - List of Rulers of Ancient Egypt and Nubia | Lists of Rulers | Heilbrunn Timeline of Art History | The Metropolitan Museum of Art The Metropolitan Museum of Art List of Rulers of Ancient Egypt and Nubia See works of art 30.8.234 52.127.4 Our knowledge of the succession of Egyptian kings is based on kinglists kept by the ancient Egyptians themselves. The most famous are the Palermo Stone, which covers the period from the earliest dynasties to the middle of Dynasty 5; the Abydos Kinglist, which Seti I had carved on his temple at Abydos; and the Turin Canon, a papyrus that covers the period from the earliest dynasties to the reign of Ramesses II. All are incomplete or fragmentary. We also rely on the History of Egypt written by Manetho in the third century B.C. A priest in the temple at Heliopolis, Manetho had access to many original sources and it was he who divided the kings into the thirty dynasties we use today. It is to this structure of dynasties and listed kings that we now attempt to link an absolute chronology of dates in terms of our own calendrical system. The process is made difficult by the fragmentary condition of the kinglists and by differences in the calendrical years used at various times. Some astronomical observations from the ancient Egyptians have survived, allowing us to calculate absolute dates within a margin of error. Synchronisms with the other civilizations of the ancient world are also of limited use.
  - 'What is the "Jack Sprat" nursery rhyme? | Reference.com What is the "Jack Sprat" nursery rhyme? A: Quick Answer "Jack Sprat" is a traditional English nursery rhyme whose main verse says, "Jack Sprat could eat no fat. His wife could eat no lean. And so between them both, you see, they licked the platter clean." Though it was likely sung by children long before, "Jack Sprat" was first published around 1765 in the compilation "Mother Goose''s Melody." Full Answer According to Rhymes.org, a U.K. website devoted to nursery rhyme lyrics and origins, the "Jack Sprat" nursery rhyme has its origins in British history. In one interpretation, Jack Sprat was King Charles I, who ruled England in the early part of the 17th century, and his wife was Queen Henrietta Maria. Parliament refused to finance the king''s war with Spain, which made him lean. However, the queen fattened the coffers by levying an illegal war tax. In an alternative version, the "Jack Sprat" nursery rhyme is linked to King Richard and his brother John of the Robin Hood legend. Jack Sprat was King John, the usurper who tried to take over the crown when King Richard went off to fight in the Crusades in the 12th century. When King Richard was captured, John had to raise a ransom to rescue him, leaving the country lean. The wife was Joan, daughter of the Earl of Gloucester, the greedy wife of King John. However, after King Richard died and John became king, he had his marriage with Joan annulled.'
model-index:
- name: SentenceTransformer based on microsoft/deberta-v3-small
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.7673854808079448
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.7776198286738142
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.782368447545155
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.7720687033298573
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.7882638792170585
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.7775073687564514
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.7669147371310585
      name: Pearson Dot
    - type: spearman_dot
      value: 0.7762894632049069
      name: Spearman Dot
    - type: pearson_max
      value: 0.7882638792170585
      name: Pearson Max
    - type: spearman_max
      value: 0.7776198286738142
      name: Spearman Max
  - task:
      type: binary-classification
      name: Binary Classification
    dataset:
      name: allNLI dev
      type: allNLI-dev
    metrics:
    - type: cosine_accuracy
      value: 0.708984375
      name: Cosine Accuracy
    - type: cosine_accuracy_threshold
      value: 0.8714957237243652
      name: Cosine Accuracy Threshold
    - type: cosine_f1
      value: 0.5913043478260869
      name: Cosine F1
    - type: cosine_f1_threshold
      value: 0.7768557071685791
      name: Cosine F1 Threshold
    - type: cosine_precision
      value: 0.4738675958188153
      name: Cosine Precision
    - type: cosine_recall
      value: 0.7861271676300579
      name: Cosine Recall
    - type: cosine_ap
      value: 0.5644305887001508
      name: Cosine Ap
    - type: dot_accuracy
      value: 0.7109375
      name: Dot Accuracy
    - type: dot_accuracy_threshold
      value: 674.426025390625
      name: Dot Accuracy Threshold
    - type: dot_f1
      value: 0.5913043478260869
      name: Dot F1
    - type: dot_f1_threshold
      value: 603.435302734375
      name: Dot F1 Threshold
    - type: dot_precision
      value: 0.4738675958188153
      name: Dot Precision
    - type: dot_recall
      value: 0.7861271676300579
      name: Dot Recall
    - type: dot_ap
      value: 0.5664868031504724
      name: Dot Ap
    - type: manhattan_accuracy
      value: 0.7109375
      name: Manhattan Accuracy
    - type: manhattan_accuracy_threshold
      value: 294.4728088378906
      name: Manhattan Accuracy Threshold
    - type: manhattan_f1
      value: 0.5935483870967742
      name: Manhattan F1
    - type: manhattan_f1_threshold
      value: 401.1482849121094
      name: Manhattan F1 Threshold
    - type: manhattan_precision
      value: 0.4726027397260274
      name: Manhattan Precision
    - type: manhattan_recall
      value: 0.7976878612716763
      name: Manhattan Recall
    - type: manhattan_ap
      value: 0.5642688421649988
      name: Manhattan Ap
    - type: euclidean_accuracy
      value: 0.7109375
      name: Euclidean Accuracy
    - type: euclidean_accuracy_threshold
      value: 14.565500259399414
      name: Euclidean Accuracy Threshold
    - type: euclidean_f1
      value: 0.5913043478260869
      name: Euclidean F1
    - type: euclidean_f1_threshold
      value: 18.60409164428711
      name: Euclidean F1 Threshold
    - type: euclidean_precision
      value: 0.4738675958188153
      name: Euclidean Precision
    - type: euclidean_recall
      value: 0.7861271676300579
      name: Euclidean Recall
    - type: euclidean_ap
      value: 0.5645557227019772
      name: Euclidean Ap
    - type: max_accuracy
      value: 0.7109375
      name: Max Accuracy
    - type: max_accuracy_threshold
      value: 674.426025390625
      name: Max Accuracy Threshold
    - type: max_f1
      value: 0.5935483870967742
      name: Max F1
    - type: max_f1_threshold
      value: 603.435302734375
      name: Max F1 Threshold
    - type: max_precision
      value: 0.4738675958188153
      name: Max Precision
    - type: max_recall
      value: 0.7976878612716763
      name: Max Recall
    - type: max_ap
      value: 0.5664868031504724
      name: Max Ap
  - task:
      type: binary-classification
      name: Binary Classification
    dataset:
      name: Qnli dev
      type: Qnli-dev
    metrics:
    - type: cosine_accuracy
      value: 0.6796875
      name: Cosine Accuracy
    - type: cosine_accuracy_threshold
      value: 0.7726649045944214
      name: Cosine Accuracy Threshold
    - type: cosine_f1
      value: 0.6925675675675677
      name: Cosine F1
    - type: cosine_f1_threshold
      value: 0.7317887544631958
      name: Cosine F1 Threshold
    - type: cosine_precision
      value: 0.5758426966292135
      name: Cosine Precision
    - type: cosine_recall
      value: 0.8686440677966102
      name: Cosine Recall
    - type: cosine_ap
      value: 0.7302564198016936
      name: Cosine Ap
    - type: dot_accuracy
      value: 0.67578125
      name: Dot Accuracy
    - type: dot_accuracy_threshold
      value: 598.0419921875
      name: Dot Accuracy Threshold
    - type: dot_f1
      value: 0.6912751677852348
      name: Dot F1
    - type: dot_f1_threshold
      value: 565.4718017578125
      name: Dot F1 Threshold
    - type: dot_precision
      value: 0.5722222222222222
      name: Dot Precision
    - type: dot_recall
      value: 0.8728813559322034
      name: Dot Recall
    - type: dot_ap
      value: 0.7300462025003271
      name: Dot Ap
    - type: manhattan_accuracy
      value: 0.6796875
      name: Manhattan Accuracy
    - type: manhattan_accuracy_threshold
      value: 404.8309020996094
      name: Manhattan Accuracy Threshold
    - type: manhattan_f1
      value: 0.6933333333333332
      name: Manhattan F1
    - type: manhattan_f1_threshold
      value: 444.99224853515625
      name: Manhattan F1 Threshold
    - type: manhattan_precision
      value: 0.5714285714285714
      name: Manhattan Precision
    - type: manhattan_recall
      value: 0.8813559322033898
      name: Manhattan Recall
    - type: manhattan_ap
      value: 0.7369214156436785
      name: Manhattan Ap
    - type: euclidean_accuracy
      value: 0.6796875
      name: Euclidean Accuracy
    - type: euclidean_accuracy_threshold
      value: 18.790739059448242
      name: Euclidean Accuracy Threshold
    - type: euclidean_f1
      value: 0.6934306569343065
      name: Euclidean F1
    - type: euclidean_f1_threshold
      value: 19.35132598876953
      name: Euclidean F1 Threshold
    - type: euclidean_precision
      value: 0.6089743589743589
      name: Euclidean Precision
    - type: euclidean_recall
      value: 0.8050847457627118
      name: Euclidean Recall
    - type: euclidean_ap
      value: 0.7307381840067684
      name: Euclidean Ap
    - type: max_accuracy
      value: 0.6796875
      name: Max Accuracy
    - type: max_accuracy_threshold
      value: 598.0419921875
      name: Max Accuracy Threshold
    - type: max_f1
      value: 0.6934306569343065
      name: Max F1
    - type: max_f1_threshold
      value: 565.4718017578125
      name: Max F1 Threshold
    - type: max_precision
      value: 0.6089743589743589
      name: Max Precision
    - type: max_recall
      value: 0.8813559322033898
      name: Max Recall
    - type: max_ap
      value: 0.7369214156436785
      name: Max Ap
---

# SentenceTransformer based on microsoft/deberta-v3-small

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Language:** en

### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
  (1): AdvancedWeightedPooling(
    (linear_cls): Linear(in_features=768, out_features=768, bias=True)
    (linear_mean): Linear(in_features=768, out_features=768, bias=True)
    (mha): MultiheadAttention(
      (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
    )
    (layernorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_cls): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_mean): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
)
```
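`AdvancedWeightedPooling` is a custom head rather than one of the library's built-in pooling modules. As a rough orientation only, here is a minimal sketch of how a module with this printed structure might combine a projected [CLS] token with a masked mean of the token embeddings; the submodule names mirror the printout above, but the forward pass is an assumption, and the actual implementation ships with the model repository:

```python
# Illustrative sketch only: the real AdvancedWeightedPooling forward pass may differ.
import torch
from torch import nn


class AdvancedWeightedPoolingSketch(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.linear_cls = nn.Linear(dim, dim)
        self.linear_mean = nn.Linear(dim, dim)
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.layernorm = nn.LayerNorm(dim)
        self.layernorm2 = nn.LayerNorm(dim)
        self.layernorm_cls = nn.LayerNorm(dim)
        self.layernorm_mean = nn.LayerNorm(dim)

    def forward(self, token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # [CLS] branch: project and normalize the first token embedding.
        cls = self.layernorm_cls(self.linear_cls(token_embeddings[:, 0]))
        # Mean branch: masked mean over real (non-padding) tokens.
        mask = attention_mask.unsqueeze(-1).float()
        mean = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        mean = self.layernorm_mean(self.linear_mean(mean))
        # Let the CLS query attend over all token embeddings (padding masked out).
        attn_out, _ = self.mha(
            cls.unsqueeze(1), token_embeddings, token_embeddings,
            key_padding_mask=attention_mask == 0,
        )
        attended = self.layernorm(attn_out.squeeze(1))
        # Combine the attended CLS branch and the mean branch into one 768-d vector.
        return self.layernorm2(attended + mean)
```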
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-v3-step1")
# Run inference
sentences = [
    'What was the name of eleven rulers of the 19th and 20th Egyptian dynasties?',
    'List of Rulers of Ancient Egypt and Nubia | Lists of Rulers | Heilbrunn Timeline of Art History | The Metropolitan Museum of Art The Metropolitan Museum of Art List of Rulers of Ancient Egypt and Nubia See works of art 30.8.234 52.127.4 Our knowledge of the succession of Egyptian kings is based on kinglists kept by the ancient Egyptians themselves. The most famous are the Palermo Stone, which covers the period from the earliest dynasties to the middle of Dynasty 5; the Abydos Kinglist, which Seti I had carved on his temple at Abydos; and the Turin Canon, a papyrus that covers the period from the earliest dynasties to the reign of Ramesses II. All are incomplete or fragmentary. We also rely on the History of Egypt written by Manetho in the third century B.C. A priest in the temple at Heliopolis, Manetho had access to many original sources and it was he who divided the kings into the thirty dynasties we use today. It is to this structure of dynasties and listed kings that we now attempt to link an absolute chronology of dates in terms of our own calendrical system. The process is made difficult by the fragmentary condition of the kinglists and by differences in the calendrical years used at various times. Some astronomical observations from the ancient Egyptians have survived, allowing us to calculate absolute dates within a margin of error. Synchronisms with the other civilizations of the ancient world are also of limited use.',
    'What is the "Jack Sprat" nursery rhyme? | Reference.com What is the "Jack Sprat" nursery rhyme? A: Quick Answer "Jack Sprat" is a traditional English nursery rhyme whose main verse says, "Jack Sprat could eat no fat. His wife could eat no lean. And so between them both, you see, they licked the platter clean." Though it was likely sung by children long before, "Jack Sprat" was first published around 1765 in the compilation "Mother Goose\'s Melody." Full Answer According to Rhymes.org, a U.K. website devoted to nursery rhyme lyrics and origins, the "Jack Sprat" nursery rhyme has its origins in British history. In one interpretation, Jack Sprat was King Charles I, who ruled England in the early part of the 17th century, and his wife was Queen Henrietta Maria. Parliament refused to finance the king\'s war with Spain, which made him lean. However, the queen fattened the coffers by levying an illegal war tax. In an alternative version, the "Jack Sprat" nursery rhyme is linked to King Richard and his brother John of the Robin Hood legend. Jack Sprat was King John, the usurper who tried to take over the crown when King Richard went off to fight in the Crusades in the 12th century. When King Richard was captured, John had to raise a ransom to rescue him, leaving the country lean. The wife was Joan, daughter of the Earl of Gloucester, the greedy wife of King John. However, after King Richard died and John became king, he had his marriage with Joan annulled.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Evaluation

### Metrics

#### Semantic Similarity
* Dataset: `sts-test`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.7674     |
| **spearman_cosine** | **0.7776** |
| pearson_manhattan   | 0.7824     |
| spearman_manhattan  | 0.7721     |
| pearson_euclidean   | 0.7883     |
| spearman_euclidean  | 0.7775     |
| pearson_dot         | 0.7669     |
| spearman_dot        | 0.7763     |
| pearson_max         | 0.7883     |
| spearman_max        | 0.7776     |
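Correlations like those above can be reproduced with the same evaluator class. A sketch, assuming an STS-style dataset with `sentence1`/`sentence2`/`score` columns; the card does not publish its exact `sts-test` split, so `sentence-transformers/stsb` serves as a stand-in here:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SimilarityFunction
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-v3-step1")

# Stand-in STS data; substitute the actual sts-test split if you have it.
stsb = load_dataset("sentence-transformers/stsb", split="test")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=stsb["sentence1"],
    sentences2=stsb["sentence2"],
    scores=stsb["score"],
    main_similarity=SimilarityFunction.COSINE,
    name="sts-test",
)
results = evaluator(model)
print(results)  # dict with keys such as 'sts-test_spearman_cosine'
```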
#### Binary Classification
* Dataset: `allNLI-dev`
* Evaluated with [BinaryClassificationEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)

| Metric                       | Value      |
|:-----------------------------|:-----------|
| cosine_accuracy              | 0.709      |
| cosine_accuracy_threshold    | 0.8715     |
| cosine_f1                    | 0.5913     |
| cosine_f1_threshold          | 0.7769     |
| cosine_precision             | 0.4739     |
| cosine_recall                | 0.7861     |
| cosine_ap                    | 0.5644     |
| dot_accuracy                 | 0.7109     |
| dot_accuracy_threshold       | 674.426    |
| dot_f1                       | 0.5913     |
| dot_f1_threshold             | 603.4353   |
| dot_precision                | 0.4739     |
| dot_recall                   | 0.7861     |
| dot_ap                       | 0.5665     |
| manhattan_accuracy           | 0.7109     |
| manhattan_accuracy_threshold | 294.4728   |
| manhattan_f1                 | 0.5935     |
| manhattan_f1_threshold       | 401.1483   |
| manhattan_precision          | 0.4726     |
| manhattan_recall             | 0.7977     |
| manhattan_ap                 | 0.5643     |
| euclidean_accuracy           | 0.7109     |
| euclidean_accuracy_threshold | 14.5655    |
| euclidean_f1                 | 0.5913     |
| euclidean_f1_threshold       | 18.6041    |
| euclidean_precision          | 0.4739     |
| euclidean_recall             | 0.7861     |
| euclidean_ap                 | 0.5646     |
| max_accuracy                 | 0.7109     |
| max_accuracy_threshold       | 674.426    |
| max_f1                       | 0.5935     |
| max_f1_threshold             | 603.4353   |
| max_precision                | 0.4739     |
| max_recall                   | 0.7977     |
| **max_ap**                   | **0.5665** |

#### Binary Classification
* Dataset: `Qnli-dev`
* Evaluated with [BinaryClassificationEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)

| Metric                       | Value      |
|:-----------------------------|:-----------|
| cosine_accuracy              | 0.6797     |
| cosine_accuracy_threshold    | 0.7727     |
| cosine_f1                    | 0.6926     |
| cosine_f1_threshold          | 0.7318     |
| cosine_precision             | 0.5758     |
| cosine_recall                | 0.8686     |
| cosine_ap                    | 0.7303     |
| dot_accuracy                 | 0.6758     |
| dot_accuracy_threshold       | 598.042    |
| dot_f1                       | 0.6913     |
| dot_f1_threshold             | 565.4718   |
| dot_precision                | 0.5722     |
| dot_recall                   | 0.8729     |
| dot_ap                       | 0.73       |
| manhattan_accuracy           | 0.6797     |
| manhattan_accuracy_threshold | 404.8309   |
| manhattan_f1                 | 0.6933     |
| manhattan_f1_threshold       | 444.9922   |
| manhattan_precision          | 0.5714     |
| manhattan_recall             | 0.8814     |
| manhattan_ap                 | 0.7369     |
| euclidean_accuracy           | 0.6797     |
| euclidean_accuracy_threshold | 18.7907    |
| euclidean_f1                 | 0.6934     |
| euclidean_f1_threshold       | 19.3513    |
| euclidean_precision          | 0.609      |
| euclidean_recall             | 0.8051     |
| euclidean_ap                 | 0.7307     |
| max_accuracy                 | 0.6797     |
| max_accuracy_threshold       | 598.042    |
| max_f1                       | 0.6934     |
| max_f1_threshold             | 565.4718   |
| max_precision                | 0.609      |
| max_recall                   | 0.8814     |
| **max_ap**                   | **0.7369** |
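The binary-classification numbers are produced the same way. A sketch with toy pairs, since the `allNLI-dev` and `Qnli-dev` splits used above are internal to this training run:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-v3-step1")

# Toy pairs for illustration; substitute your own labeled sentence pairs.
sentences1 = ["A man is playing a guitar.", "A dog runs through a field."]
sentences2 = ["Someone is playing music.", "A cat sleeps on a couch."]
labels = [1, 0]  # 1 = matching/entailed pair, 0 = non-matching pair

evaluator = BinaryClassificationEvaluator(sentences1, sentences2, labels, name="Qnli-dev")
# Reports accuracy, F1, precision, recall, and AP, with tuned thresholds,
# for cosine, dot, Manhattan, and Euclidean similarity.
print(evaluator(model))
```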
## Training Details

### Evaluation Dataset

#### vitaminc-pairs
* Dataset: [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc) at [be6febb](https://huggingface.co/datasets/tals/vitaminc/tree/be6febb761b0b2807687e61e0b5282e459df2fa0)
* Size: 128 evaluation samples
* Columns: claim and evidence
* Approximate statistics based on the first 128 samples:

  |         | claim  | evidence |
  |:--------|:-------|:---------|
  | type    | string | string   |
  | details |        |          |

* Samples:

  | claim | evidence |
  |:------|:---------|
  | Dragon Con had over 5000 guests . | Among the more than 6000 guests and musical performers at the 2009 convention were such notables as Patrick Stewart , William Shatner , Leonard Nimoy , Terry Gilliam , Bruce Boxleitner , James Marsters , and Mary McDonnell . |
  | COVID-19 has reached more than 185 countries . | As of , more than cases of COVID-19 have been reported in more than 190 countries and 200 territories , resulting in more than deaths . |
  | In March , Italy had 3.6x times more cases of coronavirus than China . | As of 12 March , among nations with at least one million citizens , Italy has the world 's highest per capita rate of positive coronavirus cases at 206.1 cases per million people ( 3.6x times the rate of China ) and is the country with the second-highest number of positive cases as well as of deaths in the world , after China . |

* Loss: [CachedGISTEmbedLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:

  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.025}
  ```
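CachedGISTEmbedLoss uses a guide model to filter out in-batch negatives that are likely false negatives, while gradient caching keeps the large effective batch affordable. A sketch of constructing it with the card's temperature of 0.025; the guide checkpoint below is a stand-in, since the card only prints the guide's architecture (a BERT model with CLS pooling and normalization), not its name:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedGISTEmbedLoss

model = SentenceTransformer("microsoft/deberta-v3-small")

# Stand-in guide: any strong embedding model works; the actual guide used
# for this run is not named in the card.
guide = SentenceTransformer("avsolatorio/GIST-small-Embedding-v0")

# temperature matches the card; mini_batch_size controls the cached chunks
# that make the large train batch (100 pairs here) fit in memory.
loss = CachedGISTEmbedLoss(model=model, guide=guide, temperature=0.025, mini_batch_size=32)
```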
### Training Hyperparameters

#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 100
- `per_device_eval_batch_size`: 256
- `gradient_accumulation_steps`: 2
- `lr_scheduler_type`: cosine_with_min_lr
- `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 1.6666666666666667e-05}
- `warmup_ratio`: 0.33
- `save_safetensors`: False
- `fp16`: True
- `push_to_hub`: True
- `hub_model_id`: bobox/DeBERTa3-s-CustomPoolin-v3-step1-checkpoints-tmp
- `hub_strategy`: all_checkpoints
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 100
- `per_device_eval_batch_size`: 256
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 2
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: cosine_with_min_lr
- `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 1.6666666666666667e-05}
- `warmup_ratio`: 0.33
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: False
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: True
- `resume_from_checkpoint`: None
- `hub_model_id`: bobox/DeBERTa3-s-CustomPoolin-v3-step1-checkpoints-tmp
- `hub_strategy`: all_checkpoints
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
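The non-default values above map directly onto `SentenceTransformerTrainingArguments`; a sketch of how this configuration could be reproduced (the `output_dir` is a placeholder, as the card does not state the local output path):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=100,
    per_device_eval_batch_size=256,
    gradient_accumulation_steps=2,
    learning_rate=5e-5,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"num_cycles": 0.5, "min_lr": 1.6666666666666667e-05},
    warmup_ratio=0.33,
    fp16=True,
    eval_strategy="steps",
    save_safetensors=False,
    push_to_hub=True,
    hub_model_id="bobox/DeBERTa3-s-CustomPoolin-v3-step1-checkpoints-tmp",
    hub_strategy="all_checkpoints",
    # Avoid duplicate sentences within a batch, which would act as false negatives.
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```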
### Training Logs
<details><summary>Click to expand</summary>

| Epoch | Step | Training Loss | vitaminc-pairs loss | negation-triplets loss | scitail-pairs-pos loss | scitail-pairs-qa loss | xsum-pairs loss | sciq pairs loss | qasc pairs loss | openbookqa pairs loss | msmarco pairs loss | nq pairs loss | trivia pairs loss | gooaq pairs loss | paws-pos loss | global dataset loss | sts-test_spearman_cosine | allNLI-dev_max_ap | Qnli-dev_max_ap |
|:------:|:----:|:-------------:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.0168 | 8 | 10.2928 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.0336 | 16 | 9.2166 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.0504 | 24 | 9.4858 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.0672 | 32 | 10.6143 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.0840 | 40 | 8.7553 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1008 | 48 | 10.9939 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1176 | 56 | 7.6039 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1345 | 64 | 5.9498 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1513 | 72 | 7.3051 | 3.2988 | 3.9604 | 1.9818 | 2.1997 | 6.0515 | 0.6095 | 6.3199 | 4.8391 | 6.4886 | 6.6406 | 6.4894 | 6.1527 | 2.0082 | 4.9577 | 0.3066 | 0.3444 | 0.5627 |
| 0.1681 | 80 | 8.3034 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1849 | 88 | 7.6669 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2017 | 96 | 6.6415 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2185 | 104 | 5.7797 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2353 | 112 | 5.8361 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2521 | 120 | 5.3339 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2689 | 128 | 5.5908 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2857 | 136 | 5.3209 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3025 | 144 | 5.5359 | 3.3310 | 3.8580 | 1.4769 | 1.6994 | 5.4819 | 0.5385 | 5.2021 | 4.4410 | 5.3419 | 5.5506 | 5.6972 | 5.3376 | 1.4170 | 3.9169 | 0.2954 | 0.3795 | 0.6317 |
| 0.3193 | 152 | 5.4713 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3361 | 160 | 4.9368 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3529 | 168 | 4.6594 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3697 | 176 | 4.8392 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3866 | 184 | 4.414 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4034 | 192 | 4.891 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4202 | 200 | 4.4553 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4370 | 208 | 3.9729 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4538 | 216 | 3.7705 | 3.2468 | 3.6435 | 0.7890 | 0.7356 | 3.9327 | 0.4082 | 3.7175 | 3.5404 | 3.5351 | 4.0506 | 3.9953 | 3.6074 | 0.4195 | 2.4726 | 0.3791 | 0.4133 | 0.6779 |
| 0.4706 | 224 | 3.8409 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4874 | 232 | 3.7894 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5042 | 240 | 3.3523 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5210 | 248 | 3.2407 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5378 | 256 | 3.3203 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5546 | 264 | 2.8457 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5714 | 272 | 2.4181 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5882 | 280 | 3.4589 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6050 | 288 | 2.8203 | 3.1119 | 3.1485 | 0.4531 | 0.2652 | 2.6895 | 0.2656 | 2.5542 | 2.7523 | 2.6600 | 3.1773 | 3.2099 | 2.7316 | 0.2006 | 1.6342 | 0.5257 | 0.4717 | 0.7078 |
| 0.6218 | 296 | 2.4697 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6387 | 304 | 2.4654 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6555 | 312 | 2.4236 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6723 | 320 | 2.2879 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6891 | 328 | 2.2145 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7059 | 336 | 1.8464 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7227 | 344 | 2.0086 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7395 | 352 | 2.0635 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7563 | 360 | 1.8584 | 3.3202 | 2.5793 | 0.3434 | 0.1618 | 1.6759 | 0.1834 | 1.6454 | 2.1257 | 2.1938 | 2.5316 | 2.4558 | 2.0596 | 0.0984 | 1.2206 | 0.6610 | 0.5199 | 0.7119 |
| 0.7731 | 368 | 2.0286 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7899 | 376 | 1.9389 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8067 | 384 | 1.7453 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8235 | 392 | 1.6629 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8403 | 400 | 1.2724 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8571 | 408 | 1.7824 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8739 | 416 | 1.5826 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8908 | 424 | 1.1971 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9076 | 432 | 1.5228 | 3.3624 | 2.1952 | 0.3006 | 0.1223 | 1.1091 | 0.1582 | 1.2383 | 1.8664 | 1.7434 | 2.3959 | 2.0697 | 1.7563 | 0.0766 | 1.0193 | 0.7292 | 0.5194 | 0.7126 |
| 0.9244 | 440 | 1.3323 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9412 | 448 | 1.5124 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9580 | 456 | 1.5565 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9748 | 464 | 1.3672 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9916 | 472 | 1.0382 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.0084 | 480 | 1.0626 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.0252 | 488 | 1.3539 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.0420 | 496 | 1.1723 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.0588 | 504 | 1.4235 | 3.4031 | 1.9759 | 0.2554 | 0.0814 | 0.9034 | 0.1378 | 1.1603 | 1.7589 | 1.5608 | 2.1230 | 1.7719 | 1.6633 | 0.0720 | 0.9380 | 0.7523 | 0.5297 | 0.7129 |
| 1.0756 | 512 | 1.2283 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.0924 | 520 | 1.2455 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.1092 | 528 | 1.4265 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.1261 | 536 | 1.296 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.1429 | 544 | 0.8763 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.1597 | 552 | 1.5678 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.1765 | 560 | 1.2548 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.1933 | 568 | 1.3731 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.2101 | 576 | 1.3023 | 3.3815 | 1.8740 | 0.2373 | 0.0769 | 0.7711 | 0.1237 | 0.9432 | 1.6871 | 1.5070 | 1.9947 | 1.6041 | 1.5579 | 0.0721 | 0.8661 | 0.7642 | 0.5412 | 0.7159 |
| 1.2269 | 584 | 0.8135 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.2437 | 592 | 1.0259 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.2605 | 600 | 1.1896 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.2773 | 608 | 1.0532 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.2941 | 616 | 1.3221 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.3109 | 624 | 1.3136 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.3277 | 632 | 1.2238 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.3445 | 640 | 1.2407 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.3613 | 648 | 1.2245 | 3.4717 | 1.7962 | 0.2242 | 0.0488 | 0.7472 | 0.1108 | 0.9272 | 1.6692 | 1.3845 | 1.9117 | 1.3410 | 1.4387 | 0.0701 | 0.8505 | 0.7680 | 0.5471 | 0.7227 |
| 1.3782 | 656 | 1.0428 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.3950 | 664 | 1.1391 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.4118 | 672 | 1.2632 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.4286 | 680 | 0.9403 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.4454 | 688 | 0.7571 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.4622 | 696 | 0.9436 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.4790 | 704 | 1.1239 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.4958 | 712 | 0.9499 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.5126 | 720 | 1.0945 | 3.6495 | 1.6693 | 0.2157 | 0.0492 | 0.6830 | 0.1049 | 0.9140 | 1.5967 | 1.4397 | 1.7394 | 1.3303 | 1.4334 | 0.0603 | 0.8185 | 0.7815 | 0.5606 | 0.7098 |
| 1.5294 | 728 | 1.1161 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.5462 | 736 | 1.0056 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.5630 | 744 | 1.1743 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.5798 | 752 | 0.9153 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.5966 | 760 | 1.1589 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.6134 | 768 | 0.9187 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.6303 | 776 | 0.6937 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.6471 | 784 | 0.9704 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.6639 | 792 | 0.7343 | 3.5442 | 1.6493 | 0.2208 | 0.0249 | 0.6152 | 0.0969 | 0.7111 | 1.5369 | 1.4058 | 1.7066 | 1.2784 | 1.3419 | 0.0585 | 0.7827 | 0.7749 | 0.5627 | 0.7284 |
| 1.6807 | 800 | 1.2878 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.6975 | 808 | 0.9898 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.7143 | 816 | 0.7613 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.7311 | 824 | 0.9612 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.7479 | 832 | 1.1524 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.7647 | 840 | 0.827 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.7815 | 848 | 1.1898 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.7983 | 856 | 1.0117 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.8151 | 864 | 0.7019 | 3.4544 | 1.6149 | 0.2035 | 0.0181 | 0.5525 | 0.0999 | 0.6641 | 1.5456 | 1.3911 | 1.7188 | 1.2547 | 1.3517 | 0.0562 | 0.7473 | 0.7684 | 0.5697 | 0.7329 |
| 1.8319 | 872 | 0.8352 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.8487 | 880 | 0.7836 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.8655 | 888 | 1.0187 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.8824 | 896 | 0.74 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.8992 | 904 | 0.7263 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.9160 | 912 | 0.8073 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.9328 | 920 | 0.8185 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.9496 | 928 | 1.0992 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.9664 | 936 | 0.9973 | 3.5110 | 1.5776 | 0.2035 | 0.0250 | 0.5881 | 0.0934 | 0.6719 | 1.5059 | 1.2970 | 1.6186 | 1.1815 | 1.2714 | 0.0564 | 0.7213 | 0.7799 | 0.5544 | 0.7341 |
| 1.9832 | 944 | 0.6662 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.0 | 952 | 0.533 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.0168 | 960 | 0.7712 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.0336 | 968 | 0.6879 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.0504 | 976 | 0.7975 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.0672 | 984 | 0.873 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.0840 | 992 | 0.7995 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.1008 | 1000 | 1.0119 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.1176 | 1008 | 0.6317 | 3.6778 | 1.5845 | 0.2102 | 0.0228 | 0.5851 | 0.0977 | 0.6411 | 1.4752 | 1.2992 | 1.6314 | 1.1260 | 1.2683 | 0.0556 | 0.7329 | 0.7693 | 0.5614 | 0.7274 |
| 2.1345 | 1016 | 0.72 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.1513 | 1024 | 0.9418 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.1681 | 1032 | 0.7848 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.1849 | 1040 | 0.6965 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.2017 | 1048 | 1.0447 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.2185 | 1056 | 0.6361 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.2353 | 1064 | 0.6837 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.2521 | 1072 | 0.5713 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.2689 | 1080 | 0.8193 | 3.6399 | 1.5565 | 0.2069 | 0.0213 | 0.5440 | 0.0904 | 0.6057 | 1.4815 | 1.2856 | 1.6441 | 1.1469 | 1.2540 | 0.0543 | 0.7216 | 0.7765 | 0.5599 | 0.7322 |
| 2.2857 | 1088 | 0.9754 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.3025 | 1096 | 0.8932 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.3193 | 1104 | 0.8716 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.3361 | 1112 | 0.8787 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.3529 | 1120 | 0.9529 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.3697 | 1128 | 0.775 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.3866 | 1136 | 0.6178 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.4034 | 1144 | 0.8384 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.4202 | 1152 | 0.9425 | 3.5672 | 1.5244 | 0.2111 | 0.0162 | 0.5593 | 0.0893 | 0.5759 | 1.4933 | 1.2703 | 1.5815 | 1.1202 | 1.2132 | 0.0531 | 0.7058 | 0.7730 | 0.5635 | 0.7350 |
| 2.4370 | 1160 | 0.4551 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.4538 | 1168 | 0.6392 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.4706 | 1176 | 0.8341 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.4874 | 1184 | 0.7392 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.5042 | 1192 | 0.7646 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.5210 | 1200 | 0.8613 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.5378 | 1208 | 0.7585 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.5546 | 1216 | 1.0611 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.5714 | 1224 | 0.6506 | 3.6439 | 1.5040 | 0.2125 | 0.0162 | 0.5282 | 0.0863 | 0.5858 | 1.5073 | 1.2444 | 1.5493 | 1.1014 | 1.2073 | 0.0532 | 0.7022 | 0.7774 | 0.5647 | 0.7328 |
| 2.5882 | 1232 | 0.8525 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.6050 | 1240 | 0.6304 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.6218 | 1248 | 0.6354 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.6387 | 1256 | 0.6583 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.6555 | 1264 | 0.5964 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.6723 | 1272 | 0.818 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.6891 | 1280 | 0.8635 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.7059 | 1288 | 0.6389 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.7227 | 1296 | 0.6819 | 3.6131 | 1.5104 | 0.2084 | 0.0148 | 0.5229 | 0.0854 | 0.5588 | 1.4963 | 1.2766 | 1.5679 | 1.0982 | 1.2203 | 0.0529 | 0.7059 | 0.7762 | 0.5659 | 0.7355 |
| 2.7395 | 1304 | 0.7878 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.7563 | 1312 | 0.7638 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.7731 | 1320 | 0.8885 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.7899 | 1328 | 0.8184 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.8067 | 1336 | 0.7472 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.8235 | 1344 | 0.7012 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.8403 | 1352 | 0.4622 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.8571 | 1360 | 0.846 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.8739 | 1368 | 0.8308 | 3.6224 | 1.5088 | 0.2084 | 0.0148 | 0.5118 | 0.0858 | 0.5523 | 1.4941 | 1.2756 | 1.5808 | 1.0925 | 1.2114 | 0.0521 | 0.7022 | 0.7765 | 0.5662 | 0.7366 |
| 2.8908 | 1376 | 0.5334 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.9076 | 1384 | 0.7893 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.9244 | 1392 | 0.6897 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.9412 | 1400 | 0.7803 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.9580 | 1408 | 0.841 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.9748 | 1416 | 0.787 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 2.9916 | 1424 | 0.5861 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 3.0 | 1428 | - | 3.6139 | 1.5071 | 0.2084 | 0.0150 | 0.5124 | 0.0862 | 0.5532 | 1.4924 | 1.2700 | 1.5806 | 1.0905 | 1.2081 | 0.0519 | 0.6997 | 0.7776 | 0.5665 | 0.7369 |

</details>
### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.2.0
- Transformers: 4.44.2
- PyTorch: 2.4.1+cu121
- Accelerate: 0.34.2
- Datasets: 3.0.1
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```