diff --git "a/checkpoint-240/README.md" "b/checkpoint-240/README.md"
new file mode 100644
--- /dev/null
+++ "b/checkpoint-240/README.md"
@@ -0,0 +1,1062 @@
+---
+base_model: microsoft/deberta-v3-small
+datasets:
+- tals/vitaminc
+language:
+- en
+library_name: sentence-transformers
+metrics:
+- pearson_cosine
+- spearman_cosine
+- pearson_manhattan
+- spearman_manhattan
+- pearson_euclidean
+- spearman_euclidean
+- pearson_dot
+- spearman_dot
+- pearson_max
+- spearman_max
+- cosine_accuracy
+- cosine_accuracy_threshold
+- cosine_f1
+- cosine_f1_threshold
+- cosine_precision
+- cosine_recall
+- cosine_ap
+- dot_accuracy
+- dot_accuracy_threshold
+- dot_f1
+- dot_f1_threshold
+- dot_precision
+- dot_recall
+- dot_ap
+- manhattan_accuracy
+- manhattan_accuracy_threshold
+- manhattan_f1
+- manhattan_f1_threshold
+- manhattan_precision
+- manhattan_recall
+- manhattan_ap
+- euclidean_accuracy
+- euclidean_accuracy_threshold
+- euclidean_f1
+- euclidean_f1_threshold
+- euclidean_precision
+- euclidean_recall
+- euclidean_ap
+- max_accuracy
+- max_accuracy_threshold
+- max_f1
+- max_f1_threshold
+- max_precision
+- max_recall
+- max_ap
+pipeline_tag: sentence-similarity
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- generated_from_trainer
+- dataset_size:225247
+- loss:CachedGISTEmbedLoss
+widget:
+- source_sentence: how long to grill boneless skinless chicken breasts in oven
+ sentences:
+ - "[ syll. a-ka-hi, ak-ahi ] The baby boy name Akahi is also used as a girl name.\
+ \ Its pronunciation is AA K AA HHiy. Akahi's origin, as well as its use,\
+ \ is in the Hawaiian language. The name's meaning is never before. Akahi is infrequently\
+ \ used as a baby name for boys."
+ - October consists of 31 days. November has 30 days. When you add both together
+ they have 61 days.
+ - Heat a grill or grill pan. When the grill is hot, place the chicken on the grill
+ and cook for about 4 minutes per side, or until cooked through. You can also bake
+ the thawed chicken in a 375 degree F oven for 15 minutes, or until cooked through.
+- source_sentence: More than 273 people have died from the 2019-20 coronavirus outside
+ mainland China .
+ sentences:
+ - 'More than 3,700 people have died : around 3,100 in mainland China and around
+ 550 in all other countries combined .'
+ - 'More than 3,200 people have died : almost 3,000 in mainland China and around
+ 275 in other countries .'
+ - more than 4,900 deaths have been attributed to COVID-19 .
+- source_sentence: Most red algae species live in oceans.
+ sentences:
+ - Where do most red algae species live?
+ - Which layer of the earth is molten?
+ - As a diver descends, the increase in pressure causes the body’s air pockets in
+ the ears and lungs to do what?
+- source_sentence: Binary compounds of carbon with less electronegative elements are
+ called carbides.
+ sentences:
+ - What are four children born at one birth called?
+ - Binary compounds of carbon with less electronegative elements are called what?
+ - The water cycle involves movement of water between air and what?
+- source_sentence: What is the basic monetary unit of Iceland?
+ sentences:
+ - 'Ao dai - Vietnamese traditional dress - YouTube Ao dai - Vietnamese traditional
+ dress Want to watch this again later? Sign in to add this video to a playlist.
+ Need to report the video? Sign in to report inappropriate content. Rating is available
+ when the video has been rented. This feature is not available right now. Please
+ try again later. Uploaded on Jul 8, 2009 Simple, yet charming, graceful and elegant,
+ áo dài was designed to praise the slender beauty of Vietnamese women. The dress
+ is a genius combination of ancient and modern. It shows every curve on the girl''s
+ body, creating sexiness for the wearer, yet it still preserves the traditional
+ feminine grace of Vietnamese women with its charming flowing flaps. The simplicity
+ of áo dài makes it convenient and practical, something that other Asian traditional
+ clothes lack. The waist-length slits of the flaps allow every movement of the
+ legs: walking, running, riding a bicycle, climbing a tree, doing high kicks. The
+ looseness of the pants allows comfortability. As a girl walks in áo dài, the movements
+ of the flaps make it seem like she''s not walking but floating in the air. This
+ breath-taking beautiful image of a Vietnamese girl walking in áo dài has been
+ an inspiration for generations of Vietnamese poets, novelists, artists and has
+ left a deep impression for every foreigner who has visited the country. Category'
+ - 'Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary
+ Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary
+ http://www.thefreedictionary.com/Icelandic+monetary+unit Related to Icelandic
+ monetary unit: Icelandic Old Krona ThesaurusAntonymsRelated WordsSynonymsLegend:
+ monetary unit - a unit of money Icelandic krona , krona - the basic unit of money
+ in Iceland eyrir - 100 aurar equal 1 krona in Iceland Want to thank TFD for its
+ existence? Tell a friend about us , add a link to this page, or visit the webmaster''s
+ page for free fun content . Link to this page: Copyright © 2003-2017 Farlex, Inc
+ Disclaimer All content on this website, including dictionary, thesaurus, literature,
+ geography, and other reference data is for informational purposes only. This information
+ should not be considered complete, up to date, and is not intended to be used
+ in place of a visit, consultation, or advice of a legal, medical, or any other
+ professional.'
+ - 'Food-Info.net : E-numbers : E140: Chlorophyll CI 75810, Natural Green 3, Chlorophyll
+ A, Magnesium chlorophyll Origin: Natural green colour, present in all plants and
+ algae. Commercially extracted from nettles, grass and alfalfa. Function & characteristics:'
+model-index:
+- name: SentenceTransformer based on microsoft/deberta-v3-small
+ results:
+ - task:
+ type: semantic-similarity
+ name: Semantic Similarity
+ dataset:
+ name: sts test
+ type: sts-test
+ metrics:
+ - type: pearson_cosine
+ value: 0.3977846210139704
+ name: Pearson Cosine
+ - type: spearman_cosine
+ value: 0.44299644096637864
+ name: Spearman Cosine
+ - type: pearson_manhattan
+ value: 0.43174431600737306
+ name: Pearson Manhattan
+ - type: spearman_manhattan
+ value: 0.4553695033739603
+ name: Spearman Manhattan
+ - type: pearson_euclidean
+ value: 0.42060129087924125
+ name: Pearson Euclidean
+ - type: spearman_euclidean
+ value: 0.44300328790921845
+ name: Spearman Euclidean
+ - type: pearson_dot
+ value: 0.3974381713503513
+ name: Pearson Dot
+ - type: spearman_dot
+ value: 0.4426330607320026
+ name: Spearman Dot
+ - type: pearson_max
+ value: 0.43174431600737306
+ name: Pearson Max
+ - type: spearman_max
+ value: 0.4553695033739603
+ name: Spearman Max
+ - task:
+ type: binary-classification
+ name: Binary Classification
+ dataset:
+ name: allNLI dev
+ type: allNLI-dev
+ metrics:
+ - type: cosine_accuracy
+ value: 0.66796875
+ name: Cosine Accuracy
+ - type: cosine_accuracy_threshold
+ value: 0.9727417230606079
+ name: Cosine Accuracy Threshold
+ - type: cosine_f1
+ value: 0.5338983050847458
+ name: Cosine F1
+ - type: cosine_f1_threshold
+ value: 0.8509687781333923
+ name: Cosine F1 Threshold
+ - type: cosine_precision
+ value: 0.4214046822742475
+ name: Cosine Precision
+ - type: cosine_recall
+ value: 0.7283236994219653
+ name: Cosine Recall
+ - type: cosine_ap
+ value: 0.4443750308487611
+ name: Cosine Ap
+ - type: dot_accuracy
+ value: 0.66796875
+ name: Dot Accuracy
+ - type: dot_accuracy_threshold
+ value: 747.4664916992188
+ name: Dot Accuracy Threshold
+ - type: dot_f1
+ value: 0.5347368421052632
+ name: Dot F1
+ - type: dot_f1_threshold
+ value: 652.6121826171875
+ name: Dot F1 Threshold
+ - type: dot_precision
+ value: 0.4205298013245033
+ name: Dot Precision
+ - type: dot_recall
+ value: 0.7341040462427746
+ name: Dot Recall
+ - type: dot_ap
+ value: 0.4447331164315086
+ name: Dot Ap
+ - type: manhattan_accuracy
+ value: 0.673828125
+ name: Manhattan Accuracy
+ - type: manhattan_accuracy_threshold
+ value: 185.35494995117188
+ name: Manhattan Accuracy Threshold
+ - type: manhattan_f1
+ value: 0.5340909090909091
+ name: Manhattan F1
+ - type: manhattan_f1_threshold
+ value: 316.48419189453125
+ name: Manhattan F1 Threshold
+ - type: manhattan_precision
+ value: 0.3971830985915493
+ name: Manhattan Precision
+ - type: manhattan_recall
+ value: 0.815028901734104
+ name: Manhattan Recall
+ - type: manhattan_ap
+ value: 0.45330636568192945
+ name: Manhattan Ap
+ - type: euclidean_accuracy
+ value: 0.66796875
+ name: Euclidean Accuracy
+ - type: euclidean_accuracy_threshold
+ value: 6.472302436828613
+ name: Euclidean Accuracy Threshold
+ - type: euclidean_f1
+ value: 0.5338983050847458
+ name: Euclidean F1
+ - type: euclidean_f1_threshold
+ value: 15.134000778198242
+ name: Euclidean F1 Threshold
+ - type: euclidean_precision
+ value: 0.4214046822742475
+ name: Euclidean Precision
+ - type: euclidean_recall
+ value: 0.7283236994219653
+ name: Euclidean Recall
+ - type: euclidean_ap
+ value: 0.44436910603457025
+ name: Euclidean Ap
+ - type: max_accuracy
+ value: 0.673828125
+ name: Max Accuracy
+ - type: max_accuracy_threshold
+ value: 747.4664916992188
+ name: Max Accuracy Threshold
+ - type: max_f1
+ value: 0.5347368421052632
+ name: Max F1
+ - type: max_f1_threshold
+ value: 652.6121826171875
+ name: Max F1 Threshold
+ - type: max_precision
+ value: 0.4214046822742475
+ name: Max Precision
+ - type: max_recall
+ value: 0.815028901734104
+ name: Max Recall
+ - type: max_ap
+ value: 0.45330636568192945
+ name: Max Ap
+ - task:
+ type: binary-classification
+ name: Binary Classification
+ dataset:
+ name: Qnli dev
+ type: Qnli-dev
+ metrics:
+ - type: cosine_accuracy
+ value: 0.66015625
+ name: Cosine Accuracy
+ - type: cosine_accuracy_threshold
+ value: 0.8744948506355286
+ name: Cosine Accuracy Threshold
+ - type: cosine_f1
+ value: 0.6646433990895295
+ name: Cosine F1
+ - type: cosine_f1_threshold
+ value: 0.753309965133667
+ name: Cosine F1 Threshold
+ - type: cosine_precision
+ value: 0.5177304964539007
+ name: Cosine Precision
+ - type: cosine_recall
+ value: 0.9279661016949152
+ name: Cosine Recall
+ - type: cosine_ap
+ value: 0.6610633478265061
+ name: Cosine Ap
+ - type: dot_accuracy
+ value: 0.66015625
+ name: Dot Accuracy
+ - type: dot_accuracy_threshold
+ value: 670.719970703125
+ name: Dot Accuracy Threshold
+ - type: dot_f1
+ value: 0.6646433990895295
+ name: Dot F1
+ - type: dot_f1_threshold
+ value: 578.874755859375
+ name: Dot F1 Threshold
+ - type: dot_precision
+ value: 0.5177304964539007
+ name: Dot Precision
+ - type: dot_recall
+ value: 0.9279661016949152
+ name: Dot Recall
+ - type: dot_ap
+ value: 0.6607472505349153
+ name: Dot Ap
+ - type: manhattan_accuracy
+ value: 0.666015625
+ name: Manhattan Accuracy
+ - type: manhattan_accuracy_threshold
+ value: 281.9825134277344
+ name: Manhattan Accuracy Threshold
+ - type: manhattan_f1
+ value: 0.6678899082568808
+ name: Manhattan F1
+ - type: manhattan_f1_threshold
+ value: 328.83447265625
+ name: Manhattan F1 Threshold
+ - type: manhattan_precision
+ value: 0.5889967637540453
+ name: Manhattan Precision
+ - type: manhattan_recall
+ value: 0.7711864406779662
+ name: Manhattan Recall
+ - type: manhattan_ap
+ value: 0.6664006509577655
+ name: Manhattan Ap
+ - type: euclidean_accuracy
+ value: 0.66015625
+ name: Euclidean Accuracy
+ - type: euclidean_accuracy_threshold
+ value: 13.881525039672852
+ name: Euclidean Accuracy Threshold
+ - type: euclidean_f1
+ value: 0.6646433990895295
+ name: Euclidean F1
+ - type: euclidean_f1_threshold
+ value: 19.471359252929688
+ name: Euclidean F1 Threshold
+ - type: euclidean_precision
+ value: 0.5177304964539007
+ name: Euclidean Precision
+ - type: euclidean_recall
+ value: 0.9279661016949152
+ name: Euclidean Recall
+ - type: euclidean_ap
+ value: 0.6611053426809266
+ name: Euclidean Ap
+ - type: max_accuracy
+ value: 0.666015625
+ name: Max Accuracy
+ - type: max_accuracy_threshold
+ value: 670.719970703125
+ name: Max Accuracy Threshold
+ - type: max_f1
+ value: 0.6678899082568808
+ name: Max F1
+ - type: max_f1_threshold
+ value: 578.874755859375
+ name: Max F1 Threshold
+ - type: max_precision
+ value: 0.5889967637540453
+ name: Max Precision
+ - type: max_recall
+ value: 0.9279661016949152
+ name: Max Recall
+ - type: max_ap
+ value: 0.6664006509577655
+ name: Max Ap
+---
+
+# SentenceTransformer based on microsoft/deberta-v3-small
+
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+## Model Details
+
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small)
+- **Maximum Sequence Length:** 512 tokens
+- **Output Dimensionality:** 768 dimensions
+- **Similarity Function:** Cosine Similarity
+- **Language:** en
+
+
+### Model Sources
+
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+### Full Model Architecture
+
+```
+SentenceTransformer(
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
+ (1): AdvancedWeightedPooling(
+ (linear_cls): Linear(in_features=768, out_features=768, bias=True)
+ (mha): MultiheadAttention(
+ (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+ )
+ (layernorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ (layernorm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ )
+)
+```
+
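`AdvancedWeightedPooling` is a custom module, and this card does not document what it computes. Its size can still be estimated from the printed layer shapes. The arithmetic below assumes these are the standard PyTorch layers: `nn.MultiheadAttention` with `embed_dim == kdim == vdim == 768` and `bias=True` (consistent with the `out_proj` bias shown). Under that assumption it holds a packed 3·768 × 768 input projection plus a 3·768 bias.

```python
# Parameter count of the pooling head, derived only from the printed shapes.
# Assumption: standard nn.Linear / nn.MultiheadAttention / nn.LayerNorm layers.
E = 768  # hidden size shown in the architecture printout

linear_cls = E * E + E           # Linear(768 -> 768, bias=True)
mha_in_proj = 3 * E * E + 3 * E  # packed query/key/value projection + bias
mha_out_proj = E * E + E         # out_proj Linear(768 -> 768, bias=True)
layernorms = 2 * (2 * E)         # layernorm and layernorm2: weight + bias each

pooling_params = linear_cls + mha_in_proj + mha_out_proj + layernorms
print(f"{pooling_params:,}")  # 2,956,032
```

Under these assumptions, that is roughly 3M trainable parameters on top of the DeBERTa-v3-small encoder.
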
+## Usage
+
+### Direct Usage (Sentence Transformers)
+
+First install the Sentence Transformers library:
+
+```bash
+pip install -U sentence-transformers
+```
+
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+
+# Download from the 🤗 Hub
+model = SentenceTransformer("bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp")
+# Run inference
+sentences = [
+ 'What is the basic monetary unit of Iceland?',
+ "Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary http://www.thefreedictionary.com/Icelandic+monetary+unit Related to Icelandic monetary unit: Icelandic Old Krona ThesaurusAntonymsRelated WordsSynonymsLegend: monetary unit - a unit of money Icelandic krona , krona - the basic unit of money in Iceland eyrir - 100 aurar equal 1 krona in Iceland Want to thank TFD for its existence? Tell a friend about us , add a link to this page, or visit the webmaster's page for free fun content . Link to this page: Copyright © 2003-2017 Farlex, Inc Disclaimer All content on this website, including dictionary, thesaurus, literature, geography, and other reference data is for informational purposes only. This information should not be considered complete, up to date, and is not intended to be used in place of a visit, consultation, or advice of a legal, medical, or any other professional.",
+ 'Food-Info.net : E-numbers : E140: Chlorophyll CI 75810, Natural Green 3, Chlorophyll A, Magnesium chlorophyll Origin: Natural green colour, present in all plants and algae. Commercially extracted from nettles, grass and alfalfa. Function & characteristics:',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# (3, 768)
+
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities.shape)
+# torch.Size([3, 3])
+```
+
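`model.similarity` scores pairs with the similarity function listed under Model Details, which is cosine similarity. For semantic search, you typically score a corpus against one query and sort the results. Below is a minimal NumPy sketch on toy 3-d vectors that stand in for the model's 768-d embeddings. `cosine_similarity_matrix` and `rank_passages` are hypothetical helpers. The library's `sentence_transformers.util.semantic_search` covers the same use case.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a (n, d) and b (m, d)."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

def rank_passages(query_emb: np.ndarray, passage_embs: np.ndarray, top_k: int = 2):
    """Return (passage index, cosine score) pairs for the top_k passages, best first."""
    scores = cosine_similarity_matrix(query_emb[None, :], passage_embs)[0]
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors; in practice pass the arrays returned by model.encode(...).
query = np.array([1.0, 0.0, 0.0])
passages = np.array([
    [0.0, 1.0, 0.0],  # unrelated to the query
    [0.9, 0.1, 0.0],  # close to the query
    [0.5, 0.5, 0.5],  # partially related
])
print(rank_passages(query, passages))  # passage 1 ranks first, then passage 2
```
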
+## Evaluation
+
+### Metrics
+
+#### Semantic Similarity
+* Dataset: `sts-test`
+* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+| Metric | Value |
+|:--------------------|:----------|
+| pearson_cosine | 0.3978 |
+| **spearman_cosine** | **0.443** |
+| pearson_manhattan | 0.4317 |
+| spearman_manhattan | 0.4554 |
+| pearson_euclidean | 0.4206 |
+| spearman_euclidean | 0.443 |
+| pearson_dot | 0.3974 |
+| spearman_dot | 0.4426 |
+| pearson_max | 0.4317 |
+| spearman_max | 0.4554 |
+
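On `sts-test` the Spearman scores sit above the Pearson scores (for example, 0.443 vs 0.3978 for cosine). Pearson measures linear agreement between predicted similarities and gold scores, while Spearman only measures rank agreement. A model whose scores are monotonic but not linear in the gold labels is therefore penalized by Pearson only. The following is a minimal NumPy sketch of both on toy data. It is not the evaluator's code, which relies on SciPy, and here tied values receive average ranks.

```python
import numpy as np

def pearson(x, y) -> float:
    """Pearson correlation: linear agreement between x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def average_ranks(values) -> np.ndarray:
    """1-based ranks; tied values share the mean of their sorted positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i : j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y) -> float:
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

gold = [1.0, 2.0, 3.0, 4.0, 5.0]            # toy gold similarity labels
predicted = [0.10, 0.15, 0.30, 0.80, 0.95]  # monotonic but not linear in gold
print(pearson(gold, predicted), spearman(gold, predicted))
```
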
+#### Binary Classification
+* Dataset: `allNLI-dev`
+* Evaluated with [BinaryClassificationEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
+
+| Metric | Value |
+|:-----------------------------|:-----------|
+| cosine_accuracy | 0.668 |
+| cosine_accuracy_threshold | 0.9727 |
+| cosine_f1 | 0.5339 |
+| cosine_f1_threshold | 0.851 |
+| cosine_precision | 0.4214 |
+| cosine_recall | 0.7283 |
+| cosine_ap | 0.4444 |
+| dot_accuracy | 0.668 |
+| dot_accuracy_threshold | 747.4665 |
+| dot_f1 | 0.5347 |
+| dot_f1_threshold | 652.6122 |
+| dot_precision | 0.4205 |
+| dot_recall | 0.7341 |
+| dot_ap | 0.4447 |
+| manhattan_accuracy | 0.6738 |
+| manhattan_accuracy_threshold | 185.3549 |
+| manhattan_f1 | 0.5341 |
+| manhattan_f1_threshold | 316.4842 |
+| manhattan_precision | 0.3972 |
+| manhattan_recall | 0.815 |
+| manhattan_ap | 0.4533 |
+| euclidean_accuracy | 0.668 |
+| euclidean_accuracy_threshold | 6.4723 |
+| euclidean_f1 | 0.5339 |
+| euclidean_f1_threshold | 15.134 |
+| euclidean_precision | 0.4214 |
+| euclidean_recall | 0.7283 |
+| euclidean_ap | 0.4444 |
+| max_accuracy | 0.6738 |
+| max_accuracy_threshold | 747.4665 |
+| max_f1 | 0.5347 |
+| max_f1_threshold | 652.6122 |
+| max_precision | 0.4214 |
+| max_recall | 0.815 |
+| **max_ap** | **0.4533** |
+
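The `max_*` rows do not come from a single "best" similarity function. In the table, `max_accuracy` (0.6738) equals `manhattan_accuracy`, but `max_accuracy_threshold` (747.4665) equals `dot_accuracy_threshold`. Each `max_*` entry is the maximum of that metric over the four similarity functions, taken independently, so a `max_*_threshold` should not be paired with the matching `max_*` score. The sketch below reproduces this with the values from the table. The dictionary layout is just for illustration.

```python
# Per-similarity-function results on allNLI-dev, copied from the table above.
results = {
    "cosine":    {"accuracy": 0.66796875,  "accuracy_threshold": 0.9727417230606079},
    "dot":       {"accuracy": 0.66796875,  "accuracy_threshold": 747.4664916992188},
    "manhattan": {"accuracy": 0.673828125, "accuracy_threshold": 185.35494995117188},
    "euclidean": {"accuracy": 0.66796875,  "accuracy_threshold": 6.472302436828613},
}

# Each max_* value is maximized independently across the four functions.
max_row = {
    metric: max(scores[metric] for scores in results.values())
    for metric in ("accuracy", "accuracy_threshold")
}
print(max_row)

# The function that actually reaches max_accuracy, and its own threshold.
best = max(results, key=lambda name: results[name]["accuracy"])
print(best, results[best]["accuracy_threshold"])
```
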
+#### Binary Classification
+* Dataset: `Qnli-dev`
+* Evaluated with [BinaryClassificationEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
+
+| Metric | Value |
+|:-----------------------------|:-----------|
+| cosine_accuracy | 0.6602 |
+| cosine_accuracy_threshold | 0.8745 |
+| cosine_f1 | 0.6646 |
+| cosine_f1_threshold | 0.7533 |
+| cosine_precision | 0.5177 |
+| cosine_recall | 0.928 |
+| cosine_ap | 0.6611 |
+| dot_accuracy | 0.6602 |
+| dot_accuracy_threshold | 670.72 |
+| dot_f1 | 0.6646 |
+| dot_f1_threshold | 578.8748 |
+| dot_precision | 0.5177 |
+| dot_recall | 0.928 |
+| dot_ap | 0.6607 |
+| manhattan_accuracy | 0.666 |
+| manhattan_accuracy_threshold | 281.9825 |
+| manhattan_f1 | 0.6679 |
+| manhattan_f1_threshold | 328.8345 |
+| manhattan_precision | 0.589 |
+| manhattan_recall | 0.7712 |
+| manhattan_ap | 0.6664 |
+| euclidean_accuracy | 0.6602 |
+| euclidean_accuracy_threshold | 13.8815 |
+| euclidean_f1 | 0.6646 |
+| euclidean_f1_threshold | 19.4714 |
+| euclidean_precision | 0.5177 |
+| euclidean_recall | 0.928 |
+| euclidean_ap | 0.6611 |
+| max_accuracy | 0.666 |
+| max_accuracy_threshold | 670.72 |
+| max_f1 | 0.6679 |
+| max_f1_threshold | 578.8748 |
+| max_precision | 0.589 |
+| max_recall | 0.928 |
+| **max_ap** | **0.6664** |
+
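The `*_accuracy_threshold` rows are decision thresholds found by sweeping over the evaluation scores. Below is a minimal sketch of such a sweep for accuracy. It assumes that higher scores mean "more similar"; for the Manhattan and Euclidean distances you would negate the distances first. `best_accuracy_threshold` is a hypothetical helper, and its candidate thresholds and tie-breaking may differ from `BinaryClassificationEvaluator`.

```python
import numpy as np

def best_accuracy_threshold(scores, labels):
    """Return (accuracy, threshold) for the rule `predict 1 if score > threshold`."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    distinct = np.unique(scores)  # sorted ascending
    # Candidate cut points: midpoints between neighbouring scores, plus one
    # below the minimum (all positive) and one above the maximum (all negative).
    candidates = np.concatenate((
        [distinct[0] - 1.0],
        (distinct[:-1] + distinct[1:]) / 2,
        [distinct[-1] + 1.0],
    ))
    best_acc, best_thr = -1.0, float(candidates[0])
    for thr in candidates:
        acc = float(np.mean((scores > thr).astype(int) == labels))
        if acc > best_acc:  # strict: keep the first threshold on ties
            best_acc, best_thr = acc, float(thr)
    return best_acc, best_thr

# Toy cosine scores and gold labels (1 = same meaning).
scores = [0.91, 0.85, 0.62, 0.40, 0.88, 0.30]
labels = [1, 1, 0, 0, 0, 0]
print(best_accuracy_threshold(scores, labels))
```

A threshold chosen this way reflects one evaluation set, so it may need re-tuning on your own data.
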
+## Training Details
+
+### Evaluation Dataset
+
+#### vitaminc-pairs
+
+* Dataset: [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc) at [be6febb](https://huggingface.co/datasets/tals/vitaminc/tree/be6febb761b0b2807687e61e0b5282e459df2fa0)
+* Size: 128 evaluation samples
+* Columns: `claim` and `evidence`
+* Approximate statistics based on the first 128 samples:
+ |      | claim  | evidence |
+ |:-----|:-------|:---------|
+ | type | string | string   |
+* Samples:
+ | claim | evidence |
+ |:------|:---------|
+ | Dragon Con had over 5000 guests . | Among the more than 6000 guests and musical performers at the 2009 convention were such notables as Patrick Stewart , William Shatner , Leonard Nimoy , Terry Gilliam , Bruce Boxleitner , James Marsters , and Mary McDonnell . |
+ | COVID-19 has reached more than 185 countries . | As of , more than cases of COVID-19 have been reported in more than 190 countries and 200 territories , resulting in more than deaths . |
+ | In March , Italy had 3.6x times more cases of coronavirus than China . | As of 12 March , among nations with at least one million citizens , Italy has the world 's highest per capita rate of positive coronavirus cases at 206.1 cases per million people ( 3.6x times the rate of China ) and is the country with the second-highest number of positive cases as well as of deaths in the world , after China . |
+* Loss: [CachedGISTEmbedLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
+ ```json
+ {'guide': SentenceTransformer(
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ (2): Normalize()
+ ), 'temperature': 0.025}
+ ```
+
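CachedGISTEmbedLoss combines GISTEmbed's guide-based filtering of in-batch negatives with the gradient caching used by CachedMultipleNegativesRankingLoss. The caching lets large batches fit in memory without changing the objective. The sketch below shows only the filtering idea, in NumPy, for the anchor-positive similarity block:

* An in-batch negative is masked out when the guide model scores it higher than the anchor's true positive.
* The remaining scores are divided by `temperature` (0.025 here) and passed through a cross-entropy whose target is the true positive.

This is a simplified illustration, not the library's implementation, which also scores anchor-anchor and positive-positive pairs as extra negatives. `guided_contrastive_loss` is a hypothetical name.

```python
import numpy as np

def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def guided_contrastive_loss(anchor, positive, guide_anchor, guide_positive,
                            temperature: float = 0.025) -> float:
    """Simplified GIST-style loss over the anchor-positive block only."""
    sim = cosine_matrix(anchor, positive)                    # student model, (B, B)
    guide_sim = cosine_matrix(guide_anchor, guide_positive)  # guide model, (B, B)
    # Guide score of each true pair (anchor_i, positive_i), as a column vector.
    guide_pos = np.diag(guide_sim)[:, None]
    # Likely false negatives: candidates the guide ranks above the true pair.
    mask = guide_sim > guide_pos
    np.fill_diagonal(mask, False)  # never mask the true positive
    logits = np.where(mask, -np.inf, sim) / temperature
    # Cross-entropy with the diagonal as target, via a stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Toy batch: each anchor's labelled positive is orthogonal to it, while the
# other in-batch positive is identical to it.
anchor = np.array([[1.0, 0.0], [0.0, 1.0]])
positive = np.array([[0.0, 1.0], [1.0, 0.0]])
# Guide agrees with the student: the look-alike negatives are masked, loss is 0.
print(guided_contrastive_loss(anchor, positive, anchor, positive))
# A guide that masks nothing: the same batch costs about 1 / 0.025 = 40.
print(guided_contrastive_loss(anchor, positive, np.eye(2), np.eye(2)))
```
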
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+
+- `eval_strategy`: steps
+- `per_device_train_batch_size`: 42
+- `per_device_eval_batch_size`: 128
+- `gradient_accumulation_steps`: 2
+- `learning_rate`: 3e-05
+- `weight_decay`: 0.001
+- `lr_scheduler_type`: cosine_with_min_lr
+- `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 1e-05}
+- `warmup_ratio`: 0.25
+- `save_safetensors`: False
+- `fp16`: True
+- `push_to_hub`: True
+- `hub_model_id`: bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp
+- `hub_strategy`: all_checkpoints
+- `batch_sampler`: no_duplicates
+
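With `gradient_accumulation_steps: 2`, each optimizer step sees 42 × 2 = 84 training pairs per device. `batch_sampler: no_duplicates` keeps duplicate texts out of a batch so that they do not act as false in-batch negatives. The `cosine_with_min_lr` schedule has two phases:

* a linear warmup to 3e-05 over the first 25% of steps (`warmup_ratio: 0.25`);
* a half-cosine decay (`num_cycles: 0.5`) that bottoms out at `min_lr` = 1e-05.

The sketch below reproduces that curve in plain Python. It is modelled on the Hugging Face Transformers warmup/cosine/min-LR scheduler, under the assumption that warmup steps are rounded up from the ratio. `total_steps` is a placeholder, because this section does not state the number of epochs or devices.

```python
import math

LEARNING_RATE = 3e-05
MIN_LR = 1e-05
WARMUP_RATIO = 0.25
NUM_CYCLES = 0.5

def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to MIN_LR."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * NUM_CYCLES * 2.0 * progress))
    min_lr_rate = MIN_LR / LEARNING_RATE
    return LEARNING_RATE * (cosine * (1.0 - min_lr_rate) + min_lr_rate)

effective_batch = 42 * 2  # per_device_train_batch_size * gradient_accumulation_steps
total_steps = 1000        # placeholder: the real value depends on epochs and devices
for step in (0, 125, 250, 625, 1000):
    print(step, lr_at(step, total_steps))
```
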
+#### All Hyperparameters
+