bobox/DeBERTaV3-small-GeneralSentenceTransformer-v2-AllSoft

loss=train_loss,
n_layers_per_step=-1,
last_layer_weight=2,
prior_layers_weight=0.1,
kl_div_weight=0.5,
kl_temperature=1,
)
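
The fragment above is cut off before the opening of the call. A minimal sketch of how these keyword arguments might be passed, assuming (from the parameter names and the `AdaptiveLayers` hub model id in the README diff) that the wrapper is `sentence_transformers.losses.AdaptiveLayerLoss`. The names `ADAPTIVE_LAYER_KWARGS`, `build_adaptive_loss`, and `model` are hypothetical and used only for illustration:

```python
# Keyword arguments copied from the commit description.
ADAPTIVE_LAYER_KWARGS = {
    "n_layers_per_step": -1,     # -1: use all layers on every step
    "last_layer_weight": 2,      # weight of the loss on the final layer
    "prior_layers_weight": 0.1,  # weight of the loss on earlier layers
    "kl_div_weight": 0.5,        # weight of the KL-divergence term
    "kl_temperature": 1,         # temperature used for the KL-divergence term
}


def build_adaptive_loss(model, train_loss):
    """Wrap an existing loss (hypothetical helper; assumes AdaptiveLayerLoss)."""
    # Imported lazily so the kwargs can be inspected without the library installed.
    from sentence_transformers.losses import AdaptiveLayerLoss

    return AdaptiveLayerLoss(model=model, loss=train_loss, **ADAPTIVE_LAYER_KWARGS)
```

Here `model` would be the `SentenceTransformer` being trained and `train_loss` the base loss it wraps; neither is defined in the commit description.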

learning_rate=3e-5,
warmup_ratio=0.2,
weight_decay=2e-6,
per_device_train_batch_size=28,
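
To see what these values imply together with the `cosine_with_restarts` scheduler and `num_cycles: 3` listed in the README diff, here is a rough, dependency-free sketch of the learning-rate curve. It follows the hard-restart cosine formula as I understand it from `transformers`, so treat it as an approximation rather than the trainer's exact behaviour. `TOTAL_STEPS = 1568` is an assumption taken from the previous run's training log; the step count of this run may differ because the batch sampler changed.

```python
import math

PEAK_LR = 3e-5      # learning_rate from the commit description
WARMUP_RATIO = 0.2  # warmup_ratio from the commit description
NUM_CYCLES = 3      # lr_scheduler_kwargs in the README diff
TOTAL_STEPS = 1568  # assumption: step count of the previous run's log


def lr_at(step, total_steps=TOTAL_STEPS):
    """Approximate learning rate at a given optimizer step."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Cosine decay that jumps back to the peak at the start of each cycle.
    cycle_pos = (NUM_CYCLES * progress) % 1.0
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_pos)))


if __name__ == "__main__":
    for step in (0, 157, 314, 731, 733, 1567):
        print(step, f"{lr_at(step):.2e}")
```

Under these assumptions warmup covers the first 314 steps, and each of the three cosine cycles spans 418 steps.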

Files changed (2)

README.md +9 -214
pytorch_model.bin +1 -1

README.md CHANGED

@@ -30,17 +30,6 @@ datasets:
 - sentence-transformers/trivia-qa
 - sentence-transformers/quora-duplicates
 - sentence-transformers/gooaq
-metrics:
-- pearson_cosine
-- spearman_cosine
-- pearson_manhattan
-- spearman_manhattan
-- pearson_euclidean
-- spearman_euclidean
-- pearson_dot
-- spearman_dot
-- pearson_max
-- spearman_max
 widget:
 - source_sentence: Centrosome-independent mitotic spindle formation in  vertebrates.
   sentences:
@@ -71,76 +60,6 @@ widget:
   - Only the series from 2009 onwards are available on Blu-ray, except for the 1970
     story Spearhead from Space, released in July 2013.
 pipeline_tag: sentence-similarity
-model-index:
-- name: SentenceTransformer based on microsoft/deberta-v3-small
-  results:
-  - task:
-      type: semantic-similarity
-      name: Semantic Similarity
-    dataset:
-      name: sts test
-      type: sts-test
-    metrics:
-    - type: pearson_cosine
-      value: 0.2520910673470529
-      name: Pearson Cosine
-    - type: spearman_cosine
-      value: 0.2588662067006675
-      name: Spearman Cosine
-    - type: pearson_manhattan
-      value: 0.30439718484055006
-      name: Pearson Manhattan
-    - type: spearman_manhattan
-      value: 0.3013780326567434
-      name: Spearman Manhattan
-    - type: pearson_euclidean
-      value: 0.25977707672353506
-      name: Pearson Euclidean
-    - type: spearman_euclidean
-      value: 0.26078444276128726
-      name: Spearman Euclidean
-    - type: pearson_dot
-      value: 0.08121075567918108
-      name: Pearson Dot
-    - type: spearman_dot
-      value: 0.0753891417253212
-      name: Spearman Dot
-    - type: pearson_max
-      value: 0.30439718484055006
-      name: Pearson Max
-    - type: spearman_max
-      value: 0.3013780326567434
-      name: Spearman Max
-    - type: pearson_cosine
-      value: 0.7223465841651426
-      name: Pearson Cosine
-    - type: spearman_cosine
-      value: 0.7063653140998242
-      name: Spearman Cosine
-    - type: pearson_manhattan
-      value: 0.7330735343178496
-      name: Pearson Manhattan
-    - type: spearman_manhattan
-      value: 0.7168204537336414
-      name: Spearman Manhattan
-    - type: pearson_euclidean
-      value: 0.7274789035011718
-      name: Pearson Euclidean
-    - type: spearman_euclidean
-      value: 0.7118592497365636
-      name: Spearman Euclidean
-    - type: pearson_dot
-      value: 0.6085268394963853
-      name: Pearson Dot
-    - type: spearman_dot
-      value: 0.5825572353571885
-      name: Spearman Dot
-    - type: pearson_max
-      value: 0.7330735343178496
-      name: Pearson Max
-    - type: spearman_max
-      value: 0.7168204537336414
-      name: Spearman Max
 ---
 # SentenceTransformer based on microsoft/deberta-v3-small
@@ -246,44 +165,6 @@ You can finetune this model on your own dataset.
 *List how the model may foreseeably be misused and address what users ought not to do with the model.*
 -->
-## Evaluation
-### Metrics
-#### Semantic Similarity
-* Dataset: `sts-test`
-* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
-| Metric              | Value      |
-|:--------------------|:-----------|
-| pearson_cosine      | 0.2521     |
-| **spearman_cosine** | **0.2589** |
-| pearson_manhattan   | 0.3044     |
-| spearman_manhattan  | 0.3014     |
-| pearson_euclidean   | 0.2598     |
-| spearman_euclidean  | 0.2608     |
-| pearson_dot         | 0.0812     |
-| spearman_dot        | 0.0754     |
-| pearson_max         | 0.3044     |
-| spearman_max        | 0.3014     |
-#### Semantic Similarity
-* Dataset: `sts-test`
-* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
-| Metric              | Value      |
-|:--------------------|:-----------|
-| pearson_cosine      | 0.7223     |
-| **spearman_cosine** | **0.7064** |
-| pearson_manhattan   | 0.7331     |
-| spearman_manhattan  | 0.7168     |
-| pearson_euclidean   | 0.7275     |
-| spearman_euclidean  | 0.7119     |
-| pearson_dot         | 0.6085     |
-| spearman_dot        | 0.5826     |
-| pearson_max         | 0.7331     |
-| spearman_max        | 0.7168     |
 <!--
 ## Bias, Risks and Limitations
@@ -836,19 +717,18 @@ You can finetune this model on your own dataset.
 - `eval_strategy`: steps
 - `per_device_train_batch_size`: 28
 - `per_device_eval_batch_size`: 16
-- `learning_rate`: 2e-05
 - `weight_decay`: 1e-06
 - `num_train_epochs`: 1
 - `lr_scheduler_type`: cosine_with_restarts
-- `lr_scheduler_kwargs`: {'num_cycles': 4}
-- `warmup_ratio`: 0.1
 - `save_safetensors`: False
 - `fp16`: True
 - `push_to_hub`: True
-- `hub_model_id`: bobox/DeBERTaV3-small-GeneralSentenceTransformer-v2-2-checkpoints-tmp
 - `hub_strategy`: checkpoint
 - `batch_sampler`: no_duplicates
-- `multi_dataset_batch_sampler`: round_robin
 #### All Hyperparameters
 <details><summary>Click to expand</summary>
@@ -863,7 +743,7 @@ You can finetune this model on your own dataset.
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
-- `learning_rate`: 2e-05
 - `weight_decay`: 1e-06
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
@@ -872,8 +752,8 @@ You can finetune this model on your own dataset.
 - `num_train_epochs`: 1
 - `max_steps`: -1
 - `lr_scheduler_type`: cosine_with_restarts
-- `lr_scheduler_kwargs`: {'num_cycles': 4}
-- `warmup_ratio`: 0.1
 - `warmup_steps`: 0
 - `log_level`: passive
 - `log_level_replica`: warning
@@ -932,7 +812,7 @@ You can finetune this model on your own dataset.
 - `use_legacy_prediction_loop`: False
 - `push_to_hub`: True
 - `resume_from_checkpoint`: None
-- `hub_model_id`: bobox/DeBERTaV3-small-GeneralSentenceTransformer-v2-2-checkpoints-tmp
 - `hub_strategy`: checkpoint
 - `hub_private_repo`: False
 - `hub_always_push`: False
@@ -960,95 +840,10 @@ You can finetune this model on your own dataset.
 - `optim_target_modules`: None
 - `batch_eval_metrics`: False
 - `batch_sampler`: no_duplicates
-- `multi_dataset_batch_sampler`: round_robin
 </details>
-### Training Logs
-| Epoch  | Step | Training Loss | scitail-pairs-pos loss | nli-pairs loss | qnli-contrastive loss | sts-test_spearman_cosine |
-|:------:|:----:|:-------------:|:----------------------:|:--------------:|:---------------------:|:------------------------:|
-| 0      | 0    | -             | 8.0040                 | 9.8177         | 9.4806                | 0.2589                   |
-| 0.0128 | 20   | 11.4777       | -                      | -              | -                     | -                        |
-| 0.0255 | 40   | 10.5191       | 7.2875                 | 8.8852         | 8.8655                | -                        |
-| 0.0383 | 60   | 10.353        | -                      | -              | -                     | -                        |
-| 0.0510 | 80   | 9.3633        | 6.7486                 | 8.0722         | 8.6831                | -                        |
-| 0.0638 | 100  | 8.5311        | -                      | -              | -                     | -                        |
-| 0.0765 | 120  | 7.9197        | 5.6812                 | 7.2969         | 7.8228                | -                        |
-| 0.0893 | 140  | 7.9156        | -                      | -              | -                     | -                        |
-| 0.1020 | 160  | 6.9534        | 3.7800                 | 5.6062         | 6.6090                | -                        |
-| 0.1148 | 180  | 6.2044        | -                      | -              | -                     | -                        |
-| 0.1276 | 200  | 5.1684        | 3.0161                 | 4.5109         | 6.2355                | -                        |
-| 0.1403 | 220  | 5.6775        | -                      | -              | -                     | -                        |
-| 0.1531 | 240  | 4.7883        | 2.6949                 | 3.9360         | 5.9163                | -                        |
-| 0.1658 | 260  | 5.0748        | -                      | -              | -                     | -                        |
-| 0.1786 | 280  | 3.9346        | 2.3893                 | 3.5354         | 5.5410                | -                        |
-| 0.1913 | 300  | 4.5773        | -                      | -              | -                     | -                        |
-| 0.2041 | 320  | 4.4597        | 2.1959                 | 3.3361         | 5.5035                | -                        |
-| 0.2168 | 340  | 4.3078        | -                      | -              | -                     | -                        |
-| 0.2296 | 360  | 3.6586        | 2.0835                 | 3.1664         | 5.2422                | -                        |
-| 0.2423 | 380  | 4.4502        | -                      | -              | -                     | -                        |
-| 0.2551 | 400  | 4.4145        | 2.0453                 | 3.0487         | 5.0860                | -                        |
-| 0.2679 | 420  | 4.1619        | -                      | -              | -                     | -                        |
-| 0.2806 | 440  | 3.3511        | 2.0120                 | 3.0028         | 5.0385                | -                        |
-| 0.2934 | 460  | 4.0384        | -                      | -              | -                     | -                        |
-| 0.3061 | 480  | 3.9362        | 2.0152                 | 2.9886         | 5.0042                | -                        |
-| 0.3189 | 500  | 4.0973        | -                      | -              | -                     | -                        |
-| 0.3316 | 520  | 3.6556        | 2.0625                 | 2.9593         | 4.8619                | -                        |
-| 0.3444 | 540  | 4.2737        | -                      | -              | -                     | -                        |
-| 0.3571 | 560  | 4.1839        | 1.9112                 | 2.7910         | 4.9692                | -                        |
-| 0.3699 | 580  | 4.3264        | -                      | -              | -                     | -                        |
-| 0.3827 | 600  | 3.3527        | 1.9737                 | 2.7037         | 4.5847                | -                        |
-| 0.3954 | 620  | 4.2181        | -                      | -              | -                     | -                        |
-| 0.4082 | 640  | 3.5893        | 1.8670                 | 2.6520         | 4.3501                | -                        |
-| 0.4209 | 660  | 4.079         | -                      | -              | -                     | -                        |
-| 0.4337 | 680  | 3.2008        | 1.7706                 | 2.5075         | 4.3152                | -                        |
-| 0.4464 | 700  | 3.8099        | -                      | -              | -                     | -                        |
-| 0.4592 | 720  | 3.7107        | 1.7636                 | 2.4822         | 4.2869                | -                        |
-| 0.4719 | 740  | 3.6373        | -                      | -              | -                     | -                        |
-| 0.4847 | 760  | 2.9392        | 1.7979                 | 2.4399         | 4.0709                | -                        |
-| 0.4974 | 780  | 3.5176        | -                      | -              | -                     | -                        |
-| 0.5102 | 800  | 3.1213        | 1.7984                 | 2.4025         | 4.0300                | -                        |
-| 0.5230 | 820  | 3.398         | -                      | -              | -                     | -                        |
-| 0.5357 | 840  | 3.2391        | 1.7791                 | 2.3943         | 4.0008                | -                        |
-| 0.5485 | 860  | 3.8409        | -                      | -              | -                     | -                        |
-| 0.5612 | 880  | 3.6284        | 1.8481                 | 2.4182         | 4.0416                | -                        |
-| 0.5740 | 900  | 3.7306        | -                      | -              | -                     | -                        |
-| 0.5867 | 920  | 3.1185        | 1.7738                 | 2.3713         | 4.0205                | -                        |
-| 0.5995 | 940  | 3.7463        | -                      | -              | -                     | -                        |
-| 0.6122 | 960  | 3.2824        | 1.6536                 | 2.3497         | 3.8148                | -                        |
-| 0.625  | 980  | 3.6101        | -                      | -              | -                     | -                        |
-| 0.6378 | 1000 | 2.9711        | 1.6454                 | 2.3568         | 3.8989                | -                        |
-| 0.6505 | 1020 | 3.2722        | -                      | -              | -                     | -                        |
-| 0.6633 | 1040 | 3.6219        | 1.5998                 | 2.2327         | 3.7404                | -                        |
-| 0.6760 | 1060 | 3.2834        | -                      | -              | -                     | -                        |
-| 0.6888 | 1080 | 2.6537        | 1.5892                 | 2.2147         | 3.5766                | -                        |
-| 0.7015 | 1100 | 3.9296        | -                      | -              | -                     | -                        |
-| 0.7143 | 1120 | 3.2404        | 1.5405                 | 2.1900         | 3.5694                | -                        |
-| 0.7270 | 1140 | 3.6267        | -                      | -              | -                     | -                        |
-| 0.7398 | 1160 | 2.8854        | 1.5526                 | 2.1545         | 3.4772                | -                        |
-| 0.7526 | 1180 | 3.5955        | -                      | -              | -                     | -                        |
-| 0.7653 | 1200 | 3.0224        | 1.5545                 | 2.1478         | 3.4740                | -                        |
-| 0.7781 | 1220 | 3.4601        | -                      | -              | -                     | -                        |
-| 0.7908 | 1240 | 2.9627        | 1.5409                 | 2.1404         | 3.3743                | -                        |
-| 0.8036 | 1260 | 3.4228        | -                      | -              | -                     | -                        |
-| 0.8163 | 1280 | 3.2091        | 1.5056                 | 2.1527         | 3.3251                | -                        |
-| 0.8291 | 1300 | 3.544         | -                      | -              | -                     | -                        |
-| 0.8418 | 1320 | 2.6495        | 1.5841                 | 2.0985         | 3.4278                | -                        |
-| 0.8546 | 1340 | 3.4667        | -                      | -              | -                     | -                        |
-| 0.8673 | 1360 | 3.073         | 1.5703                 | 2.0951         | 3.2430                | -                        |
-| 0.8801 | 1380 | 3.5249        | -                      | -              | -                     | -                        |
-| 0.8929 | 1400 | 2.6768        | 1.4969                 | 2.0449         | 3.3015                | -                        |
-| 0.9056 | 1420 | 3.3377        | -                      | -              | -                     | -                        |
-| 0.9184 | 1440 | 2.7483        | 1.4881                 | 2.0088         | 3.2689                | -                        |
-| 0.9311 | 1460 | 3.5008        | -                      | -              | -                     | -                        |
-| 0.9439 | 1480 | 2.6245        | 1.4691                 | 1.9812         | 3.1827                | -                        |
-| 0.9566 | 1500 | 3.4935        | -                      | -              | -                     | -                        |
-| 0.9694 | 1520 | 2.7245        | 1.4688                 | 1.9725         | 3.1812                | -                        |
-| 0.9821 | 1540 | 3.0919        | -                      | -              | -                     | -                        |
-| 0.9949 | 1560 | 2.5899        | 1.4648                 | 1.9657         | 3.1510                | -                        |
-| 1.0    | 1568 | -             | 1.4647                 | 1.9657         | 3.1510                | 0.7064                   |
 ### Framework Versions
 - Python: 3.10.13
 - Sentence Transformers: 3.0.1

 - sentence-transformers/trivia-qa
 - sentence-transformers/quora-duplicates
 - sentence-transformers/gooaq
 widget:
 - source_sentence: Centrosome-independent mitotic spindle formation in  vertebrates.
   sentences:
   - Only the series from 2009 onwards are available on Blu-ray, except for the 1970
     story Spearhead from Space, released in July 2013.
 pipeline_tag: sentence-similarity
 ---
 # SentenceTransformer based on microsoft/deberta-v3-small
 *List how the model may foreseeably be misused and address what users ought not to do with the model.*
 -->
 <!--
 ## Bias, Risks and Limitations
 - `eval_strategy`: steps
 - `per_device_train_batch_size`: 28
 - `per_device_eval_batch_size`: 16
+- `learning_rate`: 3e-05
 - `weight_decay`: 1e-06
 - `num_train_epochs`: 1
 - `lr_scheduler_type`: cosine_with_restarts
+- `lr_scheduler_kwargs`: {'num_cycles': 3}
+- `warmup_ratio`: 0.2
 - `save_safetensors`: False
 - `fp16`: True
 - `push_to_hub`: True
+- `hub_model_id`: bobox/DeBERTaV3-small-SenTra-AdaptiveLayers-AllSoft-HighTemp-n
 - `hub_strategy`: checkpoint
 - `batch_sampler`: no_duplicates
 #### All Hyperparameters
 <details><summary>Click to expand</summary>
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
+- `learning_rate`: 3e-05
 - `weight_decay`: 1e-06
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
 - `num_train_epochs`: 1
 - `max_steps`: -1
 - `lr_scheduler_type`: cosine_with_restarts
+- `lr_scheduler_kwargs`: {'num_cycles': 3}
+- `warmup_ratio`: 0.2
 - `warmup_steps`: 0
 - `log_level`: passive
 - `log_level_replica`: warning
 - `use_legacy_prediction_loop`: False
 - `push_to_hub`: True
 - `resume_from_checkpoint`: None
+- `hub_model_id`: bobox/DeBERTaV3-small-SenTra-AdaptiveLayers-AllSoft-HighTemp-n
 - `hub_strategy`: checkpoint
 - `hub_private_repo`: False
 - `hub_always_push`: False
 - `optim_target_modules`: None
 - `batch_eval_metrics`: False
 - `batch_sampler`: no_duplicates
+- `multi_dataset_batch_sampler`: proportional
 </details>
 ### Framework Versions
 - Python: 3.10.13
 - Sentence Transformers: 3.0.1

pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1b3b7926361b58a21b953a72858a524b11fa08fd21c653767a0b9e2847095455
 size 565251810

 version https://git-lfs.github.com/spec/v1
+oid sha256:e75f9f0d0ccf1ea68d57e5e49eadbe854516a7a239c28fe45742d13c727c0aae
 size 565251810
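
The `pytorch_model.bin` entry is a Git LFS pointer, so only the `oid` changes while `size` stays the same. A minimal sketch of how a downloaded weights file could be checked against the new pointer; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path is left to the caller:

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    digest = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Hash in chunks so a ~565 MB file is never fully loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"]


# New pointer contents, copied from the diff above.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e75f9f0d0ccf1ea68d57e5e49eadbe854516a7a239c28fe45742d13c727c0aae
size 565251810
"""

pointer = parse_lfs_pointer(NEW_POINTER)
```

Calling `matches_pointer("pytorch_model.bin", pointer)` on a local copy would then confirm that the download corresponds to this commit rather than the previous one.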