bobox committed on
Commit ca0ee78
1 Parent(s): 8cfc6ec

All layers trained at every step.

AdaptiveLayerLoss(model=model,
    loss=train_loss,
    n_layers_per_step=-1,
    last_layer_weight=2,
    prior_layers_weight=0.1,
    kl_div_weight=0.5,
    kl_temperature=1,
)
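The `n_layers_per_step = -1` setting is what the commit title refers to: instead of sampling a few intermediate layers per step, every layer below the final one contributes to the loss. The selection rule can be sketched as follows. `select_prior_layers` is a hypothetical helper written for illustration, not a sentence-transformers function, and the six-layer count is an assumption about the base model.

```python
import random


def select_prior_layers(num_layers, n_layers_per_step, rng=None):
    """Pick which non-final layers receive an auxiliary loss on this step.

    Hypothetical helper: a non-positive n_layers_per_step, or one that is not
    smaller than the number of prior layers, selects every prior layer.
    """
    prior = list(range(num_layers - 1))  # the final layer is always trained
    if 0 < n_layers_per_step < len(prior):
        rng = rng or random.Random()
        return sorted(rng.sample(prior, n_layers_per_step))
    return prior


# Assumed layer count of 6; -1 selects all five prior layers on every step.
all_layers = select_prior_layers(num_layers=6, n_layers_per_step=-1)
# For contrast, sampling a single prior layer per step.
one_layer = select_prior_layers(num_layers=6, n_layers_per_step=1, rng=random.Random(0))
print(all_layers, one_layer)
```

Training every layer on every step costs more compute per step than sampling, which is presumably the trade-off this commit is testing.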

learning_rate=3e-5,
warmup_ratio=0.2,
weight_decay=2e-6,
per_device_train_batch_size=28,
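To see what `learning_rate=3e-5` with `warmup_ratio=0.2` means in practice, here is a rough sketch of a linear warmup followed by a cosine schedule with hard restarts, using the `cosine_with_restarts` scheduler type and `num_cycles: 3` that the README diff lists. The function `lr_at_step` is illustrative, not a library call, and the total of 1568 steps is an assumption borrowed from the training log of the previous run; the real step count for this run may differ.

```python
import math


def lr_at_step(step, total_steps, base_lr=3e-5, warmup_ratio=0.2, num_cycles=3):
    """Learning rate at a given optimizer step (illustrative sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Each cycle decays along a half cosine, then restarts at base_lr.
    cycle_pos = (num_cycles * progress) % 1.0
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_pos)))


TOTAL_STEPS = 1568  # assumption: step count of the earlier run
for step in (0, 157, 314, 732, 1568):
    print(step, lr_at_step(step, TOTAL_STEPS))
```

Under these assumptions, warmup lasts 314 steps, and the remaining steps are split into three decay cycles.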

Files changed (2)
  1. README.md +9 -214
  2. pytorch_model.bin +1 -1
README.md CHANGED
@@ -30,17 +30,6 @@ datasets:
  - sentence-transformers/trivia-qa
  - sentence-transformers/quora-duplicates
  - sentence-transformers/gooaq
- metrics:
- - pearson_cosine
- - spearman_cosine
- - pearson_manhattan
- - spearman_manhattan
- - pearson_euclidean
- - spearman_euclidean
- - pearson_dot
- - spearman_dot
- - pearson_max
- - spearman_max
  widget:
  - source_sentence: Centrosome-independent mitotic spindle formation in vertebrates.
  sentences:
@@ -71,76 +60,6 @@ widget:
  - Only the series from 2009 onwards are available on Blu-ray, except for the 1970
  story Spearhead from Space, released in July 2013.
  pipeline_tag: sentence-similarity
- model-index:
- - name: SentenceTransformer based on microsoft/deberta-v3-small
- results:
- - task:
- type: semantic-similarity
- name: Semantic Similarity
- dataset:
- name: sts test
- type: sts-test
- metrics:
- - type: pearson_cosine
- value: 0.2520910673470529
- name: Pearson Cosine
- - type: spearman_cosine
- value: 0.2588662067006675
- name: Spearman Cosine
- - type: pearson_manhattan
- value: 0.30439718484055006
- name: Pearson Manhattan
- - type: spearman_manhattan
- value: 0.3013780326567434
- name: Spearman Manhattan
- - type: pearson_euclidean
- value: 0.25977707672353506
- name: Pearson Euclidean
- - type: spearman_euclidean
- value: 0.26078444276128726
- name: Spearman Euclidean
- - type: pearson_dot
- value: 0.08121075567918108
- name: Pearson Dot
- - type: spearman_dot
- value: 0.0753891417253212
- name: Spearman Dot
- - type: pearson_max
- value: 0.30439718484055006
- name: Pearson Max
- - type: spearman_max
- value: 0.3013780326567434
- name: Spearman Max
- - type: pearson_cosine
- value: 0.7223465841651426
- name: Pearson Cosine
- - type: spearman_cosine
- value: 0.7063653140998242
- name: Spearman Cosine
- - type: pearson_manhattan
- value: 0.7330735343178496
- name: Pearson Manhattan
- - type: spearman_manhattan
- value: 0.7168204537336414
- name: Spearman Manhattan
- - type: pearson_euclidean
- value: 0.7274789035011718
- name: Pearson Euclidean
- - type: spearman_euclidean
- value: 0.7118592497365636
- name: Spearman Euclidean
- - type: pearson_dot
- value: 0.6085268394963853
- name: Pearson Dot
- - type: spearman_dot
- value: 0.5825572353571885
- name: Spearman Dot
- - type: pearson_max
- value: 0.7330735343178496
- name: Pearson Max
- - type: spearman_max
- value: 0.7168204537336414
- name: Spearman Max
  ---
 
  # SentenceTransformer based on microsoft/deberta-v3-small
@@ -246,44 +165,6 @@ You can finetune this model on your own dataset.
  *List how the model may foreseeably be misused and address what users ought not to do with the model.*
  -->
 
- ## Evaluation
-
- ### Metrics
-
- #### Semantic Similarity
- * Dataset: `sts-test`
- * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
-
- | Metric | Value |
- |:--------------------|:-----------|
- | pearson_cosine | 0.2521 |
- | **spearman_cosine** | **0.2589** |
- | pearson_manhattan | 0.3044 |
- | spearman_manhattan | 0.3014 |
- | pearson_euclidean | 0.2598 |
- | spearman_euclidean | 0.2608 |
- | pearson_dot | 0.0812 |
- | spearman_dot | 0.0754 |
- | pearson_max | 0.3044 |
- | spearman_max | 0.3014 |
-
- #### Semantic Similarity
- * Dataset: `sts-test`
- * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
-
- | Metric | Value |
- |:--------------------|:-----------|
- | pearson_cosine | 0.7223 |
- | **spearman_cosine** | **0.7064** |
- | pearson_manhattan | 0.7331 |
- | spearman_manhattan | 0.7168 |
- | pearson_euclidean | 0.7275 |
- | spearman_euclidean | 0.7119 |
- | pearson_dot | 0.6085 |
- | spearman_dot | 0.5826 |
- | pearson_max | 0.7331 |
- | spearman_max | 0.7168 |
-
  <!--
  ## Bias, Risks and Limitations
 
@@ -836,19 +717,18 @@ You can finetune this model on your own dataset.
  - `eval_strategy`: steps
  - `per_device_train_batch_size`: 28
  - `per_device_eval_batch_size`: 16
- - `learning_rate`: 2e-05
+ - `learning_rate`: 3e-05
  - `weight_decay`: 1e-06
  - `num_train_epochs`: 1
  - `lr_scheduler_type`: cosine_with_restarts
- - `lr_scheduler_kwargs`: {'num_cycles': 4}
- - `warmup_ratio`: 0.1
+ - `lr_scheduler_kwargs`: {'num_cycles': 3}
+ - `warmup_ratio`: 0.2
  - `save_safetensors`: False
  - `fp16`: True
  - `push_to_hub`: True
- - `hub_model_id`: bobox/DeBERTaV3-small-GeneralSentenceTransformer-v2-2-checkpoints-tmp
+ - `hub_model_id`: bobox/DeBERTaV3-small-SenTra-AdaptiveLayers-AllSoft-HighTemp-n
  - `hub_strategy`: checkpoint
  - `batch_sampler`: no_duplicates
- - `multi_dataset_batch_sampler`: round_robin
 
  #### All Hyperparameters
  <details><summary>Click to expand</summary>
@@ -863,7 +743,7 @@ You can finetune this model on your own dataset.
  - `per_gpu_eval_batch_size`: None
  - `gradient_accumulation_steps`: 1
  - `eval_accumulation_steps`: None
- - `learning_rate`: 2e-05
+ - `learning_rate`: 3e-05
  - `weight_decay`: 1e-06
  - `adam_beta1`: 0.9
  - `adam_beta2`: 0.999
@@ -872,8 +752,8 @@ You can finetune this model on your own dataset.
  - `num_train_epochs`: 1
  - `max_steps`: -1
  - `lr_scheduler_type`: cosine_with_restarts
- - `lr_scheduler_kwargs`: {'num_cycles': 4}
- - `warmup_ratio`: 0.1
+ - `lr_scheduler_kwargs`: {'num_cycles': 3}
+ - `warmup_ratio`: 0.2
  - `warmup_steps`: 0
  - `log_level`: passive
  - `log_level_replica`: warning
@@ -932,7 +812,7 @@ You can finetune this model on your own dataset.
  - `use_legacy_prediction_loop`: False
  - `push_to_hub`: True
  - `resume_from_checkpoint`: None
- - `hub_model_id`: bobox/DeBERTaV3-small-GeneralSentenceTransformer-v2-2-checkpoints-tmp
+ - `hub_model_id`: bobox/DeBERTaV3-small-SenTra-AdaptiveLayers-AllSoft-HighTemp-n
  - `hub_strategy`: checkpoint
  - `hub_private_repo`: False
  - `hub_always_push`: False
@@ -960,95 +840,10 @@ You can finetune this model on your own dataset.
  - `optim_target_modules`: None
  - `batch_eval_metrics`: False
  - `batch_sampler`: no_duplicates
- - `multi_dataset_batch_sampler`: round_robin
+ - `multi_dataset_batch_sampler`: proportional
 
  </details>
 
- ### Training Logs
- | Epoch | Step | Training Loss | scitail-pairs-pos loss | nli-pairs loss | qnli-contrastive loss | sts-test_spearman_cosine |
- |:------:|:----:|:-------------:|:----------------------:|:--------------:|:---------------------:|:------------------------:|
- | 0 | 0 | - | 8.0040 | 9.8177 | 9.4806 | 0.2589 |
- | 0.0128 | 20 | 11.4777 | - | - | - | - |
- | 0.0255 | 40 | 10.5191 | 7.2875 | 8.8852 | 8.8655 | - |
- | 0.0383 | 60 | 10.353 | - | - | - | - |
- | 0.0510 | 80 | 9.3633 | 6.7486 | 8.0722 | 8.6831 | - |
- | 0.0638 | 100 | 8.5311 | - | - | - | - |
- | 0.0765 | 120 | 7.9197 | 5.6812 | 7.2969 | 7.8228 | - |
- | 0.0893 | 140 | 7.9156 | - | - | - | - |
- | 0.1020 | 160 | 6.9534 | 3.7800 | 5.6062 | 6.6090 | - |
- | 0.1148 | 180 | 6.2044 | - | - | - | - |
- | 0.1276 | 200 | 5.1684 | 3.0161 | 4.5109 | 6.2355 | - |
- | 0.1403 | 220 | 5.6775 | - | - | - | - |
- | 0.1531 | 240 | 4.7883 | 2.6949 | 3.9360 | 5.9163 | - |
- | 0.1658 | 260 | 5.0748 | - | - | - | - |
- | 0.1786 | 280 | 3.9346 | 2.3893 | 3.5354 | 5.5410 | - |
- | 0.1913 | 300 | 4.5773 | - | - | - | - |
- | 0.2041 | 320 | 4.4597 | 2.1959 | 3.3361 | 5.5035 | - |
- | 0.2168 | 340 | 4.3078 | - | - | - | - |
- | 0.2296 | 360 | 3.6586 | 2.0835 | 3.1664 | 5.2422 | - |
- | 0.2423 | 380 | 4.4502 | - | - | - | - |
- | 0.2551 | 400 | 4.4145 | 2.0453 | 3.0487 | 5.0860 | - |
- | 0.2679 | 420 | 4.1619 | - | - | - | - |
- | 0.2806 | 440 | 3.3511 | 2.0120 | 3.0028 | 5.0385 | - |
- | 0.2934 | 460 | 4.0384 | - | - | - | - |
- | 0.3061 | 480 | 3.9362 | 2.0152 | 2.9886 | 5.0042 | - |
- | 0.3189 | 500 | 4.0973 | - | - | - | - |
- | 0.3316 | 520 | 3.6556 | 2.0625 | 2.9593 | 4.8619 | - |
- | 0.3444 | 540 | 4.2737 | - | - | - | - |
- | 0.3571 | 560 | 4.1839 | 1.9112 | 2.7910 | 4.9692 | - |
- | 0.3699 | 580 | 4.3264 | - | - | - | - |
- | 0.3827 | 600 | 3.3527 | 1.9737 | 2.7037 | 4.5847 | - |
- | 0.3954 | 620 | 4.2181 | - | - | - | - |
- | 0.4082 | 640 | 3.5893 | 1.8670 | 2.6520 | 4.3501 | - |
- | 0.4209 | 660 | 4.079 | - | - | - | - |
- | 0.4337 | 680 | 3.2008 | 1.7706 | 2.5075 | 4.3152 | - |
- | 0.4464 | 700 | 3.8099 | - | - | - | - |
- | 0.4592 | 720 | 3.7107 | 1.7636 | 2.4822 | 4.2869 | - |
- | 0.4719 | 740 | 3.6373 | - | - | - | - |
- | 0.4847 | 760 | 2.9392 | 1.7979 | 2.4399 | 4.0709 | - |
- | 0.4974 | 780 | 3.5176 | - | - | - | - |
- | 0.5102 | 800 | 3.1213 | 1.7984 | 2.4025 | 4.0300 | - |
- | 0.5230 | 820 | 3.398 | - | - | - | - |
- | 0.5357 | 840 | 3.2391 | 1.7791 | 2.3943 | 4.0008 | - |
- | 0.5485 | 860 | 3.8409 | - | - | - | - |
- | 0.5612 | 880 | 3.6284 | 1.8481 | 2.4182 | 4.0416 | - |
- | 0.5740 | 900 | 3.7306 | - | - | - | - |
- | 0.5867 | 920 | 3.1185 | 1.7738 | 2.3713 | 4.0205 | - |
- | 0.5995 | 940 | 3.7463 | - | - | - | - |
- | 0.6122 | 960 | 3.2824 | 1.6536 | 2.3497 | 3.8148 | - |
- | 0.625 | 980 | 3.6101 | - | - | - | - |
- | 0.6378 | 1000 | 2.9711 | 1.6454 | 2.3568 | 3.8989 | - |
- | 0.6505 | 1020 | 3.2722 | - | - | - | - |
- | 0.6633 | 1040 | 3.6219 | 1.5998 | 2.2327 | 3.7404 | - |
- | 0.6760 | 1060 | 3.2834 | - | - | - | - |
- | 0.6888 | 1080 | 2.6537 | 1.5892 | 2.2147 | 3.5766 | - |
- | 0.7015 | 1100 | 3.9296 | - | - | - | - |
- | 0.7143 | 1120 | 3.2404 | 1.5405 | 2.1900 | 3.5694 | - |
- | 0.7270 | 1140 | 3.6267 | - | - | - | - |
- | 0.7398 | 1160 | 2.8854 | 1.5526 | 2.1545 | 3.4772 | - |
- | 0.7526 | 1180 | 3.5955 | - | - | - | - |
- | 0.7653 | 1200 | 3.0224 | 1.5545 | 2.1478 | 3.4740 | - |
- | 0.7781 | 1220 | 3.4601 | - | - | - | - |
- | 0.7908 | 1240 | 2.9627 | 1.5409 | 2.1404 | 3.3743 | - |
- | 0.8036 | 1260 | 3.4228 | - | - | - | - |
- | 0.8163 | 1280 | 3.2091 | 1.5056 | 2.1527 | 3.3251 | - |
- | 0.8291 | 1300 | 3.544 | - | - | - | - |
- | 0.8418 | 1320 | 2.6495 | 1.5841 | 2.0985 | 3.4278 | - |
- | 0.8546 | 1340 | 3.4667 | - | - | - | - |
- | 0.8673 | 1360 | 3.073 | 1.5703 | 2.0951 | 3.2430 | - |
- | 0.8801 | 1380 | 3.5249 | - | - | - | - |
- | 0.8929 | 1400 | 2.6768 | 1.4969 | 2.0449 | 3.3015 | - |
- | 0.9056 | 1420 | 3.3377 | - | - | - | - |
- | 0.9184 | 1440 | 2.7483 | 1.4881 | 2.0088 | 3.2689 | - |
- | 0.9311 | 1460 | 3.5008 | - | - | - | - |
- | 0.9439 | 1480 | 2.6245 | 1.4691 | 1.9812 | 3.1827 | - |
- | 0.9566 | 1500 | 3.4935 | - | - | - | - |
- | 0.9694 | 1520 | 2.7245 | 1.4688 | 1.9725 | 3.1812 | - |
- | 0.9821 | 1540 | 3.0919 | - | - | - | - |
- | 0.9949 | 1560 | 2.5899 | 1.4648 | 1.9657 | 3.1510 | - |
- | 1.0 | 1568 | - | 1.4647 | 1.9657 | 3.1510 | 0.7064 |
-
-
  ### Framework Versions
  - Python: 3.10.13
  - Sentence Transformers: 3.0.1
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1b3b7926361b58a21b953a72858a524b11fa08fd21c653767a0b9e2847095455
+ oid sha256:e75f9f0d0ccf1ea68d57e5e49eadbe854516a7a239c28fe45742d13c727c0aae
  size 565251810
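
The `pytorch_model.bin` entry in the repository is a Git LFS pointer, so only the `oid` changes here while the size stays the same. A downloaded weights file can be checked against such a pointer roughly as sketched below. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the demo runs on a small in-memory payload rather than the real 565 MB file.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and SHA-256 the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError("unsupported hash algorithm: " + pointer["algo"])
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# The new pointer from this commit.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e75f9f0d0ccf1ea68d57e5e49eadbe854516a7a239c28fe45742d13c727c0aae
size 565251810
"""
pointer = parse_lfs_pointer(NEW_POINTER)

# Demo on a tiny payload, building a matching pointer for it.
payload = b"hello"
demo_pointer = {
    "version": pointer["version"],
    "algo": "sha256",
    "digest": hashlib.sha256(payload).hexdigest(),
    "size": len(payload),
}
print(matches_pointer(payload, demo_pointer))
```

For the real file, hashing in chunks would avoid loading all of it into memory at once.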