|
--- |
|
tags: |
|
- sentence-transformers |
|
- feature-extraction |
|
- sentence-similarity |
|
- transformers |
|
- mteb |
|
license: lgpl |
|
language: |
|
- pl |
|
pipeline_tag: sentence-similarity |
|
model-index: |
|
- name: st-polish-kartonberta-base-alpha-v1 |
|
results: |
|
- task: |
|
type: Clustering |
|
dataset: |
|
type: PL-MTEB/8tags-clustering |
|
name: MTEB 8TagsClustering |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: v_measure |
|
value: 32.85180358455615 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/allegro-reviews |
|
name: MTEB AllegroReviews |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 40.188866799204774 |
|
- type: f1 |
|
value: 34.71127012684797 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: arguana-pl |
|
name: MTEB ArguAna-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 30.939 |
|
- type: map_at_10 |
|
value: 47.467999999999996 |
|
- type: map_at_100 |
|
value: 48.303000000000004 |
|
- type: map_at_1000 |
|
value: 48.308 |
|
- type: map_at_3 |
|
value: 43.22 |
|
- type: map_at_5 |
|
value: 45.616 |
|
- type: mrr_at_1 |
|
value: 31.863000000000003 |
|
- type: mrr_at_10 |
|
value: 47.829 |
|
- type: mrr_at_100 |
|
value: 48.664 |
|
- type: mrr_at_1000 |
|
value: 48.67 |
|
- type: mrr_at_3 |
|
value: 43.492 |
|
- type: mrr_at_5 |
|
value: 46.006 |
|
- type: ndcg_at_1 |
|
value: 30.939 |
|
- type: ndcg_at_10 |
|
value: 56.058 |
|
- type: ndcg_at_100 |
|
value: 59.562000000000005 |
|
- type: ndcg_at_1000 |
|
value: 59.69799999999999 |
|
- type: ndcg_at_3 |
|
value: 47.260000000000005 |
|
- type: ndcg_at_5 |
|
value: 51.587 |
|
- type: precision_at_1 |
|
value: 30.939 |
|
- type: precision_at_10 |
|
value: 8.329 |
|
- type: precision_at_100 |
|
value: 0.984 |
|
- type: precision_at_1000 |
|
value: 0.1 |
|
- type: precision_at_3 |
|
value: 19.654 |
|
- type: precision_at_5 |
|
value: 13.898 |
|
- type: recall_at_1 |
|
value: 30.939 |
|
- type: recall_at_10 |
|
value: 83.286 |
|
- type: recall_at_100 |
|
value: 98.43499999999999 |
|
- type: recall_at_1000 |
|
value: 99.502 |
|
- type: recall_at_3 |
|
value: 58.962 |
|
- type: recall_at_5 |
|
value: 69.488 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/cbd |
|
name: MTEB CBD |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 67.69000000000001 |
|
- type: ap |
|
value: 21.078799692467182 |
|
- type: f1 |
|
value: 56.80107173953953 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/cdsce-pairclassification |
|
name: MTEB CDSC-E |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 89.2 |
|
- type: cos_sim_ap |
|
value: 79.11674608786898 |
|
- type: cos_sim_f1 |
|
value: 68.83468834688347 |
|
- type: cos_sim_precision |
|
value: 70.94972067039106 |
|
- type: cos_sim_recall |
|
value: 66.84210526315789 |
|
- type: dot_accuracy |
|
value: 89.2 |
|
- type: dot_ap |
|
value: 79.11674608786898 |
|
- type: dot_f1 |
|
value: 68.83468834688347 |
|
- type: dot_precision |
|
value: 70.94972067039106 |
|
- type: dot_recall |
|
value: 66.84210526315789 |
|
- type: euclidean_accuracy |
|
value: 89.2 |
|
- type: euclidean_ap |
|
value: 79.11674608786898 |
|
- type: euclidean_f1 |
|
value: 68.83468834688347 |
|
- type: euclidean_precision |
|
value: 70.94972067039106 |
|
- type: euclidean_recall |
|
value: 66.84210526315789 |
|
- type: manhattan_accuracy |
|
value: 89.1 |
|
- type: manhattan_ap |
|
value: 79.1220443374692 |
|
- type: manhattan_f1 |
|
value: 69.02173913043478 |
|
- type: manhattan_precision |
|
value: 71.34831460674157 |
|
- type: manhattan_recall |
|
value: 66.84210526315789 |
|
- type: max_accuracy |
|
value: 89.2 |
|
- type: max_ap |
|
value: 79.1220443374692 |
|
- type: max_f1 |
|
value: 69.02173913043478 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: PL-MTEB/cdscr-sts |
|
name: MTEB CDSC-R |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 91.41534744278998 |
|
- type: cos_sim_spearman |
|
value: 92.12681551821147 |
|
- type: euclidean_pearson |
|
value: 91.74369794485992 |
|
- type: euclidean_spearman |
|
value: 92.12685848456046 |
|
- type: manhattan_pearson |
|
value: 91.66651938751657 |
|
- type: manhattan_spearman |
|
value: 92.057603126734 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: dbpedia-pl |
|
name: MTEB DBPedia-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 5.8709999999999996 |
|
- type: map_at_10 |
|
value: 12.486 |
|
- type: map_at_100 |
|
value: 16.897000000000002 |
|
- type: map_at_1000 |
|
value: 18.056 |
|
- type: map_at_3 |
|
value: 8.958 |
|
- type: map_at_5 |
|
value: 10.57 |
|
- type: mrr_at_1 |
|
value: 44.0 |
|
- type: mrr_at_10 |
|
value: 53.830999999999996 |
|
- type: mrr_at_100 |
|
value: 54.54 |
|
- type: mrr_at_1000 |
|
value: 54.568000000000005 |
|
- type: mrr_at_3 |
|
value: 51.87500000000001 |
|
- type: mrr_at_5 |
|
value: 53.113 |
|
- type: ndcg_at_1 |
|
value: 34.625 |
|
- type: ndcg_at_10 |
|
value: 26.996 |
|
- type: ndcg_at_100 |
|
value: 31.052999999999997 |
|
- type: ndcg_at_1000 |
|
value: 38.208 |
|
- type: ndcg_at_3 |
|
value: 29.471000000000004 |
|
- type: ndcg_at_5 |
|
value: 28.364 |
|
- type: precision_at_1 |
|
value: 44.0 |
|
- type: precision_at_10 |
|
value: 21.45 |
|
- type: precision_at_100 |
|
value: 6.837 |
|
- type: precision_at_1000 |
|
value: 1.6019999999999999 |
|
- type: precision_at_3 |
|
value: 32.333 |
|
- type: precision_at_5 |
|
value: 27.800000000000004 |
|
- type: recall_at_1 |
|
value: 5.8709999999999996 |
|
- type: recall_at_10 |
|
value: 17.318 |
|
- type: recall_at_100 |
|
value: 36.854 |
|
- type: recall_at_1000 |
|
value: 60.468999999999994 |
|
- type: recall_at_3 |
|
value: 10.213999999999999 |
|
- type: recall_at_5 |
|
value: 13.364 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: fiqa-pl |
|
name: MTEB FiQA-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 10.289 |
|
- type: map_at_10 |
|
value: 18.285999999999998 |
|
- type: map_at_100 |
|
value: 19.743 |
|
- type: map_at_1000 |
|
value: 19.964000000000002 |
|
- type: map_at_3 |
|
value: 15.193000000000001 |
|
- type: map_at_5 |
|
value: 16.962 |
|
- type: mrr_at_1 |
|
value: 21.914 |
|
- type: mrr_at_10 |
|
value: 30.653999999999996 |
|
- type: mrr_at_100 |
|
value: 31.623 |
|
- type: mrr_at_1000 |
|
value: 31.701 |
|
- type: mrr_at_3 |
|
value: 27.855 |
|
- type: mrr_at_5 |
|
value: 29.514000000000003 |
|
- type: ndcg_at_1 |
|
value: 21.914 |
|
- type: ndcg_at_10 |
|
value: 24.733 |
|
- type: ndcg_at_100 |
|
value: 31.253999999999998 |
|
- type: ndcg_at_1000 |
|
value: 35.617 |
|
- type: ndcg_at_3 |
|
value: 20.962 |
|
- type: ndcg_at_5 |
|
value: 22.553 |
|
- type: precision_at_1 |
|
value: 21.914 |
|
- type: precision_at_10 |
|
value: 7.346 |
|
- type: precision_at_100 |
|
value: 1.389 |
|
- type: precision_at_1000 |
|
value: 0.214 |
|
- type: precision_at_3 |
|
value: 14.352 |
|
- type: precision_at_5 |
|
value: 11.42 |
|
- type: recall_at_1 |
|
value: 10.289 |
|
- type: recall_at_10 |
|
value: 31.459 |
|
- type: recall_at_100 |
|
value: 56.854000000000006 |
|
- type: recall_at_1000 |
|
value: 83.722 |
|
- type: recall_at_3 |
|
value: 19.457 |
|
- type: recall_at_5 |
|
value: 24.767 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: hotpotqa-pl |
|
name: MTEB HotpotQA-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 29.669 |
|
- type: map_at_10 |
|
value: 41.615 |
|
- type: map_at_100 |
|
value: 42.571999999999996 |
|
- type: map_at_1000 |
|
value: 42.662 |
|
- type: map_at_3 |
|
value: 38.938 |
|
- type: map_at_5 |
|
value: 40.541 |
|
- type: mrr_at_1 |
|
value: 59.338 |
|
- type: mrr_at_10 |
|
value: 66.93900000000001 |
|
- type: mrr_at_100 |
|
value: 67.361 |
|
- type: mrr_at_1000 |
|
value: 67.38499999999999 |
|
- type: mrr_at_3 |
|
value: 65.384 |
|
- type: mrr_at_5 |
|
value: 66.345 |
|
- type: ndcg_at_1 |
|
value: 59.338 |
|
- type: ndcg_at_10 |
|
value: 50.607 |
|
- type: ndcg_at_100 |
|
value: 54.342999999999996 |
|
- type: ndcg_at_1000 |
|
value: 56.286 |
|
- type: ndcg_at_3 |
|
value: 46.289 |
|
- type: ndcg_at_5 |
|
value: 48.581 |
|
- type: precision_at_1 |
|
value: 59.338 |
|
- type: precision_at_10 |
|
value: 10.585 |
|
- type: precision_at_100 |
|
value: 1.353 |
|
- type: precision_at_1000 |
|
value: 0.161 |
|
- type: precision_at_3 |
|
value: 28.877000000000002 |
|
- type: precision_at_5 |
|
value: 19.133 |
|
- type: recall_at_1 |
|
value: 29.669 |
|
- type: recall_at_10 |
|
value: 52.92400000000001 |
|
- type: recall_at_100 |
|
value: 67.657 |
|
- type: recall_at_1000 |
|
value: 80.628 |
|
- type: recall_at_3 |
|
value: 43.315 |
|
- type: recall_at_5 |
|
value: 47.833 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: msmarco-pl |
|
name: MTEB MSMARCO-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 0.997 |
|
- type: map_at_10 |
|
value: 7.481999999999999 |
|
- type: map_at_100 |
|
value: 20.208000000000002 |
|
- type: map_at_1000 |
|
value: 25.601000000000003 |
|
- type: map_at_3 |
|
value: 3.055 |
|
- type: map_at_5 |
|
value: 4.853 |
|
- type: mrr_at_1 |
|
value: 55.814 |
|
- type: mrr_at_10 |
|
value: 64.651 |
|
- type: mrr_at_100 |
|
value: 65.003 |
|
- type: mrr_at_1000 |
|
value: 65.05199999999999 |
|
- type: mrr_at_3 |
|
value: 62.403 |
|
- type: mrr_at_5 |
|
value: 64.031 |
|
- type: ndcg_at_1 |
|
value: 44.186 |
|
- type: ndcg_at_10 |
|
value: 43.25 |
|
- type: ndcg_at_100 |
|
value: 40.515 |
|
- type: ndcg_at_1000 |
|
value: 48.345 |
|
- type: ndcg_at_3 |
|
value: 45.829 |
|
- type: ndcg_at_5 |
|
value: 46.477000000000004 |
|
- type: precision_at_1 |
|
value: 55.814 |
|
- type: precision_at_10 |
|
value: 50.465 |
|
- type: precision_at_100 |
|
value: 25.419000000000004 |
|
- type: precision_at_1000 |
|
value: 5.0840000000000005 |
|
- type: precision_at_3 |
|
value: 58.14 |
|
- type: precision_at_5 |
|
value: 57.67400000000001 |
|
- type: recall_at_1 |
|
value: 0.997 |
|
- type: recall_at_10 |
|
value: 8.985999999999999 |
|
- type: recall_at_100 |
|
value: 33.221000000000004 |
|
- type: recall_at_1000 |
|
value: 58.836999999999996 |
|
- type: recall_at_3 |
|
value: 3.472 |
|
- type: recall_at_5 |
|
value: 5.545 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: mteb/amazon_massive_intent |
|
name: MTEB MassiveIntentClassification (pl) |
|
config: pl |
|
split: test |
|
revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 |
|
metrics: |
|
- type: accuracy |
|
value: 68.19771351714861 |
|
- type: f1 |
|
value: 64.75039989217822 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: mteb/amazon_massive_scenario |
|
name: MTEB MassiveScenarioClassification (pl) |
|
config: pl |
|
split: test |
|
revision: 7d571f92784cd94a019292a1f45445077d0ef634 |
|
metrics: |
|
- type: accuracy |
|
value: 73.9677202420982 |
|
- type: f1 |
|
value: 73.72287107577753 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: nfcorpus-pl |
|
name: MTEB NFCorpus-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 5.167 |
|
- type: map_at_10 |
|
value: 10.791 |
|
- type: map_at_100 |
|
value: 14.072999999999999 |
|
- type: map_at_1000 |
|
value: 15.568000000000001 |
|
- type: map_at_3 |
|
value: 7.847999999999999 |
|
- type: map_at_5 |
|
value: 9.112 |
|
- type: mrr_at_1 |
|
value: 42.105 |
|
- type: mrr_at_10 |
|
value: 49.933 |
|
- type: mrr_at_100 |
|
value: 50.659 |
|
- type: mrr_at_1000 |
|
value: 50.705 |
|
- type: mrr_at_3 |
|
value: 47.988 |
|
- type: mrr_at_5 |
|
value: 49.056 |
|
- type: ndcg_at_1 |
|
value: 39.938 |
|
- type: ndcg_at_10 |
|
value: 31.147000000000002 |
|
- type: ndcg_at_100 |
|
value: 29.336000000000002 |
|
- type: ndcg_at_1000 |
|
value: 38.147 |
|
- type: ndcg_at_3 |
|
value: 35.607 |
|
- type: ndcg_at_5 |
|
value: 33.725 |
|
- type: precision_at_1 |
|
value: 41.486000000000004 |
|
- type: precision_at_10 |
|
value: 23.901 |
|
- type: precision_at_100 |
|
value: 7.960000000000001 |
|
- type: precision_at_1000 |
|
value: 2.086 |
|
- type: precision_at_3 |
|
value: 33.437 |
|
- type: precision_at_5 |
|
value: 29.598000000000003 |
|
- type: recall_at_1 |
|
value: 5.167 |
|
- type: recall_at_10 |
|
value: 14.244000000000002 |
|
- type: recall_at_100 |
|
value: 31.192999999999998 |
|
- type: recall_at_1000 |
|
value: 62.41799999999999 |
|
- type: recall_at_3 |
|
value: 8.697000000000001 |
|
- type: recall_at_5 |
|
value: 10.911 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: nq-pl |
|
name: MTEB NQ-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 14.417 |
|
- type: map_at_10 |
|
value: 23.330000000000002 |
|
- type: map_at_100 |
|
value: 24.521 |
|
- type: map_at_1000 |
|
value: 24.604 |
|
- type: map_at_3 |
|
value: 20.076 |
|
- type: map_at_5 |
|
value: 21.854000000000003 |
|
- type: mrr_at_1 |
|
value: 16.454 |
|
- type: mrr_at_10 |
|
value: 25.402 |
|
- type: mrr_at_100 |
|
value: 26.411 |
|
- type: mrr_at_1000 |
|
value: 26.479000000000003 |
|
- type: mrr_at_3 |
|
value: 22.369 |
|
- type: mrr_at_5 |
|
value: 24.047 |
|
- type: ndcg_at_1 |
|
value: 16.454 |
|
- type: ndcg_at_10 |
|
value: 28.886 |
|
- type: ndcg_at_100 |
|
value: 34.489999999999995 |
|
- type: ndcg_at_1000 |
|
value: 36.687999999999995 |
|
- type: ndcg_at_3 |
|
value: 22.421 |
|
- type: ndcg_at_5 |
|
value: 25.505 |
|
- type: precision_at_1 |
|
value: 16.454 |
|
- type: precision_at_10 |
|
value: 5.252 |
|
- type: precision_at_100 |
|
value: 0.8410000000000001 |
|
- type: precision_at_1000 |
|
value: 0.105 |
|
- type: precision_at_3 |
|
value: 10.428999999999998 |
|
- type: precision_at_5 |
|
value: 8.019 |
|
- type: recall_at_1 |
|
value: 14.417 |
|
- type: recall_at_10 |
|
value: 44.025 |
|
- type: recall_at_100 |
|
value: 69.404 |
|
- type: recall_at_1000 |
|
value: 86.18900000000001 |
|
- type: recall_at_3 |
|
value: 26.972 |
|
- type: recall_at_5 |
|
value: 34.132 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: laugustyniak/abusive-clauses-pl |
|
name: MTEB PAC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 66.55082536924412 |
|
- type: ap |
|
value: 76.44962281293184 |
|
- type: f1 |
|
value: 63.899803692180434 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/ppc-pairclassification |
|
name: MTEB PPC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 86.5 |
|
- type: cos_sim_ap |
|
value: 92.65086645409387 |
|
- type: cos_sim_f1 |
|
value: 89.39157566302653 |
|
- type: cos_sim_precision |
|
value: 84.51327433628319 |
|
- type: cos_sim_recall |
|
value: 94.86754966887418 |
|
- type: dot_accuracy |
|
value: 86.5 |
|
- type: dot_ap |
|
value: 92.65086645409387 |
|
- type: dot_f1 |
|
value: 89.39157566302653 |
|
- type: dot_precision |
|
value: 84.51327433628319 |
|
- type: dot_recall |
|
value: 94.86754966887418 |
|
- type: euclidean_accuracy |
|
value: 86.5 |
|
- type: euclidean_ap |
|
value: 92.65086645409387 |
|
- type: euclidean_f1 |
|
value: 89.39157566302653 |
|
- type: euclidean_precision |
|
value: 84.51327433628319 |
|
- type: euclidean_recall |
|
value: 94.86754966887418 |
|
- type: manhattan_accuracy |
|
value: 86.5 |
|
- type: manhattan_ap |
|
value: 92.64975544736456 |
|
- type: manhattan_f1 |
|
value: 89.33852140077822 |
|
- type: manhattan_precision |
|
value: 84.28781204111601 |
|
- type: manhattan_recall |
|
value: 95.03311258278146 |
|
- type: max_accuracy |
|
value: 86.5 |
|
- type: max_ap |
|
value: 92.65086645409387 |
|
- type: max_f1 |
|
value: 89.39157566302653 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/psc-pairclassification |
|
name: MTEB PSC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 95.64007421150278 |
|
- type: cos_sim_ap |
|
value: 98.42114841894346 |
|
- type: cos_sim_f1 |
|
value: 92.8895612708018 |
|
- type: cos_sim_precision |
|
value: 92.1921921921922 |
|
- type: cos_sim_recall |
|
value: 93.59756097560977 |
|
- type: dot_accuracy |
|
value: 95.64007421150278 |
|
- type: dot_ap |
|
value: 98.42114841894346 |
|
- type: dot_f1 |
|
value: 92.8895612708018 |
|
- type: dot_precision |
|
value: 92.1921921921922 |
|
- type: dot_recall |
|
value: 93.59756097560977 |
|
- type: euclidean_accuracy |
|
value: 95.64007421150278 |
|
- type: euclidean_ap |
|
value: 98.42114841894346 |
|
- type: euclidean_f1 |
|
value: 92.8895612708018 |
|
- type: euclidean_precision |
|
value: 92.1921921921922 |
|
- type: euclidean_recall |
|
value: 93.59756097560977 |
|
- type: manhattan_accuracy |
|
value: 95.82560296846012 |
|
- type: manhattan_ap |
|
value: 98.38712415914046 |
|
- type: manhattan_f1 |
|
value: 93.19213313161876 |
|
- type: manhattan_precision |
|
value: 92.49249249249249 |
|
- type: manhattan_recall |
|
value: 93.90243902439023 |
|
- type: max_accuracy |
|
value: 95.82560296846012 |
|
- type: max_ap |
|
value: 98.42114841894346 |
|
- type: max_f1 |
|
value: 93.19213313161876 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/polemo2_in |
|
name: MTEB PolEmo2.0-IN |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 68.40720221606648 |
|
- type: f1 |
|
value: 67.09084289613526 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/polemo2_out |
|
name: MTEB PolEmo2.0-OUT |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 38.056680161943326 |
|
- type: f1 |
|
value: 32.87731504372395 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: quora-pl |
|
name: MTEB Quora-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 65.422 |
|
- type: map_at_10 |
|
value: 79.259 |
|
- type: map_at_100 |
|
value: 80.0 |
|
- type: map_at_1000 |
|
value: 80.021 |
|
- type: map_at_3 |
|
value: 76.16199999999999 |
|
- type: map_at_5 |
|
value: 78.03999999999999 |
|
- type: mrr_at_1 |
|
value: 75.26 |
|
- type: mrr_at_10 |
|
value: 82.39699999999999 |
|
- type: mrr_at_100 |
|
value: 82.589 |
|
- type: mrr_at_1000 |
|
value: 82.593 |
|
- type: mrr_at_3 |
|
value: 81.08999999999999 |
|
- type: mrr_at_5 |
|
value: 81.952 |
|
- type: ndcg_at_1 |
|
value: 75.3 |
|
- type: ndcg_at_10 |
|
value: 83.588 |
|
- type: ndcg_at_100 |
|
value: 85.312 |
|
- type: ndcg_at_1000 |
|
value: 85.536 |
|
- type: ndcg_at_3 |
|
value: 80.128 |
|
- type: ndcg_at_5 |
|
value: 81.962 |
|
- type: precision_at_1 |
|
value: 75.3 |
|
- type: precision_at_10 |
|
value: 12.856000000000002 |
|
- type: precision_at_100 |
|
value: 1.508 |
|
- type: precision_at_1000 |
|
value: 0.156 |
|
- type: precision_at_3 |
|
value: 35.207 |
|
- type: precision_at_5 |
|
value: 23.316 |
|
- type: recall_at_1 |
|
value: 65.422 |
|
- type: recall_at_10 |
|
value: 92.381 |
|
- type: recall_at_100 |
|
value: 98.575 |
|
- type: recall_at_1000 |
|
value: 99.85300000000001 |
|
- type: recall_at_3 |
|
value: 82.59100000000001 |
|
- type: recall_at_5 |
|
value: 87.629 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: scidocs-pl |
|
name: MTEB SCIDOCS-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 2.52 |
|
- type: map_at_10 |
|
value: 6.814000000000001 |
|
- type: map_at_100 |
|
value: 8.267 |
|
- type: map_at_1000 |
|
value: 8.565000000000001 |
|
- type: map_at_3 |
|
value: 4.736 |
|
- type: map_at_5 |
|
value: 5.653 |
|
- type: mrr_at_1 |
|
value: 12.5 |
|
- type: mrr_at_10 |
|
value: 20.794999999999998 |
|
- type: mrr_at_100 |
|
value: 22.014 |
|
- type: mrr_at_1000 |
|
value: 22.109 |
|
- type: mrr_at_3 |
|
value: 17.8 |
|
- type: mrr_at_5 |
|
value: 19.42 |
|
- type: ndcg_at_1 |
|
value: 12.5 |
|
- type: ndcg_at_10 |
|
value: 12.209 |
|
- type: ndcg_at_100 |
|
value: 18.812 |
|
- type: ndcg_at_1000 |
|
value: 24.766 |
|
- type: ndcg_at_3 |
|
value: 10.847 |
|
- type: ndcg_at_5 |
|
value: 9.632 |
|
- type: precision_at_1 |
|
value: 12.5 |
|
- type: precision_at_10 |
|
value: 6.660000000000001 |
|
- type: precision_at_100 |
|
value: 1.6340000000000001 |
|
- type: precision_at_1000 |
|
value: 0.307 |
|
- type: precision_at_3 |
|
value: 10.299999999999999 |
|
- type: precision_at_5 |
|
value: 8.66 |
|
- type: recall_at_1 |
|
value: 2.52 |
|
- type: recall_at_10 |
|
value: 13.495 |
|
- type: recall_at_100 |
|
value: 33.188 |
|
- type: recall_at_1000 |
|
value: 62.34499999999999 |
|
- type: recall_at_3 |
|
value: 6.245 |
|
- type: recall_at_5 |
|
value: 8.76 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/sicke-pl-pairclassification |
|
name: MTEB SICK-E-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 86.13942111699959 |
|
- type: cos_sim_ap |
|
value: 81.47480017120256 |
|
- type: cos_sim_f1 |
|
value: 74.79794268919912 |
|
- type: cos_sim_precision |
|
value: 77.2382397572079 |
|
- type: cos_sim_recall |
|
value: 72.50712250712252 |
|
- type: dot_accuracy |
|
value: 86.13942111699959 |
|
- type: dot_ap |
|
value: 81.47478531367476 |
|
- type: dot_f1 |
|
value: 74.79794268919912 |
|
- type: dot_precision |
|
value: 77.2382397572079 |
|
- type: dot_recall |
|
value: 72.50712250712252 |
|
- type: euclidean_accuracy |
|
value: 86.13942111699959 |
|
- type: euclidean_ap |
|
value: 81.47478531367476 |
|
- type: euclidean_f1 |
|
value: 74.79794268919912 |
|
- type: euclidean_precision |
|
value: 77.2382397572079 |
|
- type: euclidean_recall |
|
value: 72.50712250712252 |
|
- type: manhattan_accuracy |
|
value: 86.15980432123929 |
|
- type: manhattan_ap |
|
value: 81.40798042612397 |
|
- type: manhattan_f1 |
|
value: 74.86116253239543 |
|
- type: manhattan_precision |
|
value: 77.9491133384734 |
|
- type: manhattan_recall |
|
value: 72.00854700854701 |
|
- type: max_accuracy |
|
value: 86.15980432123929 |
|
- type: max_ap |
|
value: 81.47480017120256 |
|
- type: max_f1 |
|
value: 74.86116253239543 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: PL-MTEB/sickr-pl-sts |
|
name: MTEB SICK-R-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 84.27525342551935 |
|
- type: cos_sim_spearman |
|
value: 79.50631730805885 |
|
- type: euclidean_pearson |
|
value: 82.07169123942028 |
|
- type: euclidean_spearman |
|
value: 79.50631887406465 |
|
- type: manhattan_pearson |
|
value: 81.98288826317463 |
|
- type: manhattan_spearman |
|
value: 79.4244081650332 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: mteb/sts22-crosslingual-sts |
|
name: MTEB STS22 (pl) |
|
config: pl |
|
split: test |
|
revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 35.59400236598834 |
|
- type: cos_sim_spearman |
|
value: 36.782560207852846 |
|
- type: euclidean_pearson |
|
value: 28.546177668542942 |
|
- type: euclidean_spearman |
|
value: 36.68394223635756 |
|
- type: manhattan_pearson |
|
value: 28.45606963909248 |
|
- type: manhattan_spearman |
|
value: 36.475975118547524 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: scifact-pl |
|
name: MTEB SciFact-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 41.028 |
|
- type: map_at_10 |
|
value: 52.23799999999999 |
|
- type: map_at_100 |
|
value: 52.905 |
|
- type: map_at_1000 |
|
value: 52.945 |
|
- type: map_at_3 |
|
value: 49.102000000000004 |
|
- type: map_at_5 |
|
value: 50.992000000000004 |
|
- type: mrr_at_1 |
|
value: 43.333 |
|
- type: mrr_at_10 |
|
value: 53.551 |
|
- type: mrr_at_100 |
|
value: 54.138 |
|
- type: mrr_at_1000 |
|
value: 54.175 |
|
- type: mrr_at_3 |
|
value: 51.056000000000004 |
|
- type: mrr_at_5 |
|
value: 52.705999999999996 |
|
- type: ndcg_at_1 |
|
value: 43.333 |
|
- type: ndcg_at_10 |
|
value: 57.731 |
|
- type: ndcg_at_100 |
|
value: 61.18599999999999 |
|
- type: ndcg_at_1000 |
|
value: 62.261 |
|
- type: ndcg_at_3 |
|
value: 52.276999999999994 |
|
- type: ndcg_at_5 |
|
value: 55.245999999999995 |
|
- type: precision_at_1 |
|
value: 43.333 |
|
- type: precision_at_10 |
|
value: 8.267 |
|
- type: precision_at_100 |
|
value: 1.02 |
|
- type: precision_at_1000 |
|
value: 0.11100000000000002 |
|
- type: precision_at_3 |
|
value: 21.444 |
|
- type: precision_at_5 |
|
value: 14.533 |
|
- type: recall_at_1 |
|
value: 41.028 |
|
- type: recall_at_10 |
|
value: 73.111 |
|
- type: recall_at_100 |
|
value: 89.533 |
|
- type: recall_at_1000 |
|
value: 98.0 |
|
- type: recall_at_3 |
|
value: 58.744 |
|
- type: recall_at_5 |
|
value: 66.106 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: trec-covid-pl |
|
name: MTEB TRECCOVID-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 0.146 |
|
- type: map_at_10 |
|
value: 1.09 |
|
- type: map_at_100 |
|
value: 6.002 |
|
- type: map_at_1000 |
|
value: 15.479999999999999 |
|
- type: map_at_3 |
|
value: 0.41000000000000003 |
|
- type: map_at_5 |
|
value: 0.596 |
|
- type: mrr_at_1 |
|
value: 54.0 |
|
- type: mrr_at_10 |
|
value: 72.367 |
|
- type: mrr_at_100 |
|
value: 72.367 |
|
- type: mrr_at_1000 |
|
value: 72.367 |
|
- type: mrr_at_3 |
|
value: 70.333 |
|
- type: mrr_at_5 |
|
value: 72.033 |
|
- type: ndcg_at_1 |
|
value: 48.0 |
|
- type: ndcg_at_10 |
|
value: 48.827 |
|
- type: ndcg_at_100 |
|
value: 38.513999999999996 |
|
- type: ndcg_at_1000 |
|
value: 37.958 |
|
- type: ndcg_at_3 |
|
value: 52.614000000000004 |
|
- type: ndcg_at_5 |
|
value: 51.013 |
|
- type: precision_at_1 |
|
value: 54.0 |
|
- type: precision_at_10 |
|
value: 53.6 |
|
- type: precision_at_100 |
|
value: 40.300000000000004 |
|
- type: precision_at_1000 |
|
value: 17.276 |
|
- type: precision_at_3 |
|
value: 57.333 |
|
- type: precision_at_5 |
|
value: 55.60000000000001 |
|
- type: recall_at_1 |
|
value: 0.146 |
|
- type: recall_at_10 |
|
value: 1.438 |
|
- type: recall_at_100 |
|
value: 9.673 |
|
- type: recall_at_1000 |
|
value: 36.870999999999995 |
|
- type: recall_at_3 |
|
value: 0.47400000000000003 |
|
- type: recall_at_5 |
|
value: 0.721 |
|
--- |
|
# Model Card for st-polish-kartonberta-base-alpha-v1

This sentence transformer model maps text into a 768-dimensional dense vector space. It is intended for tasks that involve sentence or document similarity.

The model is released as an alpha version. Several changes could improve its performance, such as tuning the training hyperparameters or training for longer. Training is currently limited to a single epoch, mainly because of limited GPU resources.
## Model Description

- **Developed by:** [Bartłomiej Orlik](https://www.linkedin.com/in/bartłomiej-orlik/)
- **Model type:** RoBERTa Sentence Transformer
- **Language:** Polish
- **License:** LGPL-3.0
- **Trained from model:** [sdadas/polish-roberta-base-v2](https://huggingface.co/sdadas/polish-roberta-base-v2)
## How to Get Started with the Model

Use the code below to get started with the model.

### Using Sentence-Transformers

You can use the model with [sentence-transformers](https://www.SBERT.net):

```
pip install -U sentence-transformers
```
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('OrlikB/st-polish-kartonberta-base-alpha-v1')

# "I am a big fan of cardboard packaging"
text_1 = 'Jestem wielkim fanem opakowań tekturowych'
# "I really like cardboard boxes"
text_2 = 'Bardzo podobają mi się kartony'

# normalize_embeddings=True returns unit-length vectors
embeddings_1 = model.encode(text_1, normalize_embeddings=True)
embeddings_2 = model.encode(text_2, normalize_embeddings=True)

# For unit-length vectors the dot product equals the cosine similarity
similarity = embeddings_1 @ embeddings_2.T
print(similarity)
```
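The Sentence-Transformers example scores similarity with a plain dot product, while the Transformers example uses an explicit cosine similarity. The sketch below illustrates why the two agree once the embeddings are L2-normalized. It uses only the standard library, and the 3-dimensional toy vectors are an assumption standing in for the model's 768-dimensional embeddings.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors (hypothetical values, not real model output)
a = [1.0, 2.0, 3.0]
b = [2.0, 0.5, 1.0]

cos_raw = cosine_similarity(a, b)
dot_normalized = dot(l2_normalize(a), l2_normalize(b))

# The two scores match up to floating-point error
print(math.isclose(cos_raw, dot_normalized, rel_tol=1e-9))
```

This may also explain why the `cos_sim_*` and `dot_*` metrics in the model metadata are nearly identical: on normalized embeddings both measures rank pairs the same way.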
### Using HuggingFace Transformers

```python
from transformers import AutoTokenizer, AutoModel
import torch
import numpy as np


def encode_text(text):
    # Tokenize, truncating inputs longer than 512 tokens
    encoded_input = tokenizer(text, padding=True, truncation=True, return_tensors='pt', max_length=512)
    with torch.no_grad():
        model_output = model(**encoded_input)
    # Use the hidden state of the first token as the sentence embedding
    sentence_embeddings = model_output[0][:, 0]
    # L2-normalize the embedding
    sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
    return sentence_embeddings.squeeze().numpy()


def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


tokenizer = AutoTokenizer.from_pretrained('OrlikB/st-polish-kartonberta-base-alpha-v1')
model = AutoModel.from_pretrained('OrlikB/st-polish-kartonberta-base-alpha-v1')
model.eval()

# "I am a big fan of cardboard packaging"
text_1 = 'Jestem wielkim fanem opakowań tekturowych'
# "I really like cardboard boxes"
text_2 = 'Bardzo podobają mi się kartony'

embeddings_1 = encode_text(text_1)
embeddings_2 = encode_text(text_2)

print(cosine_similarity(embeddings_1, embeddings_2))
```
*Note: The `encode_text` function is intended for demonstration purposes. For better throughput, process texts in batches.*
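As a rough illustration of the batching advice, the sketch below splits a list of texts into fixed-size batches and encodes each batch in one forward pass. The helper names `iter_batches` and `encode_texts` and the default `batch_size=32` are assumptions made for this example, not part of the model's API. The pooling and normalization mirror the `encode_text` function from the Transformers example.

```python
from typing import Iterator, List, Sequence


def iter_batches(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts` with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def encode_texts(texts, tokenizer, model, batch_size=32):
    """Return one L2-normalized embedding per text as a 2-D NumPy array.

    `tokenizer` and `model` are expected to be the objects loaded with
    AutoTokenizer / AutoModel in the example above.
    """
    import torch  # imported here so iter_batches stays usable without torch

    if len(texts) == 0:
        raise ValueError("texts must not be empty")

    chunks = []
    for batch in iter_batches(texts, batch_size):
        # padding=True pads each batch to its longest sequence
        encoded = tokenizer(batch, padding=True, truncation=True,
                            return_tensors='pt', max_length=512)
        with torch.no_grad():
            output = model(**encoded)
        # Same pooling as encode_text: hidden state of the first token
        embeddings = torch.nn.functional.normalize(output[0][:, 0], p=2, dim=1)
        chunks.append(embeddings)
    return torch.cat(chunks, dim=0).numpy()
```

With the tokenizer and model loaded as above, `embeddings = encode_texts([text_1, text_2], tokenizer, model)` returns an array of shape `(2, 768)`, and `embeddings @ embeddings.T` gives all pairwise similarities. With Sentence-Transformers, a similar effect is available by passing a list of texts to `model.encode`, which accepts a `batch_size` argument.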
## Evaluation

### [MTEB for Polish Language](https://huggingface.co/spaces/mteb/leaderboard)

| Rank | Model | Model Size (GB) | Embedding Dimensions | Sequence Length | Average (26 datasets) | Classification Average (7 datasets) | Clustering Average (1 dataset) | Pair Classification Average (4 datasets) | Retrieval Average (11 datasets) | STS Average (3 datasets) |
|-------:|:----------------------------------------|------------------:|-----------------------:|------------------:|------------------------:|--------------------------------------:|--------------------------------:|-----------------------------------------:|----------------------------------:|-------------------------:|
| 1 | multilingual-e5-large | 2.24 | 1024 | 514 | 58.25 | 60.51 | 24.06 | 84.58 | 47.82 | 67.52 |
| 2 | **st-polish-kartonberta-base-alpha-v1** | 0.5 | 768 | 514 | 56.92 | 60.44 | **32.85** | **87.92** | 42.19 | **69.47** |
| 3 | multilingual-e5-base | 1.11 | 768 | 514 | 54.18 | 57.01 | 18.62 | 82.08 | 42.5 | 65.07 |
| 4 | multilingual-e5-small | 0.47 | 384 | 512 | 53.15 | 54.35 | 19.64 | 81.67 | 41.52 | 66.08 |
| 5 | st-polish-paraphrase-from-mpnet | 0.5 | 768 | 514 | 53.06 | 57.49 | 25.09 | 87.04 | 36.53 | 67.39 |
| 6 | st-polish-paraphrase-from-distilroberta | 0.5 | 768 | 514 | 52.65 | 58.55 | 31.11 | 87 | 33.96 | 68.78 |
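The overall "Average (26 datasets)" column can be related to the per-category columns. The sketch below assumes the overall score is the plain mean over all 26 datasets, which is equivalent to weighting each category average by its number of datasets. Under that assumption it reproduces the reported overall averages for the top two rows to within about 0.01, a gap that is expected because the category averages in the table are already rounded.

```python
# (category average, number of datasets), copied from the leaderboard table
kartonberta = {
    "Classification": (60.44, 7),
    "Clustering": (32.85, 1),
    "Pair Classification": (87.92, 4),
    "Retrieval": (42.19, 11),
    "STS": (69.47, 3),
}
e5_large = {
    "Classification": (60.51, 7),
    "Clustering": (24.06, 1),
    "Pair Classification": (84.58, 4),
    "Retrieval": (47.82, 11),
    "STS": (67.52, 3),
}


def overall_average(categories):
    """Dataset-count-weighted mean of the category averages."""
    total_datasets = sum(count for _, count in categories.values())
    weighted_sum = sum(avg * count for avg, count in categories.values())
    return weighted_sum / total_datasets


# Compare with the reported values: 56.92 and 58.25
print(overall_average(kartonberta))
print(overall_average(e5_large))
```

Because Retrieval contributes 11 of the 26 datasets, it has the largest influence on the overall ranking, which is consistent with multilingual-e5-large leading overall despite lower clustering, pair classification, and STS averages.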
## More Information

I developed this model as a personal scientific initiative.

I plan to start developing a new ST (sentence transformer) model. However, due to limited computational resources, I have suspended further work on a larger or enhanced version of the current model.