Luke Merrick committed on
Commit
2ea2e86
·
1 Parent(s): 97eab2e

Add other MTEB eval numbers

Files changed (1)
  1. README.md +1053 -0
README.md CHANGED
@@ -7591,6 +7591,1059 @@ model-index:
7591
  value: 11.994
7592
  task:
7593
  type: Retrieval
7594
  ---
7595
 
7596
 
 
7591
  value: 11.994
7592
  task:
7593
  type: Retrieval
7594
+ - dataset:
7595
+ config: en
7596
+ name: MTEB AmazonCounterfactualClassification (en)
7597
+ revision: e8379541af4e31359cca9fbcf4b00f2671dba205
7598
+ split: test
7599
+ type: mteb/amazon_counterfactual
7600
+ metrics:
7601
+ - type: accuracy
7602
+ value: 68.29850746268657
7603
+ - type: ap
7604
+ value: 30.109785890841966
7605
+ - type: ap_weighted
7606
+ value: 30.109785890841966
7607
+ - type: f1
7608
+ value: 61.76875915202924
7609
+ - type: f1_weighted
7610
+ value: 71.32073190458556
7611
+ - type: main_score
7612
+ value: 68.29850746268657
7613
+ task:
7614
+ type: Classification
7615
+ - dataset:
7616
+ config: default
7617
+ name: MTEB AmazonPolarityClassification (default)
7618
+ revision: e2d317d38cd51312af73b3d32a06d1a08b442046
7619
+ split: test
7620
+ type: mteb/amazon_polarity
7621
+ metrics:
7622
+ - type: accuracy
7623
+ value: 90.3068
7624
+ - type: ap
7625
+ value: 86.17914339624038
7626
+ - type: ap_weighted
7627
+ value: 86.17914339624038
7628
+ - type: f1
7629
+ value: 90.29716826358077
7630
+ - type: f1_weighted
7631
+ value: 90.29716826358077
7632
+ - type: main_score
7633
+ value: 90.3068
7634
+ task:
7635
+ type: Classification
7636
+ - dataset:
7637
+ config: en
7638
+ name: MTEB AmazonReviewsClassification (en)
7639
+ revision: 1399c76144fd37290681b995c656ef9b2e06e26d
7640
+ split: test
7641
+ type: mteb/amazon_reviews_multi
7642
+ metrics:
7643
+ - type: accuracy
7644
+ value: 46.272000000000006
7645
+ - type: f1
7646
+ value: 45.57042543386915
7647
+ - type: f1_weighted
7648
+ value: 45.57042543386915
7649
+ - type: main_score
7650
+ value: 46.272000000000006
7651
+ task:
7652
+ type: Classification
7653
+ - dataset:
7654
+ config: default
7655
+ name: MTEB ArxivClusteringP2P (default)
7656
+ revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
7657
+ split: test
7658
+ type: mteb/arxiv-clustering-p2p
7659
+ metrics:
7660
+ - type: main_score
7661
+ value: 44.9469238081379
7662
+ - type: v_measure
7663
+ value: 44.9469238081379
7664
+ - type: v_measure_std
7665
+ value: 13.26811262671461
7666
+ task:
7667
+ type: Clustering
7668
+ - dataset:
7669
+ config: default
7670
+ name: MTEB ArxivClusteringS2S (default)
7671
+ revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
7672
+ split: test
7673
+ type: mteb/arxiv-clustering-s2s
7674
+ metrics:
7675
+ - type: main_score
7676
+ value: 34.12071448053325
7677
+ - type: v_measure
7678
+ value: 34.12071448053325
7679
+ - type: v_measure_std
7680
+ value: 13.7019879046405
7681
+ task:
7682
+ type: Clustering
7683
+ - dataset:
7684
+ config: default
7685
+ name: MTEB AskUbuntuDupQuestions (default)
7686
+ revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
7687
+ split: test
7688
+ type: mteb/askubuntudupquestions-reranking
7689
+ metrics:
7690
+ - type: main_score
7691
+ value: 61.597667288657846
7692
+ - type: map
7693
+ value: 61.597667288657846
7694
+ - type: mrr
7695
+ value: 75.57940904893813
7696
+ - type: nAUC_map_diff1
7697
+ value: 8.745172077340095
7698
+ - type: nAUC_map_max
7699
+ value: 20.114863024035493
7700
+ - type: nAUC_map_std
7701
+ value: 15.991351189572192
7702
+ - type: nAUC_mrr_diff1
7703
+ value: 20.781369244159983
7704
+ - type: nAUC_mrr_max
7705
+ value: 30.78542570228559
7706
+ - type: nAUC_mrr_std
7707
+ value: 19.861484857303676
7708
+ task:
7709
+ type: Reranking
7710
+ - dataset:
7711
+ config: default
7712
+ name: MTEB BIOSSES (default)
7713
+ revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
7714
+ split: test
7715
+ type: mteb/biosses-sts
7716
+ metrics:
7717
+ - type: cosine_pearson
7718
+ value: 88.55587996301419
7719
+ - type: cosine_spearman
7720
+ value: 86.40317357420093
7721
+ - type: euclidean_pearson
7722
+ value: 86.93771958250231
7723
+ - type: euclidean_spearman
7724
+ value: 86.40317357420093
7725
+ - type: main_score
7726
+ value: 86.40317357420093
7727
+ - type: manhattan_pearson
7728
+ value: 86.92196577117366
7729
+ - type: manhattan_spearman
7730
+ value: 85.79834051556095
7731
+ - type: pearson
7732
+ value: 88.55587996301419
7733
+ - type: spearman
7734
+ value: 86.40317357420093
7735
+ task:
7736
+ type: STS
7737
+ - dataset:
7738
+ config: default
7739
+ name: MTEB Banking77Classification (default)
7740
+ revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
7741
+ split: test
7742
+ type: mteb/banking77
7743
+ metrics:
7744
+ - type: accuracy
7745
+ value: 80.0064935064935
7746
+ - type: f1
7747
+ value: 79.29524254086299
7748
+ - type: f1_weighted
7749
+ value: 79.295242540863
7750
+ - type: main_score
7751
+ value: 80.0064935064935
7752
+ task:
7753
+ type: Classification
7754
+ - dataset:
7755
+ config: default
7756
+ name: MTEB BiorxivClusteringP2P (default)
7757
+ revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
7758
+ split: test
7759
+ type: mteb/biorxiv-clustering-p2p
7760
+ metrics:
7761
+ - type: main_score
7762
+ value: 35.27186813341181
7763
+ - type: v_measure
7764
+ value: 35.27186813341181
7765
+ - type: v_measure_std
7766
+ value: 0.8621482145872432
7767
+ task:
7768
+ type: Clustering
7769
+ - dataset:
7770
+ config: default
7771
+ name: MTEB BiorxivClusteringS2S (default)
7772
+ revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
7773
+ split: test
7774
+ type: mteb/biorxiv-clustering-s2s
7775
+ metrics:
7776
+ - type: main_score
7777
+ value: 28.411805064852295
7778
+ - type: v_measure
7779
+ value: 28.411805064852295
7780
+ - type: v_measure_std
7781
+ value: 0.7194290078011281
7782
+ task:
7783
+ type: Clustering
7784
+ - dataset:
7785
+ config: default
7786
+ name: MTEB EmotionClassification (default)
7787
+ revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
7788
+ split: test
7789
+ type: mteb/emotion
7790
+ metrics:
7791
+ - type: accuracy
7792
+ value: 43.675
7793
+ - type: f1
7794
+ value: 40.15061931375577
7795
+ - type: f1_weighted
7796
+ value: 45.714186572727066
7797
+ - type: main_score
7798
+ value: 43.675
7799
+ task:
7800
+ type: Classification
7801
+ - dataset:
7802
+ config: default
7803
+ name: MTEB ImdbClassification (default)
7804
+ revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
7805
+ split: test
7806
+ type: mteb/imdb
7807
+ metrics:
7808
+ - type: accuracy
7809
+ value: 84.35640000000001
7810
+ - type: ap
7811
+ value: 79.07507736685174
7812
+ - type: ap_weighted
7813
+ value: 79.07507736685174
7814
+ - type: f1
7815
+ value: 84.32288494833531
7816
+ - type: f1_weighted
7817
+ value: 84.32288494833531
7818
+ - type: main_score
7819
+ value: 84.35640000000001
7820
+ task:
7821
+ type: Classification
7822
+ - dataset:
7823
+ config: en
7824
+ name: MTEB MTOPDomainClassification (en)
7825
+ revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
7826
+ split: test
7827
+ type: mteb/mtop_domain
7828
+ metrics:
7829
+ - type: accuracy
7830
+ value: 91.35658914728684
7831
+ - type: f1
7832
+ value: 90.86877537911086
7833
+ - type: f1_weighted
7834
+ value: 91.3282092774443
7835
+ - type: main_score
7836
+ value: 91.35658914728684
7837
+ task:
7838
+ type: Classification
7839
+ - dataset:
7840
+ config: en
7841
+ name: MTEB MTOPIntentClassification (en)
7842
+ revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
7843
+ split: test
7844
+ type: mteb/mtop_intent
7845
+ metrics:
7846
+ - type: accuracy
7847
+ value: 60.63611491108071
7848
+ - type: f1
7849
+ value: 42.78886482112741
7850
+ - type: f1_weighted
7851
+ value: 63.44208631840539
7852
+ - type: main_score
7853
+ value: 60.63611491108071
7854
+ task:
7855
+ type: Classification
7856
+ - dataset:
7857
+ config: en
7858
+ name: MTEB MassiveIntentClassification (en)
7859
+ revision: 4672e20407010da34463acc759c162ca9734bca6
7860
+ split: test
7861
+ type: mteb/amazon_massive_intent
7862
+ metrics:
7863
+ - type: accuracy
7864
+ value: 66.68796234028245
7865
+ - type: f1
7866
+ value: 64.44940791000278
7867
+ - type: f1_weighted
7868
+ value: 65.77554417406792
7869
+ - type: main_score
7870
+ value: 66.68796234028245
7871
+ task:
7872
+ type: Classification
7873
+ - dataset:
7874
+ config: en
7875
+ name: MTEB MassiveScenarioClassification (en)
7876
+ revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8
7877
+ split: test
7878
+ type: mteb/amazon_massive_scenario
7879
+ metrics:
7880
+ - type: accuracy
7881
+ value: 73.0598520511096
7882
+ - type: f1
7883
+ value: 72.14267273884774
7884
+ - type: f1_weighted
7885
+ value: 72.93345180137516
7886
+ - type: main_score
7887
+ value: 73.0598520511096
7888
+ task:
7889
+ type: Classification
7890
+ - dataset:
7891
+ config: default
7892
+ name: MTEB MedrxivClusteringP2P (default)
7893
+ revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
7894
+ split: test
7895
+ type: mteb/medrxiv-clustering-p2p
7896
+ metrics:
7897
+ - type: main_score
7898
+ value: 31.143081341699606
7899
+ - type: v_measure
7900
+ value: 31.143081341699606
7901
+ - type: v_measure_std
7902
+ value: 1.5578716347076906
7903
+ task:
7904
+ type: Clustering
7905
+ - dataset:
7906
+ config: default
7907
+ name: MTEB MedrxivClusteringS2S (default)
7908
+ revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
7909
+ split: test
7910
+ type: mteb/medrxiv-clustering-s2s
7911
+ metrics:
7912
+ - type: main_score
7913
+ value: 27.010818869829556
7914
+ - type: v_measure
7915
+ value: 27.010818869829556
7916
+ - type: v_measure_std
7917
+ value: 1.1771554540819378
7918
+ task:
7919
+ type: Clustering
7920
+ - dataset:
7921
+ config: default
7922
+ name: MTEB MindSmallReranking (default)
7923
+ revision: 59042f120c80e8afa9cdbb224f67076cec0fc9a7
7924
+ split: test
7925
+ type: mteb/mind_small
7926
+ metrics:
7927
+ - type: main_score
7928
+ value: 30.20503776754942
7929
+ - type: map
7930
+ value: 30.20503776754942
7931
+ - type: mrr
7932
+ value: 31.076636002733437
7933
+ - type: nAUC_map_diff1
7934
+ value: 7.290568655287842
7935
+ - type: nAUC_map_max
7936
+ value: -21.381599355932945
7937
+ - type: nAUC_map_std
7938
+ value: -7.709920607543168
7939
+ - type: nAUC_mrr_diff1
7940
+ value: 7.558397329284913
7941
+ - type: nAUC_mrr_max
7942
+ value: -15.981397186427607
7943
+ - type: nAUC_mrr_std
7944
+ value: -4.870495243168834
7945
+ task:
7946
+ type: Reranking
7947
+ - dataset:
7948
+ config: default
7949
+ name: MTEB RedditClustering (default)
7950
+ revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
7951
+ split: test
7952
+ type: mteb/reddit-clustering
7953
+ metrics:
7954
+ - type: main_score
7955
+ value: 51.85893476633338
7956
+ - type: v_measure
7957
+ value: 51.85893476633338
7958
+ - type: v_measure_std
7959
+ value: 4.704770139385852
7960
+ task:
7961
+ type: Clustering
7962
+ - dataset:
7963
+ config: default
7964
+ name: MTEB RedditClusteringP2P (default)
7965
+ revision: 385e3cb46b4cfa89021f56c4380204149d0efe33
7966
+ split: test
7967
+ type: mteb/reddit-clustering-p2p
7968
+ metrics:
7969
+ - type: main_score
7970
+ value: 61.8124222918822
7971
+ - type: v_measure
7972
+ value: 61.8124222918822
7973
+ - type: v_measure_std
7974
+ value: 11.994472578100165
7975
+ task:
7976
+ type: Clustering
7977
+ - dataset:
7978
+ config: default
7979
+ name: MTEB SICK-R (default)
7980
+ revision: 20a6d6f312dd54037fe07a32d58e5e168867909d
7981
+ split: test
7982
+ type: mteb/sickr-sts
7983
+ metrics:
7984
+ - type: cosine_pearson
7985
+ value: 77.63310776935984
7986
+ - type: cosine_spearman
7987
+ value: 69.86468291111039
7988
+ - type: euclidean_pearson
7989
+ value: 73.91537077798837
7990
+ - type: euclidean_spearman
7991
+ value: 69.86468376650203
7992
+ - type: main_score
7993
+ value: 69.86468291111039
7994
+ - type: manhattan_pearson
7995
+ value: 73.68616048370464
7996
+ - type: manhattan_spearman
7997
+ value: 69.76232036206659
7998
+ - type: pearson
7999
+ value: 77.63310776935984
8000
+ - type: spearman
8001
+ value: 69.86468291111039
8002
+ task:
8003
+ type: STS
8004
+ - dataset:
8005
+ config: default
8006
+ name: MTEB STS12 (default)
8007
+ revision: a0d554a64d88156834ff5ae9920b964011b16384
8008
+ split: test
8009
+ type: mteb/sts12-sts
8010
+ metrics:
8011
+ - type: cosine_pearson
8012
+ value: 57.71716838245049
8013
+ - type: cosine_spearman
8014
+ value: 61.797855543446424
8015
+ - type: euclidean_pearson
8016
+ value: 58.22958675325848
8017
+ - type: euclidean_spearman
8018
+ value: 61.797855543446424
8019
+ - type: main_score
8020
+ value: 61.797855543446424
8021
+ - type: manhattan_pearson
8022
+ value: 57.63117544997929
8023
+ - type: manhattan_spearman
8024
+ value: 61.3629404350085
8025
+ - type: pearson
8026
+ value: 57.71716838245049
8027
+ - type: spearman
8028
+ value: 61.797855543446424
8029
+ task:
8030
+ type: STS
8031
+ - dataset:
8032
+ config: default
8033
+ name: MTEB STS13 (default)
8034
+ revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
8035
+ split: test
8036
+ type: mteb/sts13-sts
8037
+ metrics:
8038
+ - type: cosine_pearson
8039
+ value: 82.30260026790903
8040
+ - type: cosine_spearman
8041
+ value: 82.66959813070869
8042
+ - type: euclidean_pearson
8043
+ value: 82.08383017580783
8044
+ - type: euclidean_spearman
8045
+ value: 82.66959813070869
8046
+ - type: main_score
8047
+ value: 82.66959813070869
8048
+ - type: manhattan_pearson
8049
+ value: 81.77991451392153
8050
+ - type: manhattan_spearman
8051
+ value: 82.3652534745606
8052
+ - type: pearson
8053
+ value: 82.30260026790903
8054
+ - type: spearman
8055
+ value: 82.66959813070869
8056
+ task:
8057
+ type: STS
8058
+ - dataset:
8059
+ config: default
8060
+ name: MTEB STS14 (default)
8061
+ revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
8062
+ split: test
8063
+ type: mteb/sts14-sts
8064
+ metrics:
8065
+ - type: cosine_pearson
8066
+ value: 71.50608384084478
8067
+ - type: cosine_spearman
8068
+ value: 68.94968064977785
8069
+ - type: euclidean_pearson
8070
+ value: 70.73381299949564
8071
+ - type: euclidean_spearman
8072
+ value: 68.94968064977785
8073
+ - type: main_score
8074
+ value: 68.94968064977785
8075
+ - type: manhattan_pearson
8076
+ value: 70.5385486953787
8077
+ - type: manhattan_spearman
8078
+ value: 68.82132770672365
8079
+ - type: pearson
8080
+ value: 71.50608384084478
8081
+ - type: spearman
8082
+ value: 68.94968064977785
8083
+ task:
8084
+ type: STS
8085
+ - dataset:
8086
+ config: default
8087
+ name: MTEB STS15 (default)
8088
+ revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
8089
+ split: test
8090
+ type: mteb/sts15-sts
8091
+ metrics:
8092
+ - type: cosine_pearson
8093
+ value: 73.66969825874907
8094
+ - type: cosine_spearman
8095
+ value: 75.55374982088381
8096
+ - type: euclidean_pearson
8097
+ value: 75.9339313749594
8098
+ - type: euclidean_spearman
8099
+ value: 75.55374982088381
8100
+ - type: main_score
8101
+ value: 75.55374982088381
8102
+ - type: manhattan_pearson
8103
+ value: 75.88287553383817
8104
+ - type: manhattan_spearman
8105
+ value: 75.50729812977688
8106
+ - type: pearson
8107
+ value: 73.66969825874907
8108
+ - type: spearman
8109
+ value: 75.55374982088381
8110
+ task:
8111
+ type: STS
8112
+ - dataset:
8113
+ config: default
8114
+ name: MTEB STS16 (default)
8115
+ revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
8116
+ split: test
8117
+ type: mteb/sts16-sts
8118
+ metrics:
8119
+ - type: cosine_pearson
8120
+ value: 74.5954724414016
8121
+ - type: cosine_spearman
8122
+ value: 77.2688820850505
8123
+ - type: euclidean_pearson
8124
+ value: 77.19866353971555
8125
+ - type: euclidean_spearman
8126
+ value: 77.2688820850505
8127
+ - type: main_score
8128
+ value: 77.2688820850505
8129
+ - type: manhattan_pearson
8130
+ value: 77.27072603680978
8131
+ - type: manhattan_spearman
8132
+ value: 77.29408453673607
8133
+ - type: pearson
8134
+ value: 74.5954724414016
8135
+ - type: spearman
8136
+ value: 77.2688820850505
8137
+ task:
8138
+ type: STS
8139
+ - dataset:
8140
+ config: en-en
8141
+ name: MTEB STS17 (en-en)
8142
+ revision: faeb762787bd10488a50c8b5be4a3b82e411949c
8143
+ split: test
8144
+ type: mteb/sts17-crosslingual-sts
8145
+ metrics:
8146
+ - type: cosine_pearson
8147
+ value: 71.52588722654055
8148
+ - type: cosine_spearman
8149
+ value: 74.97235736456061
8150
+ - type: euclidean_pearson
8151
+ value: 74.51952528854038
8152
+ - type: euclidean_spearman
8153
+ value: 74.97235736456061
8154
+ - type: main_score
8155
+ value: 74.97235736456061
8156
+ - type: manhattan_pearson
8157
+ value: 74.48272300884209
8158
+ - type: manhattan_spearman
8159
+ value: 74.80633649415176
8160
+ - type: pearson
8161
+ value: 71.52588722654055
8162
+ - type: spearman
8163
+ value: 74.97235736456061
8164
+ task:
8165
+ type: STS
8166
+ - dataset:
8167
+ config: en
8168
+ name: MTEB STS22 (en)
8169
+ revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
8170
+ split: test
8171
+ type: mteb/sts22-crosslingual-sts
8172
+ metrics:
8173
+ - type: cosine_pearson
8174
+ value: 68.80031120401976
8175
+ - type: cosine_spearman
8176
+ value: 69.07945196478491
8177
+ - type: euclidean_pearson
8178
+ value: 68.99674496430792
8179
+ - type: euclidean_spearman
8180
+ value: 69.07945196478491
8181
+ - type: main_score
8182
+ value: 69.07945196478491
8183
+ - type: manhattan_pearson
8184
+ value: 69.00236107775687
8185
+ - type: manhattan_spearman
8186
+ value: 68.98064879049272
8187
+ - type: pearson
8188
+ value: 68.80031120401976
8189
+ - type: spearman
8190
+ value: 69.07945196478491
8191
+ task:
8192
+ type: STS
8193
+ - dataset:
8194
+ config: default
8195
+ name: MTEB STSBenchmark (default)
8196
+ revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
8197
+ split: test
8198
+ type: mteb/stsbenchmark-sts
8199
+ metrics:
8200
+ - type: cosine_pearson
8201
+ value: 65.6898007230089
8202
+ - type: cosine_spearman
8203
+ value: 69.72386211803668
8204
+ - type: euclidean_pearson
8205
+ value: 69.04523003701475
8206
+ - type: euclidean_spearman
8207
+ value: 69.72386211803668
8208
+ - type: main_score
8209
+ value: 69.72386211803668
8210
+ - type: manhattan_pearson
8211
+ value: 68.80479743770702
8212
+ - type: manhattan_spearman
8213
+ value: 69.43264575177459
8214
+ - type: pearson
8215
+ value: 65.6898007230089
8216
+ - type: spearman
8217
+ value: 69.72386211803668
8218
+ task:
8219
+ type: STS
8220
+ - dataset:
8221
+ config: default
8222
+ name: MTEB SciDocsRR (default)
8223
+ revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
8224
+ split: test
8225
+ type: mteb/scidocs-reranking
8226
+ metrics:
8227
+ - type: main_score
8228
+ value: 79.74088066874383
8229
+ - type: map
8230
+ value: 79.74088066874383
8231
+ - type: mrr
8232
+ value: 94.47697455050397
8233
+ - type: nAUC_map_diff1
8234
+ value: 8.036086256905502
8235
+ - type: nAUC_map_max
8236
+ value: 54.88199803816819
8237
+ - type: nAUC_map_std
8238
+ value: 69.16267942176574
8239
+ - type: nAUC_mrr_diff1
8240
+ value: 50.020738477678115
8241
+ - type: nAUC_mrr_max
8242
+ value: 83.28922770326483
8243
+ - type: nAUC_mrr_std
8244
+ value: 83.63973501802224
8245
+ task:
8246
+ type: Reranking
8247
+ - dataset:
8248
+ config: default
8249
+ name: MTEB SprintDuplicateQuestions (default)
8250
+ revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
8251
+ split: test
8252
+ type: mteb/sprintduplicatequestions-pairclassification
8253
+ metrics:
8254
+ - type: cosine_accuracy
8255
+ value: 99.83861386138614
8256
+ - type: cosine_accuracy_threshold
8257
+ value: 74.75666999816895
8258
+ - type: cosine_ap
8259
+ value: 96.15132792066652
8260
+ - type: cosine_f1
8261
+ value: 91.84890656063618
8262
+ - type: cosine_f1_threshold
8263
+ value: 71.70594930648804
8264
+ - type: cosine_precision
8265
+ value: 91.30434782608695
8266
+ - type: cosine_recall
8267
+ value: 92.4
8268
+ - type: dot_accuracy
8269
+ value: 99.83861386138614
8270
+ - type: dot_accuracy_threshold
8271
+ value: 74.75666999816895
8272
+ - type: dot_ap
8273
+ value: 96.15132792066653
8274
+ - type: dot_f1
8275
+ value: 91.84890656063618
8276
+ - type: dot_f1_threshold
8277
+ value: 71.70596122741699
8278
+ - type: dot_precision
8279
+ value: 91.30434782608695
8280
+ - type: dot_recall
8281
+ value: 92.4
8282
+ - type: euclidean_accuracy
8283
+ value: 99.83861386138614
8284
+ - type: euclidean_accuracy_threshold
8285
+ value: 71.05395793914795
8286
+ - type: euclidean_ap
8287
+ value: 96.15132792066652
8288
+ - type: euclidean_f1
8289
+ value: 91.84890656063618
8290
+ - type: euclidean_f1_threshold
8291
+ value: 75.22505521774292
8292
+ - type: euclidean_precision
8293
+ value: 91.30434782608695
8294
+ - type: euclidean_recall
8295
+ value: 92.4
8296
+ - type: main_score
8297
+ value: 96.15132792066653
8298
+ - type: manhattan_accuracy
8299
+ value: 99.83564356435643
8300
+ - type: manhattan_accuracy_threshold
8301
+ value: 1547.6950645446777
8302
+ - type: manhattan_ap
8303
+ value: 96.06151211452136
8304
+ - type: manhattan_f1
8305
+ value: 91.61676646706587
8306
+ - type: manhattan_f1_threshold
8307
+ value: 1626.3608932495117
8308
+ - type: manhattan_precision
8309
+ value: 91.43426294820716
8310
+ - type: manhattan_recall
8311
+ value: 91.8
8312
+ - type: max_ap
8313
+ value: 96.15132792066653
8314
+ - type: max_f1
8315
+ value: 91.84890656063618
8316
+ - type: max_precision
8317
+ value: 91.43426294820716
8318
+ - type: max_recall
8319
+ value: 92.4
8320
+ - type: similarity_accuracy
8321
+ value: 99.83861386138614
8322
+ - type: similarity_accuracy_threshold
8323
+ value: 74.75666999816895
8324
+ - type: similarity_ap
8325
+ value: 96.15132792066652
8326
+ - type: similarity_f1
8327
+ value: 91.84890656063618
8328
+ - type: similarity_f1_threshold
8329
+ value: 71.70594930648804
8330
+ - type: similarity_precision
8331
+ value: 91.30434782608695
8332
+ - type: similarity_recall
8333
+ value: 92.4
8334
+ task:
8335
+ type: PairClassification
8336
+ - dataset:
8337
+ config: default
8338
+ name: MTEB StackExchangeClustering (default)
8339
+ revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
8340
+ split: test
8341
+ type: mteb/stackexchange-clustering
8342
+ metrics:
8343
+ - type: main_score
8344
+ value: 61.24120328328453
8345
+ - type: v_measure
8346
+ value: 61.24120328328453
8347
+ - type: v_measure_std
8348
+ value: 3.9946560691100372
8349
+ task:
8350
+ type: Clustering
8351
+ - dataset:
8352
+ config: default
8353
+ name: MTEB StackExchangeClusteringP2P (default)
8354
+ revision: 815ca46b2622cec33ccafc3735d572c266efdb44
8355
+ split: test
8356
+ type: mteb/stackexchange-clustering-p2p
8357
+ metrics:
8358
+ - type: main_score
8359
+ value: 33.808268374864745
8360
+ - type: v_measure
8361
+ value: 33.808268374864745
8362
+ - type: v_measure_std
8363
+ value: 1.2212188701887239
8364
+ task:
8365
+ type: Clustering
8366
+ - dataset:
8367
+ config: default
8368
+ name: MTEB StackOverflowDupQuestions (default)
8369
+ revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
8370
+ split: test
8371
+ type: mteb/stackoverflowdupquestions-reranking
8372
+ metrics:
8373
+ - type: main_score
8374
+ value: 52.19806018468037
8375
+ - type: map
8376
+ value: 52.19806018468037
8377
+ - type: mrr
8378
+ value: 52.98921462524404
8379
+ - type: nAUC_map_diff1
8380
+ value: 37.41443156995912
8381
+ - type: nAUC_map_max
8382
+ value: 9.410262727675603
8383
+ - type: nAUC_map_std
8384
+ value: 8.7094185014992
8385
+ - type: nAUC_mrr_diff1
8386
+ value: 37.78202772392581
8387
+ - type: nAUC_mrr_max
8388
+ value: 10.517635536565816
8389
+ - type: nAUC_mrr_std
8390
+ value: 8.509423813772491
8391
+ task:
8392
+ type: Reranking
8393
+ - dataset:
8394
+ config: default
8395
+ name: MTEB SummEval (default)
8396
+ revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
8397
+ split: test
8398
+ type: mteb/summeval
8399
+ metrics:
8400
+ - type: cosine_pearson
8401
+ value: 30.48413700430812
8402
+ - type: cosine_spearman
8403
+ value: 30.357162200875816
8404
+ - type: dot_pearson
8405
+ value: 30.484140144824938
8406
+ - type: dot_spearman
8407
+ value: 30.357162200875816
8408
+ - type: main_score
8409
+ value: 30.357162200875816
8410
+ - type: pearson
8411
+ value: 30.48413700430812
8412
+ - type: spearman
8413
+ value: 30.357162200875816
8414
+ task:
8415
+ type: Summarization
8416
+ - dataset:
8417
+ config: default
8418
+ name: MTEB ToxicConversationsClassification (default)
8419
+ revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de
8420
+ split: test
8421
+ type: mteb/toxic_conversations_50k
8422
+ metrics:
8423
+ - type: accuracy
8424
+ value: 66.8359375
8425
+ - type: ap
8426
+ value: 12.482653786025985
8427
+ - type: ap_weighted
8428
+ value: 12.482653786025985
8429
+ - type: f1
8430
+ value: 51.328608527332385
8431
+ - type: f1_weighted
8432
+ value: 74.07974463955398
8433
+ - type: main_score
8434
+ value: 66.8359375
8435
+ task:
8436
+ type: Classification
8437
+ - dataset:
8438
+ config: default
8439
+ name: MTEB TweetSentimentExtractionClassification (default)
8440
+ revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
8441
+ split: test
8442
+ type: mteb/tweet_sentiment_extraction
8443
+ metrics:
8444
+ - type: accuracy
8445
+ value: 53.907753254103
8446
+ - type: f1
8447
+ value: 54.22707647269581
8448
+ - type: f1_weighted
8449
+ value: 53.611822984407695
8450
+ - type: main_score
8451
+ value: 53.907753254103
8452
+ task:
8453
+ type: Classification
8454
+ - dataset:
8455
+ config: default
8456
+ name: MTEB TwentyNewsgroupsClustering (default)
8457
+ revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
8458
+ split: test
8459
+ type: mteb/twentynewsgroups-clustering
8460
+ metrics:
8461
+ - type: main_score
8462
+ value: 38.1364789307295
8463
+ - type: v_measure
8464
+ value: 38.1364789307295
8465
+ - type: v_measure_std
8466
+ value: 2.0731634966352077
8467
+ task:
8468
+ type: Clustering
8469
+ - dataset:
8470
+ config: default
8471
+ name: MTEB TwitterSemEval2015 (default)
8472
+ revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
8473
+ split: test
8474
+ type: mteb/twittersemeval2015-pairclassification
8475
+ metrics:
8476
+ - type: cosine_accuracy
8477
+ value: 82.66674614054956
8478
+ - type: cosine_accuracy_threshold
8479
+ value: 79.80123162269592
8480
+ - type: cosine_ap
8481
+ value: 63.28209719072804
8482
+ - type: cosine_f1
8483
+ value: 60.16389710903711
8484
+ - type: cosine_f1_threshold
8485
+ value: 72.22893834114075
8486
+ - type: cosine_precision
8487
+ value: 52.90232185748599
8488
+ - type: cosine_recall
8489
+ value: 69.73614775725594
8490
+ - type: dot_accuracy
8491
+ value: 82.66674614054956
8492
+ - type: dot_accuracy_threshold
8493
+ value: 79.8012375831604
8494
+ - type: dot_ap
8495
+ value: 63.282103870645166
8496
+ - type: dot_f1
8497
+ value: 60.16389710903711
8498
+ - type: dot_f1_threshold
8499
+ value: 72.22894430160522
8500
+ - type: dot_precision
8501
+ value: 52.90232185748599
8502
+ - type: dot_recall
8503
+ value: 69.73614775725594
8504
+ - type: euclidean_accuracy
8505
+ value: 82.66674614054956
8506
+ - type: euclidean_accuracy_threshold
8507
+ value: 63.55905532836914
8508
+ - type: euclidean_ap
8509
+ value: 63.282095399953164
8510
+ - type: euclidean_f1
8511
+ value: 60.16389710903711
8512
+ - type: euclidean_f1_threshold
8513
+ value: 74.5265781879425
8514
+ - type: euclidean_precision
8515
+ value: 52.90232185748599
8516
+ - type: euclidean_recall
8517
+ value: 69.73614775725594
8518
+ - type: main_score
8519
+ value: 63.282103870645166
8520
+ - type: manhattan_accuracy
8521
+ value: 82.74423317637242
8522
+ - type: manhattan_accuracy_threshold
8523
+ value: 1415.380859375
8524
+ - type: manhattan_ap
8525
+ value: 63.26931757839598
8526
+ - type: manhattan_f1
8527
+ value: 60.11014948859166
8528
+ - type: manhattan_f1_threshold
8529
+ value: 1632.522201538086
8530
+ - type: manhattan_precision
8531
+ value: 52.359506559624045
8532
+ - type: manhattan_recall
8533
+ value: 70.55408970976254
8534
+ - type: max_ap
8535
+ value: 63.282103870645166
8536
+ - type: max_f1
8537
+ value: 60.16389710903711
8538
+ - type: max_precision
8539
+ value: 52.90232185748599
8540
+ - type: max_recall
8541
+ value: 70.55408970976254
8542
+ - type: similarity_accuracy
8543
+ value: 82.66674614054956
8544
+ - type: similarity_accuracy_threshold
8545
+ value: 79.80123162269592
8546
+ - type: similarity_ap
8547
+ value: 63.28209719072804
8548
+ - type: similarity_f1
8549
+ value: 60.16389710903711
8550
+ - type: similarity_f1_threshold
8551
+ value: 72.22893834114075
8552
+ - type: similarity_precision
8553
+ value: 52.90232185748599
8554
+ - type: similarity_recall
8555
+ value: 69.73614775725594
8556
+ task:
8557
+ type: PairClassification
8558
+ - dataset:
8559
+ config: default
8560
+ name: MTEB TwitterURLCorpus (default)
8561
+ revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
8562
+ split: test
8563
+ type: mteb/twitterurlcorpus-pairclassification
8564
+ metrics:
8565
+ - type: cosine_accuracy
8566
+ value: 88.10105949470253
8567
+ - type: cosine_accuracy_threshold
8568
+ value: 68.95147562026978
8569
+ - type: cosine_ap
8570
+ value: 84.65516103854583
8571
+ - type: cosine_f1
8572
+ value: 76.54581123301605
8573
+ - type: cosine_f1_threshold
8574
+ value: 63.92929553985596
8575
+ - type: cosine_precision
8576
+ value: 72.46526344751685
8577
+ - type: cosine_recall
8578
+ value: 81.11333538651063
8579
+ - type: dot_accuracy
8580
+ value: 88.10105949470253
8581
+ - type: dot_accuracy_threshold
8582
+ value: 68.95147562026978
8583
+ - type: dot_ap
8584
+ value: 84.65516301437592
8585
+ - type: dot_f1
8586
+ value: 76.54581123301605
8587
+ - type: dot_f1_threshold
8588
+ value: 63.92928957939148
8589
+ - type: dot_precision
8590
+ value: 72.46526344751685
8591
+ - type: dot_recall
8592
+ value: 81.11333538651063
8593
+ - type: euclidean_accuracy
8594
+ value: 88.10105949470253
8595
+ - type: euclidean_accuracy_threshold
8596
+ value: 78.80169153213501
8597
+ - type: euclidean_ap
8598
+ value: 84.65517268264233
8599
+ - type: euclidean_f1
8600
+ value: 76.54581123301605
8601
+ - type: euclidean_f1_threshold
8602
+ value: 84.93610620498657
8603
+ - type: euclidean_precision
8604
+ value: 72.46526344751685
8605
+ - type: euclidean_recall
8606
+ value: 81.11333538651063
8607
+ - type: main_score
8608
+ value: 84.65517268264233
8609
+ - type: manhattan_accuracy
8610
+ value: 88.08941669577366
8611
+ - type: manhattan_accuracy_threshold
8612
+ value: 1739.3169403076172
8613
+ - type: manhattan_ap
8614
+ value: 84.64592398855694
8615
+ - type: manhattan_f1
8616
+ value: 76.62890540443034
8617
+ - type: manhattan_f1_threshold
8618
+ value: 1861.344337463379
8619
+ - type: manhattan_precision
8620
+ value: 72.09775967413442
8621
+ - type: manhattan_recall
8622
+ value: 81.76778564829073
8623
+ - type: max_ap
8624
+ value: 84.65517268264233
8625
+ - type: max_f1
8626
+ value: 76.62890540443034
8627
+ - type: max_precision
8628
+ value: 72.46526344751685
8629
+ - type: max_recall
8630
+ value: 81.76778564829073
8631
+ - type: similarity_accuracy
8632
+ value: 88.10105949470253
8633
+ - type: similarity_accuracy_threshold
8634
+ value: 68.95147562026978
8635
+ - type: similarity_ap
8636
+ value: 84.65516103854583
8637
+ - type: similarity_f1
8638
+ value: 76.54581123301605
8639
+ - type: similarity_f1_threshold
8640
+ value: 63.92929553985596
8641
+ - type: similarity_precision
8642
+ value: 72.46526344751685
8643
+ - type: similarity_recall
8644
+ value: 81.11333538651063
8645
+ task:
8646
+ type: PairClassification
8647
  ---
8648
 
8649
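
The entries added in this diff all share one shape: a `dataset` block with a `name`, a `metrics` list of `type`/`value` pairs, and a `task` type. As a minimal sketch of how the headline numbers could be pulled out of such a fragment, the snippet below scans the lines for each dataset `name` and the `value` that follows `- type: main_score`. The function name `main_scores` and the `SAMPLE` string are hypothetical and exist only for illustration; the sample values are copied from two entries in the diff. The scan is line-based and assumes the exact layout shown here, so it is not a general YAML parser.

```python
# Hypothetical helper: a line-based scan, not a real YAML parser.
# It assumes each entry lists its dataset `name:` before its metrics,
# as the entries in this diff do.

SAMPLE = """\
+ - dataset:
+ config: default
+ name: MTEB ArxivClusteringP2P (default)
+ split: test
+ type: mteb/arxiv-clustering-p2p
+ metrics:
+ - type: main_score
+ value: 44.9469238081379
+ - type: v_measure
+ value: 44.9469238081379
+ task:
+ type: Clustering
+ - dataset:
+ config: default
+ name: MTEB EmotionClassification (default)
+ split: test
+ type: mteb/emotion
+ metrics:
+ - type: accuracy
+ value: 43.675
+ - type: main_score
+ value: 43.675
+ task:
+ type: Classification
"""


def main_scores(fragment: str) -> dict[str, float]:
    """Map each dataset name to the value of its `main_score` metric."""
    scores: dict[str, float] = {}
    name = None
    expect_value = False  # True right after a `- type: main_score` line
    for raw in fragment.splitlines():
        # Drop the diff's leading "+" marker and surrounding whitespace.
        line = raw.lstrip("+ ").strip()
        if line.startswith("name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("- type:"):
            expect_value = line.split(":", 1)[1].strip() == "main_score"
        elif line.startswith("value:") and expect_value and name is not None:
            scores[name] = float(line.split(":", 1)[1])
            expect_value = False
    return scores


if __name__ == "__main__":
    for dataset, score in main_scores(SAMPLE).items():
        print(f"{dataset}: {score}")
```

For the real README.md, a proper YAML loader applied to the front matter would be the more robust choice; the scan above only illustrates the structure of the added entries.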