library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - transformers
  - sentence-embedding
  - mteb
model-index:
  - name: e433e634850d125d8b85bee76db3a3b6a0c3bf56
    results:
      - task:
          type: Clustering
        dataset:
          type: lyon-nlp/alloprof
          name: MTEB AlloProfClusteringP2P
          config: default
          split: test
          revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b
        metrics:
          - type: v_measure
            value: 56.88600728743999
          - type: v_measures
            value:
              - 0.5396081553520281
              - 0.6022872403200437
              - 0.5515205944691852
              - 0.5595772885785736
              - 0.5632413941951575
      - task:
          type: Clustering
        dataset:
          type: lyon-nlp/alloprof
          name: MTEB AlloProfClusteringS2S
          config: default
          split: test
          revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b
        metrics:
          - type: v_measure
            value: 38.199527329051804
          - type: v_measures
            value:
              - 0.42157254138936706
              - 0.36882298663461527
              - 0.3134327610337458
              - 0.40391031391690396
              - 0.3832775043562133
      - task:
          type: Reranking
        dataset:
          type: lyon-nlp/mteb-fr-reranking-alloprof-s2p
          name: MTEB AlloprofReranking
          config: default
          split: test
          revision: 65393d0d7a08a10b4e348135e824f385d420b0fd
        metrics:
          - type: map
            value: 68.73372257500206
          - type: mrr
            value: 70.07434479260904
          - type: nAUC_map_diff1
            value: 50.95933484071007
          - type: nAUC_map_max
            value: 13.75463910519138
          - type: nAUC_mrr_diff1
            value: 50.494303783447656
          - type: nAUC_mrr_max
            value: 14.460935217916187
      - task:
          type: Retrieval
        dataset:
          type: lyon-nlp/alloprof
          name: MTEB AlloprofRetrieval
          config: default
          split: test
          revision: fcf295ea64c750f41fadbaa37b9b861558e1bfbd
        metrics:
          - type: map_at_1
            value: 21.675
          - type: map_at_10
            value: 32.274
          - type: map_at_100
            value: 33.316
          - type: map_at_1000
            value: 33.387
          - type: map_at_20
            value: 32.864
          - type: map_at_3
            value: 29.166999999999998
          - type: map_at_5
            value: 30.946
          - type: mrr_at_1
            value: 21.675302245250432
          - type: mrr_at_10
            value: 32.274309839076714
          - type: mrr_at_100
            value: 33.31571024590564
          - type: mrr_at_1000
            value: 33.3868130424392
          - type: mrr_at_20
            value: 32.863978562081925
          - type: mrr_at_3
            value: 29.16666666666669
          - type: mrr_at_5
            value: 30.94559585492234
          - type: nauc_map_at_1000_diff1
            value: 34.85808309940442
          - type: nauc_map_at_1000_max
            value: 31.058801579682825
          - type: nauc_map_at_100_diff1
            value: 34.842898344470846
          - type: nauc_map_at_100_max
            value: 31.077561464904342
          - type: nauc_map_at_10_diff1
            value: 34.6773118480208
          - type: nauc_map_at_10_max
            value: 30.8489850780642
          - type: nauc_map_at_1_diff1
            value: 40.65773695743684
          - type: nauc_map_at_1_max
            value: 28.766036921254617
          - type: nauc_map_at_20_diff1
            value: 34.73935242577166
          - type: nauc_map_at_20_max
            value: 31.03143938077287
          - type: nauc_map_at_3_diff1
            value: 35.12059625476991
          - type: nauc_map_at_3_max
            value: 30.48787855768291
          - type: nauc_map_at_5_diff1
            value: 34.73453235094986
          - type: nauc_map_at_5_max
            value: 30.3860304682398
          - type: nauc_mrr_at_1000_diff1
            value: 34.85808309940442
          - type: nauc_mrr_at_1000_max
            value: 31.058801579682825
          - type: nauc_mrr_at_100_diff1
            value: 34.842898344470846
          - type: nauc_mrr_at_100_max
            value: 31.077561464904342
          - type: nauc_mrr_at_10_diff1
            value: 34.6773118480208
          - type: nauc_mrr_at_10_max
            value: 30.8489850780642
          - type: nauc_mrr_at_1_diff1
            value: 40.65773695743684
          - type: nauc_mrr_at_1_max
            value: 28.766036921254617
          - type: nauc_mrr_at_20_diff1
            value: 34.73935242577166
          - type: nauc_mrr_at_20_max
            value: 31.03143938077287
          - type: nauc_mrr_at_3_diff1
            value: 35.12059625476991
          - type: nauc_mrr_at_3_max
            value: 30.48787855768291
          - type: nauc_mrr_at_5_diff1
            value: 34.73453235094986
          - type: nauc_mrr_at_5_max
            value: 30.3860304682398
          - type: nauc_ndcg_at_1000_diff1
            value: 34.04342467121623
          - type: nauc_ndcg_at_1000_max
            value: 32.311398352704686
          - type: nauc_ndcg_at_100_diff1
            value: 33.67278941726764
          - type: nauc_ndcg_at_100_max
            value: 33.0229606203184
          - type: nauc_ndcg_at_10_diff1
            value: 32.93808280492078
          - type: nauc_ndcg_at_10_max
            value: 32.07111775221638
          - type: nauc_ndcg_at_1_diff1
            value: 40.65773695743684
          - type: nauc_ndcg_at_1_max
            value: 28.766036921254617
          - type: nauc_ndcg_at_20_diff1
            value: 33.141323431064585
          - type: nauc_ndcg_at_20_max
            value: 32.76436962238286
          - type: nauc_ndcg_at_3_diff1
            value: 33.77769745974645
          - type: nauc_ndcg_at_3_max
            value: 31.072988073016912
          - type: nauc_ndcg_at_5_diff1
            value: 33.091582792245696
          - type: nauc_ndcg_at_5_max
            value: 30.92378976230745
          - type: nauc_precision_at_1000_diff1
            value: 33.74743287990321
          - type: nauc_precision_at_1000_max
            value: 60.08005213097628
          - type: nauc_precision_at_100_diff1
            value: 28.869275501873236
          - type: nauc_precision_at_100_max
            value: 46.35483380447927
          - type: nauc_precision_at_10_diff1
            value: 27.910043146581497
          - type: nauc_precision_at_10_max
            value: 36.07399824307888
          - type: nauc_precision_at_1_diff1
            value: 40.65773695743684
          - type: nauc_precision_at_1_max
            value: 28.766036921254617
          - type: nauc_precision_at_20_diff1
            value: 28.144265629196163
          - type: nauc_precision_at_20_max
            value: 39.60361579056115
          - type: nauc_precision_at_3_diff1
            value: 30.31893725671278
          - type: nauc_precision_at_3_max
            value: 32.63695126407254
          - type: nauc_precision_at_5_diff1
            value: 28.699678130380235
          - type: nauc_precision_at_5_max
            value: 32.37908851919098
          - type: nauc_recall_at_1000_diff1
            value: 33.74743287990342
          - type: nauc_recall_at_1000_max
            value: 60.080052130975346
          - type: nauc_recall_at_100_diff1
            value: 28.869275501873247
          - type: nauc_recall_at_100_max
            value: 46.35483380447917
          - type: nauc_recall_at_10_diff1
            value: 27.910043146581508
          - type: nauc_recall_at_10_max
            value: 36.07399824307888
          - type: nauc_recall_at_1_diff1
            value: 40.65773695743684
          - type: nauc_recall_at_1_max
            value: 28.766036921254617
          - type: nauc_recall_at_20_diff1
            value: 28.14426562919617
          - type: nauc_recall_at_20_max
            value: 39.60361579056118
          - type: nauc_recall_at_3_diff1
            value: 30.318937256712804
          - type: nauc_recall_at_3_max
            value: 32.63695126407256
          - type: nauc_recall_at_5_diff1
            value: 28.699678130380224
          - type: nauc_recall_at_5_max
            value: 32.37908851919102
          - type: ndcg_at_1
            value: 21.675
          - type: ndcg_at_10
            value: 38.06
          - type: ndcg_at_100
            value: 43.491
          - type: ndcg_at_1000
            value: 45.432
          - type: ndcg_at_20
            value: 40.217000000000006
          - type: ndcg_at_3
            value: 31.642
          - type: ndcg_at_5
            value: 34.837
          - type: precision_at_1
            value: 21.675
          - type: precision_at_10
            value: 5.652
          - type: precision_at_100
            value: 0.827
          - type: precision_at_1000
            value: 0.098
          - type: precision_at_20
            value: 3.253
          - type: precision_at_3
            value: 12.939
          - type: precision_at_5
            value: 9.309000000000001
          - type: recall_at_1
            value: 21.675
          - type: recall_at_10
            value: 56.52
          - type: recall_at_100
            value: 82.729
          - type: recall_at_1000
            value: 98.1
          - type: recall_at_20
            value: 65.069
          - type: recall_at_3
            value: 38.817
          - type: recall_at_5
            value: 46.546
      - task:
          type: Classification
        dataset:
          type: mteb/amazon_reviews_multi
          name: MTEB AmazonReviewsClassification (fr)
          config: fr
          split: test
          revision: 1399c76144fd37290681b995c656ef9b2e06e26d
        metrics:
          - type: accuracy
            value: 43.51
          - type: f1
            value: 41.3284674671926
          - type: f1_weighted
            value: 41.3284674671926
      - task:
          type: Retrieval
        dataset:
          type: maastrichtlawtech/bsard
          name: MTEB BSARDRetrieval
          config: default
          split: test
          revision: 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59
        metrics:
          - type: map_at_1
            value: 5.405
          - type: map_at_10
            value: 9.008
          - type: map_at_100
            value: 9.932
          - type: map_at_1000
            value: 10.042
          - type: map_at_20
            value: 9.389
          - type: map_at_3
            value: 7.883
          - type: map_at_5
            value: 8.626000000000001
          - type: mrr_at_1
            value: 5.405405405405405
          - type: mrr_at_10
            value: 9.007579007579007
          - type: mrr_at_100
            value: 9.931517094611667
          - type: mrr_at_1000
            value: 10.0416462267215
          - type: mrr_at_20
            value: 9.38869595990339
          - type: mrr_at_3
            value: 7.882882882882883
          - type: mrr_at_5
            value: 8.626126126126126
          - type: nauc_map_at_1000_diff1
            value: 23.53549434486455
          - type: nauc_map_at_1000_max
            value: 9.977010641647402
          - type: nauc_map_at_100_diff1
            value: 23.50007884241435
          - type: nauc_map_at_100_max
            value: 9.984274734441085
          - type: nauc_map_at_10_diff1
            value: 24.69444512826233
          - type: nauc_map_at_10_max
            value: 9.726162724771594
          - type: nauc_map_at_1_diff1
            value: 40.88188899137848
          - type: nauc_map_at_1_max
            value: 12.044739470755896
          - type: nauc_map_at_20_diff1
            value: 23.833757177107557
          - type: nauc_map_at_20_max
            value: 9.94328216894336
          - type: nauc_map_at_3_diff1
            value: 28.320570164876653
          - type: nauc_map_at_3_max
            value: 11.195397944839767
          - type: nauc_map_at_5_diff1
            value: 25.86894200735248
          - type: nauc_map_at_5_max
            value: 8.43950569758736
          - type: nauc_mrr_at_1000_diff1
            value: 23.53549434486455
          - type: nauc_mrr_at_1000_max
            value: 9.977010641647402
          - type: nauc_mrr_at_100_diff1
            value: 23.50007884241435
          - type: nauc_mrr_at_100_max
            value: 9.984274734441085
          - type: nauc_mrr_at_10_diff1
            value: 24.69444512826233
          - type: nauc_mrr_at_10_max
            value: 9.726162724771594
          - type: nauc_mrr_at_1_diff1
            value: 40.88188899137848
          - type: nauc_mrr_at_1_max
            value: 12.044739470755896
          - type: nauc_mrr_at_20_diff1
            value: 23.833757177107557
          - type: nauc_mrr_at_20_max
            value: 9.94328216894336
          - type: nauc_mrr_at_3_diff1
            value: 28.320570164876653
          - type: nauc_mrr_at_3_max
            value: 11.195397944839767
          - type: nauc_mrr_at_5_diff1
            value: 25.86894200735248
          - type: nauc_mrr_at_5_max
            value: 8.43950569758736
          - type: nauc_ndcg_at_1000_diff1
            value: 15.939402272339343
          - type: nauc_ndcg_at_1000_max
            value: 10.076089125537772
          - type: nauc_ndcg_at_100_diff1
            value: 16.12740122067642
          - type: nauc_ndcg_at_100_max
            value: 10.39935154464689
          - type: nauc_ndcg_at_10_diff1
            value: 20.455941061369295
          - type: nauc_ndcg_at_10_max
            value: 9.350349883274461
          - type: nauc_ndcg_at_1_diff1
            value: 40.88188899137848
          - type: nauc_ndcg_at_1_max
            value: 12.044739470755896
          - type: nauc_ndcg_at_20_diff1
            value: 18.267195122936364
          - type: nauc_ndcg_at_20_max
            value: 10.211299135510837
          - type: nauc_ndcg_at_3_diff1
            value: 26.453038443158267
          - type: nauc_ndcg_at_3_max
            value: 10.628723618231271
          - type: nauc_ndcg_at_5_diff1
            value: 22.815939702854084
          - type: nauc_ndcg_at_5_max
            value: 6.308794763068443
          - type: nauc_precision_at_1000_diff1
            value: -7.915540524594587
          - type: nauc_precision_at_1000_max
            value: 10.441250503021037
          - type: nauc_precision_at_100_diff1
            value: 2.7415108070462253
          - type: nauc_precision_at_100_max
            value: 11.957692005514204
          - type: nauc_precision_at_10_diff1
            value: 12.731449206012213
          - type: nauc_precision_at_10_max
            value: 9.218464561250887
          - type: nauc_precision_at_1_diff1
            value: 40.88188899137848
          - type: nauc_precision_at_1_max
            value: 12.044739470755896
          - type: nauc_precision_at_20_diff1
            value: 8.658189595700664
          - type: nauc_precision_at_20_max
            value: 11.571072137198621
          - type: nauc_precision_at_3_diff1
            value: 22.7637681983756
          - type: nauc_precision_at_3_max
            value: 9.361635703809425
          - type: nauc_precision_at_5_diff1
            value: 17.02002973192349
          - type: nauc_precision_at_5_max
            value: 1.8844406919262011
          - type: nauc_recall_at_1000_diff1
            value: -7.915540524594531
          - type: nauc_recall_at_1000_max
            value: 10.441250503021028
          - type: nauc_recall_at_100_diff1
            value: 2.741510807046166
          - type: nauc_recall_at_100_max
            value: 11.957692005514156
          - type: nauc_recall_at_10_diff1
            value: 12.731449206012224
          - type: nauc_recall_at_10_max
            value: 9.218464561250883
          - type: nauc_recall_at_1_diff1
            value: 40.88188899137848
          - type: nauc_recall_at_1_max
            value: 12.044739470755896
          - type: nauc_recall_at_20_diff1
            value: 8.65818959570063
          - type: nauc_recall_at_20_max
            value: 11.571072137198572
          - type: nauc_recall_at_3_diff1
            value: 22.763768198375587
          - type: nauc_recall_at_3_max
            value: 9.361635703809409
          - type: nauc_recall_at_5_diff1
            value: 17.02002973192351
          - type: nauc_recall_at_5_max
            value: 1.8844406919262173
          - type: ndcg_at_1
            value: 5.405
          - type: ndcg_at_10
            value: 11.045
          - type: ndcg_at_100
            value: 16.724
          - type: ndcg_at_1000
            value: 20.325
          - type: ndcg_at_20
            value: 12.42
          - type: ndcg_at_3
            value: 8.746
          - type: ndcg_at_5
            value: 10.065
          - type: precision_at_1
            value: 5.405
          - type: precision_at_10
            value: 1.757
          - type: precision_at_100
            value: 0.468
          - type: precision_at_1000
            value: 0.077
          - type: precision_at_20
            value: 1.149
          - type: precision_at_3
            value: 3.7539999999999996
          - type: precision_at_5
            value: 2.883
          - type: recall_at_1
            value: 5.405
          - type: recall_at_10
            value: 17.568
          - type: recall_at_100
            value: 46.847
          - type: recall_at_1000
            value: 76.577
          - type: recall_at_20
            value: 22.973
          - type: recall_at_3
            value: 11.261000000000001
          - type: recall_at_5
            value: 14.414
      - task:
          type: Clustering
        dataset:
          type: lyon-nlp/clustering-hal-s2s
          name: MTEB HALClusteringS2S
          config: default
          split: test
          revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915
        metrics:
          - type: v_measure
            value: 24.495384349905265
          - type: v_measures
            value:
              - 0.2850587858600384
              - 0.274086904447773
              - 0.2446866774990972
              - 0.26946100959565517
              - 0.24156528297396174
      - task:
          type: Clustering
        dataset:
          type: reciTAL/mlsum
          name: MTEB MLSUMClusteringP2P
          config: default
          split: test
          revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7
        metrics:
          - type: v_measure
            value: 41.7878688793447
          - type: v_measures
            value:
              - 0.4201324393825989
              - 0.4205306567437461
              - 0.4221300501395374
              - 0.4210735177933313
              - 0.38124298228695813
      - task:
          type: Clustering
        dataset:
          type: reciTAL/mlsum
          name: MTEB MLSUMClusteringS2S
          config: default
          split: test
          revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7
        metrics:
          - type: v_measure
            value: 41.54533473611554
          - type: v_measures
            value:
              - 0.3978917671338969
              - 0.42610299599987944
              - 0.4152131658150196
              - 0.40558711021249855
              - 0.38327501252308305
      - task:
          type: Classification
        dataset:
          type: mteb/mtop_domain
          name: MTEB MTOPDomainClassification (fr)
          config: fr
          split: test
          revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
        metrics:
          - type: accuracy
            value: 85.33041027247104
          - type: f1
            value: 85.4043088703478
          - type: f1_weighted
            value: 85.22086763441686
      - task:
          type: Classification
        dataset:
          type: mteb/mtop_intent
          name: MTEB MTOPIntentClassification (fr)
          config: fr
          split: test
          revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
        metrics:
          - type: accuracy
            value: 59.01346695897275
          - type: f1
            value: 41.296845063208316
          - type: f1_weighted
            value: 61.793813202867696
      - task:
          type: Classification
        dataset:
          type: mteb/masakhanews
          name: MTEB MasakhaNEWSClassification (fra)
          config: fra
          split: test
          revision: 18193f187b92da67168c655c9973a165ed9593dd
        metrics:
          - type: accuracy
            value: 72.60663507109004
          - type: f1
            value: 68.67522100429781
          - type: f1_weighted
            value: 72.75616093668002
      - task:
          type: Clustering
        dataset:
          type: masakhane/masakhanews
          name: MTEB MasakhaNEWSClusteringP2P (fra)
          config: fra
          split: test
          revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
        metrics:
          - type: v_measure
            value: 49.17691007381563
          - type: v_measures
            value:
              - 1
              - 0.033833191750480725
              - 0.5707463198244268
              - 0.1318223737892885
              - 0.7224436183265853
      - task:
          type: Clustering
        dataset:
          type: masakhane/masakhanews
          name: MTEB MasakhaNEWSClusteringS2S (fra)
          config: fra
          split: test
          revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
        metrics:
          - type: v_measure
            value: 26.9350763881635
          - type: v_measures
            value:
              - 1
              - 0.0002883507347309009
              - 0.18259625098776155
              - 0.025306110065234755
              - 0.1385631076204479
      - task:
          type: Classification
        dataset:
          type: mteb/amazon_massive_intent
          name: MTEB MassiveIntentClassification (fr)
          config: fr
          split: test
          revision: 4672e20407010da34463acc759c162ca9734bca6
        metrics:
          - type: accuracy
            value: 65.1546738399462
          - type: f1
            value: 62.81367149102006
          - type: f1_weighted
            value: 64.45478181518959
      - task:
          type: Classification
        dataset:
          type: mteb/amazon_massive_scenario
          name: MTEB MassiveScenarioClassification (fr)
          config: fr
          split: test
          revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8
        metrics:
          - type: accuracy
            value: 69.94283792871553
          - type: f1
            value: 69.3387310036327
          - type: f1_weighted
            value: 69.77979200675047
      - task:
          type: Retrieval
        dataset:
          type: jinaai/mintakaqa
          name: MTEB MintakaRetrieval (fr)
          config: fr
          split: test
          revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e
        metrics:
          - type: map_at_1
            value: 14.536999999999999
          - type: map_at_10
            value: 22.972
          - type: map_at_100
            value: 24.046
          - type: map_at_1000
            value: 24.15
          - type: map_at_20
            value: 23.56
          - type: map_at_3
            value: 20.639
          - type: map_at_5
            value: 21.886
          - type: mrr_at_1
            value: 14.537264537264537
          - type: mrr_at_10
            value: 22.97172172172171
          - type: mrr_at_100
            value: 24.04581030084757
          - type: mrr_at_1000
            value: 24.15012351833827
          - type: mrr_at_20
            value: 23.559920001131612
          - type: mrr_at_3
            value: 20.63882063882061
          - type: mrr_at_5
            value: 21.88574938574935
          - type: nauc_map_at_1000_diff1
            value: 25.172495501911456
          - type: nauc_map_at_1000_max
            value: 39.07442097828252
          - type: nauc_map_at_100_diff1
            value: 25.129142743145884
          - type: nauc_map_at_100_max
            value: 39.03725272182565
          - type: nauc_map_at_10_diff1
            value: 25.52237435145409
          - type: nauc_map_at_10_max
            value: 39.5761256079619
          - type: nauc_map_at_1_diff1
            value: 31.68506359690787
          - type: nauc_map_at_1_max
            value: 39.251552013635425
          - type: nauc_map_at_20_diff1
            value: 25.223544981725286
          - type: nauc_map_at_20_max
            value: 39.20307777977743
          - type: nauc_map_at_3_diff1
            value: 26.5913043939904
          - type: nauc_map_at_3_max
            value: 40.38909639557377
          - type: nauc_map_at_5_diff1
            value: 25.90291761511258
          - type: nauc_map_at_5_max
            value: 40.08746876057708
          - type: nauc_mrr_at_1000_diff1
            value: 25.172495501911456
          - type: nauc_mrr_at_1000_max
            value: 39.07442097828252
          - type: nauc_mrr_at_100_diff1
            value: 25.129142743145884
          - type: nauc_mrr_at_100_max
            value: 39.03725272182565
          - type: nauc_mrr_at_10_diff1
            value: 25.52237435145409
          - type: nauc_mrr_at_10_max
            value: 39.5761256079619
          - type: nauc_mrr_at_1_diff1
            value: 31.68506359690787
          - type: nauc_mrr_at_1_max
            value: 39.251552013635425
          - type: nauc_mrr_at_20_diff1
            value: 25.223544981725286
          - type: nauc_mrr_at_20_max
            value: 39.20307777977743
          - type: nauc_mrr_at_3_diff1
            value: 26.5913043939904
          - type: nauc_mrr_at_3_max
            value: 40.38909639557377
          - type: nauc_mrr_at_5_diff1
            value: 25.90291761511258
          - type: nauc_mrr_at_5_max
            value: 40.08746876057708
          - type: nauc_ndcg_at_1000_diff1
            value: 23.22275566961323
          - type: nauc_ndcg_at_1000_max
            value: 37.77760760027764
          - type: nauc_ndcg_at_100_diff1
            value: 21.715763741257927
          - type: nauc_ndcg_at_100_max
            value: 36.46541121995108
          - type: nauc_ndcg_at_10_diff1
            value: 23.278761630662373
          - type: nauc_ndcg_at_10_max
            value: 38.7930407055593
          - type: nauc_ndcg_at_1_diff1
            value: 31.68506359690787
          - type: nauc_ndcg_at_1_max
            value: 39.251552013635425
          - type: nauc_ndcg_at_20_diff1
            value: 22.247483519405314
          - type: nauc_ndcg_at_20_max
            value: 37.52699283756433
          - type: nauc_ndcg_at_3_diff1
            value: 25.285332146360567
          - type: nauc_ndcg_at_3_max
            value: 40.49755286945492
          - type: nauc_ndcg_at_5_diff1
            value: 24.188132420084607
          - type: nauc_ndcg_at_5_max
            value: 40.023420096094924
          - type: nauc_precision_at_1000_diff1
            value: 22.011383616462943
          - type: nauc_precision_at_1000_max
            value: 33.1171975223399
          - type: nauc_precision_at_100_diff1
            value: 8.869925191243802
          - type: nauc_precision_at_100_max
            value: 24.642097404720463
          - type: nauc_precision_at_10_diff1
            value: 17.74075352930919
          - type: nauc_precision_at_10_max
            value: 36.488352516736775
          - type: nauc_precision_at_1_diff1
            value: 31.68506359690787
          - type: nauc_precision_at_1_max
            value: 39.251552013635425
          - type: nauc_precision_at_20_diff1
            value: 14.092673370526898
          - type: nauc_precision_at_20_max
            value: 32.16083119966346
          - type: nauc_precision_at_3_diff1
            value: 22.16344389106631
          - type: nauc_precision_at_3_max
            value: 40.70883095791623
          - type: nauc_precision_at_5_diff1
            value: 20.119543069972256
          - type: nauc_precision_at_5_max
            value: 39.79763147435235
          - type: nauc_recall_at_1000_diff1
            value: 22.011383616462528
          - type: nauc_recall_at_1000_max
            value: 33.117197522340085
          - type: nauc_recall_at_100_diff1
            value: 8.869925191243775
          - type: nauc_recall_at_100_max
            value: 24.64209740472041
          - type: nauc_recall_at_10_diff1
            value: 17.740753529309178
          - type: nauc_recall_at_10_max
            value: 36.48835251673679
          - type: nauc_recall_at_1_diff1
            value: 31.68506359690787
          - type: nauc_recall_at_1_max
            value: 39.251552013635425
          - type: nauc_recall_at_20_diff1
            value: 14.092673370526915
          - type: nauc_recall_at_20_max
            value: 32.160831199663455
          - type: nauc_recall_at_3_diff1
            value: 22.163443891066322
          - type: nauc_recall_at_3_max
            value: 40.708830957916234
          - type: nauc_recall_at_5_diff1
            value: 20.119543069972217
          - type: nauc_recall_at_5_max
            value: 39.79763147435234
          - type: ndcg_at_1
            value: 14.536999999999999
          - type: ndcg_at_10
            value: 27.485
          - type: ndcg_at_100
            value: 33.206
          - type: ndcg_at_1000
            value: 36.382999999999996
          - type: ndcg_at_20
            value: 29.635
          - type: ndcg_at_3
            value: 22.597
          - type: ndcg_at_5
            value: 24.851
          - type: precision_at_1
            value: 14.536999999999999
          - type: precision_at_10
            value: 4.189
          - type: precision_at_100
            value: 0.698
          - type: precision_at_1000
            value: 0.096
          - type: precision_at_20
            value: 2.52
          - type: precision_at_3
            value: 9.419
          - type: precision_at_5
            value: 6.749
          - type: recall_at_1
            value: 14.536999999999999
          - type: recall_at_10
            value: 41.892
          - type: recall_at_100
            value: 69.779
          - type: recall_at_1000
            value: 95.61800000000001
          - type: recall_at_20
            value: 50.41
          - type: recall_at_3
            value: 28.255999999999997
          - type: recall_at_5
            value: 33.743
      - task:
          type: PairClassification
        dataset:
          type: GEM/opusparcus
          name: MTEB OpusparcusPC (fr)
          config: fr
          split: test
          revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a
        metrics:
          - type: cos_sim_accuracy
            value: 81.74386920980926
          - type: cos_sim_ap
            value: 93.18281680904117
          - type: cos_sim_f1
            value: 87.37233054781802
          - type: cos_sim_precision
            value: 82.04010462074979
          - type: cos_sim_recall
            value: 93.44587884806356
          - type: dot_accuracy
            value: 81.74386920980926
          - type: dot_ap
            value: 93.18281680904117
          - type: dot_f1
            value: 87.37233054781802
          - type: dot_precision
            value: 82.04010462074979
          - type: dot_recall
            value: 93.44587884806356
          - type: euclidean_accuracy
            value: 81.74386920980926
          - type: euclidean_ap
            value: 93.18281680904117
          - type: euclidean_f1
            value: 87.37233054781802
          - type: euclidean_precision
            value: 82.04010462074979
          - type: euclidean_recall
            value: 93.44587884806356
          - type: manhattan_accuracy
            value: 81.74386920980926
          - type: manhattan_ap
            value: 93.17517480971131
          - type: manhattan_f1
            value: 87.37864077669903
          - type: manhattan_precision
            value: 81.74740484429066
          - type: manhattan_recall
            value: 93.84309831181727
          - type: max_accuracy
            value: 81.74386920980926
          - type: max_ap
            value: 93.18281680904117
          - type: max_f1
            value: 87.37864077669903
      - task:
          type: PairClassification
        dataset:
          type: google-research-datasets/paws-x
          name: MTEB PawsX (fr)
          config: fr
          split: test
          revision: 8a04d940a42cd40658986fdd8e3da561533a3646
        metrics:
          - type: cos_sim_accuracy
            value: 61.1
          - type: cos_sim_ap
            value: 60.75603519868964
          - type: cos_sim_f1
            value: 62.78646780647509
          - type: cos_sim_precision
            value: 46.74972914409534
          - type: cos_sim_recall
            value: 95.5703211517165
          - type: dot_accuracy
            value: 61.1
          - type: dot_ap
            value: 60.74807680023078
          - type: dot_f1
            value: 62.78646780647509
          - type: dot_precision
            value: 46.74972914409534
          - type: dot_recall
            value: 95.5703211517165
          - type: euclidean_accuracy
            value: 61.1
          - type: euclidean_ap
            value: 60.756144387817734
          - type: euclidean_f1
            value: 62.78646780647509
          - type: euclidean_precision
            value: 46.74972914409534
          - type: euclidean_recall
            value: 95.5703211517165
          - type: manhattan_accuracy
            value: 61.150000000000006
          - type: manhattan_ap
            value: 60.685188544775116
          - type: manhattan_f1
            value: 62.7721335268505
          - type: manhattan_precision
            value: 46.6810577441986
          - type: manhattan_recall
            value: 95.79180509413068
          - type: max_accuracy
            value: 61.150000000000006
          - type: max_ap
            value: 60.756144387817734
          - type: max_f1
            value: 62.78646780647509
      - task:
          type: STS
        dataset:
          type: Lajavaness/SICK-fr
          name: MTEB SICKFr
          config: default
          split: test
          revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a
        metrics:
          - type: cos_sim_pearson
            value: 83.1543597030015
          - type: cos_sim_spearman
            value: 77.10092303546944
          - type: euclidean_pearson
            value: 80.27115846915481
          - type: euclidean_spearman
            value: 77.10092516058822
          - type: manhattan_pearson
            value: 80.30090425968062
          - type: manhattan_spearman
            value: 77.09423647945061
      - task:
          type: STS
        dataset:
          type: mteb/sts22-crosslingual-sts
          name: MTEB STS22 (fr)
          config: fr
          split: test
          revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
        metrics:
          - type: cos_sim_pearson
            value: 79.20797144286122
          - type: cos_sim_spearman
            value: 80.31452099282514
          - type: euclidean_pearson
            value: 78.43621396282957
          - type: euclidean_spearman
            value: 80.31452099282514
          - type: manhattan_pearson
            value: 78.29678738374866
          - type: manhattan_spearman
            value: 79.93185465249057
      - task:
          type: STS
        dataset:
          type: PhilipMay/stsb_multi_mt
          name: MTEB STSBenchmarkMultilingualSTS (fr)
          config: fr
          split: test
          revision: 29afa2569dcedaaa2fe6a3dcfebab33d28b82e8c
        metrics:
          - type: cos_sim_pearson
            value: 84.69215133897265
          - type: cos_sim_spearman
            value: 84.35617480959016
          - type: euclidean_pearson
            value: 83.85371663492563
          - type: euclidean_spearman
            value: 84.35617480959016
          - type: manhattan_pearson
            value: 83.85857789722276
          - type: manhattan_spearman
            value: 84.30794186513978
      - task:
          type: Summarization
        dataset:
          type: lyon-nlp/summarization-summeval-fr-p2p
          name: MTEB SummEvalFr
          config: default
          split: test
          revision: b385812de6a9577b6f4d0f88c6a6e35395a94054
        metrics:
          - type: cos_sim_pearson
            value: 29.187176809104393
          - type: cos_sim_spearman
            value: 29.65160679657583
          - type: dot_pearson
            value: 29.18717349611766
          - type: dot_spearman
            value: 29.65160679657583
      - task:
          type: Reranking
        dataset:
          type: lyon-nlp/mteb-fr-reranking-syntec-s2p
          name: MTEB SyntecReranking
          config: default
          split: test
          revision: daf0863838cd9e3ba50544cdce3ac2b338a1b0ad
        metrics:
          - type: map
            value: 82.76666666666667
          - type: mrr
            value: 82.76666666666667
          - type: nAUC_map_diff1
            value: 52.548913230162405
          - type: nAUC_map_max
            value: -2.824065935620183
          - type: nAUC_mrr_diff1
            value: 52.548913230162405
          - type: nAUC_mrr_max
            value: -2.824065935620183
      - task:
          type: Retrieval
        dataset:
          type: lyon-nlp/mteb-fr-retrieval-syntec-s2p
          name: MTEB SyntecRetrieval
          config: default
          split: test
          revision: 19661ccdca4dfc2d15122d776b61685f48c68ca9
        metrics:
          - type: map_at_1
            value: 57.99999999999999
          - type: map_at_10
            value: 72.356
          - type: map_at_100
            value: 72.625
          - type: map_at_1000
            value: 72.625
          - type: map_at_20
            value: 72.625
          - type: map_at_3
            value: 70.333
          - type: map_at_5
            value: 71.48299999999999
          - type: mrr_at_1
            value: 57.99999999999999
          - type: mrr_at_10
            value: 72.35634920634922
          - type: mrr_at_100
            value: 72.62532693914275
          - type: mrr_at_1000
            value: 72.62532693914275
          - type: mrr_at_20
            value: 72.62532693914275
          - type: mrr_at_3
            value: 70.33333333333333
          - type: mrr_at_5
            value: 71.48333333333333
          - type: nauc_map_at_1000_diff1
            value: 57.27081552588017
          - type: nauc_map_at_1000_max
            value: 13.401922890723771
          - type: nauc_map_at_100_diff1
            value: 57.27081552588017
          - type: nauc_map_at_100_max
            value: 13.401922890723771
          - type: nauc_map_at_10_diff1
            value: 57.39952453922188
          - type: nauc_map_at_10_max
            value: 14.093164837730344
          - type: nauc_map_at_1_diff1
            value: 57.23800679107291
          - type: nauc_map_at_1_max
            value: 11.039846765533865
          - type: nauc_map_at_20_diff1
            value: 57.27081552588017
          - type: nauc_map_at_20_max
            value: 13.401922890723771
          - type: nauc_map_at_3_diff1
            value: 58.14875247321224
          - type: nauc_map_at_3_max
            value: 14.538312305676238
          - type: nauc_map_at_5_diff1
            value: 57.34940275695991
          - type: nauc_map_at_5_max
            value: 13.675180459395065
          - type: nauc_mrr_at_1000_diff1
            value: 57.27081552588017
          - type: nauc_mrr_at_1000_max
            value: 13.401922890723771
          - type: nauc_mrr_at_100_diff1
            value: 57.27081552588017
          - type: nauc_mrr_at_100_max
            value: 13.401922890723771
          - type: nauc_mrr_at_10_diff1
            value: 57.39952453922188
          - type: nauc_mrr_at_10_max
            value: 14.093164837730344
          - type: nauc_mrr_at_1_diff1
            value: 57.23800679107291
          - type: nauc_mrr_at_1_max
            value: 11.039846765533865
          - type: nauc_mrr_at_20_diff1
            value: 57.27081552588017
          - type: nauc_mrr_at_20_max
            value: 13.401922890723771
          - type: nauc_mrr_at_3_diff1
            value: 58.14875247321224
          - type: nauc_mrr_at_3_max
            value: 14.538312305676238
          - type: nauc_mrr_at_5_diff1
            value: 57.34940275695991
          - type: nauc_mrr_at_5_max
            value: 13.675180459395065
          - type: nauc_ndcg_at_1000_diff1
            value: 57.38511684819052
          - type: nauc_ndcg_at_1000_max
            value: 13.993185568467656
          - type: nauc_ndcg_at_100_diff1
            value: 57.38511684819052
          - type: nauc_ndcg_at_100_max
            value: 13.993185568467656
          - type: nauc_ndcg_at_10_diff1
            value: 57.93396526410134
          - type: nauc_ndcg_at_10_max
            value: 17.16319020800824
          - type: nauc_ndcg_at_1_diff1
            value: 57.23800679107291
          - type: nauc_ndcg_at_1_max
            value: 11.039846765533865
          - type: nauc_ndcg_at_20_diff1
            value: 57.38511684819052
          - type: nauc_ndcg_at_20_max
            value: 13.993185568467656
          - type: nauc_ndcg_at_3_diff1
            value: 59.36410104940948
          - type: nauc_ndcg_at_3_max
            value: 17.128826753860732
          - type: nauc_ndcg_at_5_diff1
            value: 57.71094150714742
          - type: nauc_ndcg_at_5_max
            value: 15.62784584334318
          - type: nauc_precision_at_1000_diff1
            value: nan
          - type: nauc_precision_at_1000_max
            value: nan
          - type: nauc_precision_at_100_diff1
            value: nan
          - type: nauc_precision_at_100_max
            value: nan
          - type: nauc_precision_at_10_diff1
            value: 66.79505135387465
          - type: nauc_precision_at_10_max
            value: 70.47152194211033
          - type: nauc_precision_at_1_diff1
            value: 57.23800679107291
          - type: nauc_precision_at_1_max
            value: 11.039846765533865
          - type: nauc_precision_at_20_diff1
            value: 100
          - type: nauc_precision_at_20_max
            value: 100
          - type: nauc_precision_at_3_diff1
            value: 65.65896518060521
          - type: nauc_precision_at_3_max
            value: 30.198503091441538
          - type: nauc_precision_at_5_diff1
            value: 60.04201680672288
          - type: nauc_precision_at_5_max
            value: 29.000933706816145
          - type: nauc_recall_at_1000_diff1
            value: nan
          - type: nauc_recall_at_1000_max
            value: nan
          - type: nauc_recall_at_100_diff1
            value: nan
          - type: nauc_recall_at_100_max
            value: nan
          - type: nauc_recall_at_10_diff1
            value: 66.7950513538749
          - type: nauc_recall_at_10_max
            value: 70.47152194211012
          - type: nauc_recall_at_1_diff1
            value: 57.23800679107291
          - type: nauc_recall_at_1_max
            value: 11.039846765533865
          - type: nauc_recall_at_20_diff1
            value: nan
          - type: nauc_recall_at_20_max
            value: nan
          - type: nauc_recall_at_3_diff1
            value: 65.65896518060525
          - type: nauc_recall_at_3_max
            value: 30.19850309144154
          - type: nauc_recall_at_5_diff1
            value: 60.0420168067226
          - type: nauc_recall_at_5_max
            value: 29.000933706816
          - type: ndcg_at_1
            value: 57.99999999999999
          - type: ndcg_at_10
            value: 78.19800000000001
          - type: ndcg_at_100
            value: 79.199
          - type: ndcg_at_1000
            value: 79.199
          - type: ndcg_at_20
            value: 79.199
          - type: ndcg_at_3
            value: 74.119
          - type: ndcg_at_5
            value: 76.184
          - type: precision_at_1
            value: 57.99999999999999
          - type: precision_at_10
            value: 9.6
          - type: precision_at_100
            value: 1
          - type: precision_at_1000
            value: 0.1
          - type: precision_at_20
            value: 5
          - type: precision_at_3
            value: 28.333000000000002
          - type: precision_at_5
            value: 18
          - type: recall_at_1
            value: 57.99999999999999
          - type: recall_at_10
            value: 96
          - type: recall_at_100
            value: 100
          - type: recall_at_1000
            value: 100
          - type: recall_at_20
            value: 100
          - type: recall_at_3
            value: 85
          - type: recall_at_5
            value: 90
      - task:
          type: Retrieval
        dataset:
          type: jinaai/xpqa
          name: MTEB XPQARetrieval (fr)
          config: fr
          split: test
          revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f
        metrics:
          - type: map_at_1
            value: 35.256
          - type: map_at_10
            value: 54.071999999999996
          - type: map_at_100
            value: 55.435
          - type: map_at_1000
            value: 55.53
          - type: map_at_20
            value: 54.855
          - type: map_at_3
            value: 48.762
          - type: map_at_5
            value: 51.949999999999996
          - type: mrr_at_1
            value: 56.34178905206942
          - type: mrr_at_10
            value: 63.30843240723078
          - type: mrr_at_100
            value: 63.92076387626982
          - type: mrr_at_1000
            value: 63.9435076251571
          - type: mrr_at_20
            value: 63.64110365119446
          - type: mrr_at_3
            value: 61.526479750778805
          - type: mrr_at_5
            value: 62.38762794837559
          - type: nauc_map_at_1000_diff1
            value: 45.88957885553053
          - type: nauc_map_at_1000_max
            value: 52.59013482565773
          - type: nauc_map_at_100_diff1
            value: 45.84948517422948
          - type: nauc_map_at_100_max
            value: 52.55839985303019
          - type: nauc_map_at_10_diff1
            value: 45.763486819482196
          - type: nauc_map_at_10_max
            value: 52.09054118600712
          - type: nauc_map_at_1_diff1
            value: 55.521911317670835
          - type: nauc_map_at_1_max
            value: 34.68779817675579
          - type: nauc_map_at_20_diff1
            value: 45.757369615751884
          - type: nauc_map_at_20_max
            value: 52.44708031434436
          - type: nauc_map_at_3_diff1
            value: 47.798733616712056
          - type: nauc_map_at_3_max
            value: 46.87976781177451
          - type: nauc_map_at_5_diff1
            value: 46.215964363315884
          - type: nauc_map_at_5_max
            value: 50.5765276342371
          - type: nauc_mrr_at_1000_diff1
            value: 55.110400510640766
          - type: nauc_mrr_at_1000_max
            value: 62.66171179919574
          - type: nauc_mrr_at_100_diff1
            value: 55.10166012000449
          - type: nauc_mrr_at_100_max
            value: 62.66269343813773
          - type: nauc_mrr_at_10_diff1
            value: 55.087629594751256
          - type: nauc_mrr_at_10_max
            value: 62.69978067726044
          - type: nauc_mrr_at_1_diff1
            value: 57.446957773325956
          - type: nauc_mrr_at_1_max
            value: 63.22109004948565
          - type: nauc_mrr_at_20_diff1
            value: 55.067208283222016
          - type: nauc_mrr_at_20_max
            value: 62.66935664582939
          - type: nauc_mrr_at_3_diff1
            value: 55.18870023658262
          - type: nauc_mrr_at_3_max
            value: 62.597473549957996
          - type: nauc_mrr_at_5_diff1
            value: 54.87651100155316
          - type: nauc_mrr_at_5_max
            value: 62.72845534030979
          - type: nauc_ndcg_at_1000_diff1
            value: 47.81162759706491
          - type: nauc_ndcg_at_1000_max
            value: 56.26337910947683
          - type: nauc_ndcg_at_100_diff1
            value: 47.119077388160676
          - type: nauc_ndcg_at_100_max
            value: 55.82354642959063
          - type: nauc_ndcg_at_10_diff1
            value: 46.784535879466496
          - type: nauc_ndcg_at_10_max
            value: 54.63437116703429
          - type: nauc_ndcg_at_1_diff1
            value: 57.446957773325956
          - type: nauc_ndcg_at_1_max
            value: 63.22109004948565
          - type: nauc_ndcg_at_20_diff1
            value: 46.756211545478905
          - type: nauc_ndcg_at_20_max
            value: 55.228917899613826
          - type: nauc_ndcg_at_3_diff1
            value: 47.66168453462149
          - type: nauc_ndcg_at_3_max
            value: 54.39836405112981
          - type: nauc_ndcg_at_5_diff1
            value: 46.97491630908418
          - type: nauc_ndcg_at_5_max
            value: 53.284362953526184
          - type: nauc_precision_at_1000_diff1
            value: -14.959536048875451
          - type: nauc_precision_at_1000_max
            value: 19.740731727610537
          - type: nauc_precision_at_100_diff1
            value: -10.329364912432421
          - type: nauc_precision_at_100_max
            value: 27.80165890502952
          - type: nauc_precision_at_10_diff1
            value: 0.7865296687777561
          - type: nauc_precision_at_10_max
            value: 38.46291415400641
          - type: nauc_precision_at_1_diff1
            value: 57.446957773325956
          - type: nauc_precision_at_1_max
            value: 63.22109004948565
          - type: nauc_precision_at_20_diff1
            value: -2.2696079664009385
          - type: nauc_precision_at_20_max
            value: 35.38696590671127
          - type: nauc_precision_at_3_diff1
            value: 14.016444043719714
          - type: nauc_precision_at_3_max
            value: 46.68119169258843
          - type: nauc_precision_at_5_diff1
            value: 6.466134759646741
          - type: nauc_precision_at_5_max
            value: 43.245171983039256
          - type: nauc_recall_at_1000_diff1
            value: 10.588340380461794
          - type: nauc_recall_at_1000_max
            value: 45.913607560926515
          - type: nauc_recall_at_100_diff1
            value: 28.995302681864565
          - type: nauc_recall_at_100_max
            value: 42.67608149089844
          - type: nauc_recall_at_10_diff1
            value: 38.958724392572854
          - type: nauc_recall_at_10_max
            value: 47.455666375173315
          - type: nauc_recall_at_1_diff1
            value: 55.521911317670835
          - type: nauc_recall_at_1_max
            value: 34.68779817675579
          - type: nauc_recall_at_20_diff1
            value: 36.623788206732016
          - type: nauc_recall_at_20_max
            value: 46.654888587980174
          - type: nauc_recall_at_3_diff1
            value: 43.46749373705754
          - type: nauc_recall_at_3_max
            value: 42.55592784672105
          - type: nauc_recall_at_5_diff1
            value: 40.49018957054939
          - type: nauc_recall_at_5_max
            value: 46.86884862874594
          - type: ndcg_at_1
            value: 56.342000000000006
          - type: ndcg_at_10
            value: 60.01800000000001
          - type: ndcg_at_100
            value: 65.182
          - type: ndcg_at_1000
            value: 66.809
          - type: ndcg_at_20
            value: 61.982000000000006
          - type: ndcg_at_3
            value: 55.688
          - type: ndcg_at_5
            value: 56.607
          - type: precision_at_1
            value: 56.342000000000006
          - type: precision_at_10
            value: 14.005
          - type: precision_at_100
            value: 1.821
          - type: precision_at_1000
            value: 0.20500000000000002
          - type: precision_at_20
            value: 7.684
          - type: precision_at_3
            value: 34.089999999999996
          - type: precision_at_5
            value: 24.005000000000003
          - type: recall_at_1
            value: 35.256
          - type: recall_at_10
            value: 67.583
          - type: recall_at_100
            value: 88.74300000000001
          - type: recall_at_1000
            value: 99.163
          - type: recall_at_20
            value: 73.87
          - type: recall_at_3
            value: 53.371
          - type: recall_at_5
            value: 59.399
license: apache-2.0

bilingual-embedding-base

This repo is a fork of the original Lajavaness/bilingual-embedding-base. The only difference is the model type name, to be compatible with text-embeddings-inference.

Bilingual-embedding is an embedding model for two languages: French and English. It is a sentence-embedding model trained specifically for this language pair and built on XLM-RoBERTa, a pre-trained multilingual language model. The model encodes English and French sentences into a 768-dimensional vector space (the `word_embedding_dimension` of the pooling layer shown below), supporting applications from semantic search to text clustering. The embeddings capture the meaning of English and French sentences at both the lexical and the contextual level.

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BilingualModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
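
The Pooling and Normalize layers above can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the function names `mean_pool` and `l2_normalize` and the toy 4-dimensional inputs are assumptions chosen for illustration, and the real model pools 768-dimensional token vectors.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                summed[i] += x
    count = max(count, 1)  # guard against an all-padding input
    return [s / count for s in summed]

def l2_normalize(vec):
    """Scale a vector to unit length, as a Normalize() layer would."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# Toy input: 3 token positions, 4 dimensions; the last position is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = l2_normalize(mean_pool(tokens, mask))
print(sentence_embedding)
```

Only the two unmasked token vectors contribute to the average, and the result has unit length.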

Training and Fine-tuning process

Stage 1: NLI Training

  • Dataset: SNLI and XNLI, for English and French
  • Method: Training with Multiple Negatives Ranking Loss. This stage focused on improving the model's ability to discern and rank nuanced differences in sentence semantics.
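
As a rough illustration of the ranking loss named in Stage 1, the sketch below assumes it refers to the in-batch-negatives objective that sentence-transformers calls `MultipleNegativesRankingLoss`: each anchor should score its own positive higher than the positives of the other pairs in the batch. The pure-Python function, the `scale` value of 20, and the toy 2-dimensional vectors are assumptions for illustration, not the authors' training code.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for
    anchors[i] and every other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive is close to the wrong anchor

loss_aligned = multiple_negatives_ranking_loss(anchors, aligned)
loss_swapped = multiple_negatives_ranking_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

The loss is small when each anchor is closest to its own positive and large when the pairs are mismatched, which is the ranking behavior this stage trains for.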

Stage 3: Continued Fine-tuning for Semantic Textual Similarity on STS Benchmark

  • Dataset: STSB, French and English
  • Method: Fine-tuning specifically for the semantic textual similarity benchmark using Siamese BERT-Networks configured with the 'sentence-transformers' library.

Stage 4: Advanced Augmentation Fine-tuning

  • Dataset: STSB, with silver samples generated from the gold samples
  • Method: Employed an advanced strategy using Augmented SBERT with Pair Sampling Strategies, integrating both Cross-Encoder and Bi-Encoder models. This stage further refined the embeddings by enriching the training data dynamically, enhancing the model's robustness and accuracy.
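
The silver-sample idea in Stage 4 can be sketched as follows: a cross-encoder scores unlabeled sentence pairs, and those machine-labeled (silver) pairs are added to the human-labeled (gold) pairs used to train the bi-encoder. Everything here is hypothetical: `toy_cross_encoder_score` is a stand-in based on token overlap, not a trained cross-encoder, and the example pairs are invented for illustration.

```python
def toy_cross_encoder_score(sentence_a, sentence_b):
    """Hypothetical stand-in for a trained cross-encoder:
    Jaccard overlap of lowercased whitespace tokens, in [0, 1]."""
    a = set(sentence_a.lower().split())
    b = set(sentence_b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Gold pairs carry human similarity labels (invented here).
gold = [("A man is eating.", "A man eats food.", 0.8)]

# Unlabeled pairs, e.g. produced by a pair sampling strategy.
unlabeled_pairs = [
    ("Un chat dort.", "Un chat dort."),
    ("A dog runs.", "The sky is blue."),
]

# Silver pairs: the cross-encoder's score becomes the training label.
silver = [(a, b, toy_cross_encoder_score(a, b)) for a, b in unlabeled_pairs]

# The bi-encoder would then be fine-tuned on gold plus silver.
training_set = gold + silver
print(training_set)
```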

Usage:

Using this model becomes easy when you have sentence-transformers installed:

pip install -U sentence-transformers

Then you can use the model like this:

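from sentence_transformers import SentenceTransformer

sentences = ["Paris est une capitale de la France", "Paris is a capital of France"]

# trust_remote_code=True lets the library load the custom BilingualModel code from the model repo
model = SentenceTransformer('Lajavaness/bilingual-embedding-base', trust_remote_code=True)
# Encode the sentences into one embedding vector per sentence
embeddings = model.encode(sentences)
print(embeddings)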
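
Because the architecture ends with a Normalize() layer, the returned vectors should have unit length, so the dot product of two embeddings equals their cosine similarity and the squared Euclidean distance is a fixed function of it. The sketch below checks this with two stand-in unit vectors rather than real model output, so it runs without downloading the model; the helper names are assumptions for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Stand-in unit-length vectors (real embeddings have 768 dimensions).
u = [0.6, 0.8, 0.0]
v = [0.0, 0.6, 0.8]

print(dot(u, v))                # dot product
print(cosine(u, v))             # same value for unit vectors
print(squared_euclidean(u, v))  # equals 2 - 2 * dot(u, v)
```

This relationship is consistent with the metadata above, where the cosine, dot-product and Euclidean scores for a task are nearly identical.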

Evaluation

TODO

Citation

@article{conneau2019unsupervised,
  title={Unsupervised cross-lingual representation learning at scale},
  author={Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzm{\'a}n, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin},
  journal={arXiv preprint arXiv:1911.02116},
  year={2019}
}

@article{reimers2019sentence,
   title={Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
   author={Reimers, Nils and Gurevych, Iryna},
   journal={arXiv preprint arXiv:1908.10084},
   year={2019}
}

@article{thakur2020augmented,
  title={Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks},
  author={Thakur, Nandan and Reimers, Nils and Daxenberger, Johannes and Gurevych, Iryna},
  journal={arXiv preprint arXiv:2010.08240},
  year={2020}