metadata
pipeline_tag: sentence-similarity
tags:
  - feature-extraction
  - sentence-similarity
license: mit
language:
  - fr
  - en

Solon Embeddings — base 0.1

State-of-the-art open-source French embedding model: it has the highest mean score of all the open-source models in the comparison below.

| Model | Mean Score |
|---|---|
| cohere/embed-multilingual-v3 | 0.7402 |
| OrdalieTech/Solon-embeddings-base-0.1 | 0.7306 |
| openai/ada-002 | 0.7290 |
| cohere/embed-multilingual-light-v3 | 0.6945 |
| antoinelouis/biencoder-camembert-base-mmarcoFR | 0.6826 |
| dangvantuan/sentence-camembert-large | 0.6756 |
| voyage/voyage-01 | 0.6753 |
| intfloat/multilingual-e5-large | 0.6660 |
| intfloat/multilingual-e5-base | 0.6597 |
| Sbert/paraphrase-multilingual-mpnet-base-v2 | 0.5975 |
| dangvantuan/sentence-camembert-base | 0.5456 |
| EuropeanParliament/eubert_embedding_v1 | 0.5063 |

These results were obtained on 9 French benchmarks that cover a variety of text-similarity tasks (classification, reranking, STS):

  • AmazonReviewsClassification
  • MassiveIntentClassification
  • MassiveScenarioClassification
  • MTOPDomainClassification
  • MTOPIntentClassification
  • STS22
  • MiraclFRRerank
  • OrdalieFRSTS
  • OrdalieFRReranking
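The README reports one "Mean Score" per model but does not say how the nine task scores are combined. The sketch below assumes the score is an unweighted average of each task's main metric. `mean_score` and `leaderboard` are hypothetical helpers written for illustration, and the task names are the nine benchmarks listed above.

```python
from statistics import fmean

# The nine French benchmarks listed in this README.
FRENCH_TASKS = (
    "AmazonReviewsClassification",
    "MassiveIntentClassification",
    "MassiveScenarioClassification",
    "MTOPDomainClassification",
    "MTOPIntentClassification",
    "STS22",
    "MiraclFRRerank",
    "OrdalieFRSTS",
    "OrdalieFRReranking",
)


def mean_score(task_scores: dict[str, float],
               tasks: tuple[str, ...] = FRENCH_TASKS) -> float:
    """Average the per-task main scores, giving every task equal weight.

    ASSUMPTION: "Mean Score" is a plain (unweighted) mean over the tasks;
    the README does not state how the scores are aggregated.
    """
    missing = [t for t in tasks if t not in task_scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    return fmean([task_scores[t] for t in tasks])


def leaderboard(results: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (model, mean score) pairs sorted from highest to lowest."""
    rows = [(model, mean_score(scores)) for model, scores in results.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

A plain average gives every benchmark the same weight, whatever its size or task type. If the actual evaluation weights tasks differently, numbers computed this way would not match the table.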

We created OrdalieFRSTS and OrdalieFRReranking to improve the benchmarking of French STS and reranking.
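STS and reranking benchmarks each reduce a set of embeddings to a single number. The sketch below assumes the evaluation follows upstream MTEB conventions, which, as far as we know, are as follows:

- **STS:** each sentence pair is scored by the cosine similarity of its two embeddings, and the benchmark reports the Spearman correlation between those scores and the gold similarity ratings.
- **Reranking:** each query's candidates are ordered by cosine similarity, and the benchmark reports mean average precision (MAP).

All functions here are hypothetical helpers, not MTEB's API. Embeddings are plain lists of floats standing in for model outputs.

```python
import math
from collections.abc import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu2 = sum(a * a for a in u)
    nv2 = sum(b * b for b in v)
    if nu2 == 0 or nv2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / math.sqrt(nu2 * nv2)


def _average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the (tie-averaged) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def sts_score(pairs, gold: Sequence[float]) -> float:
    """ASSUMED STS recipe: Spearman(cosine similarity of each pair, gold rating)."""
    return spearman([cosine_similarity(u, v) for u, v in pairs], gold)


def average_precision(scores: Sequence[float], relevant: Sequence[bool]) -> float:
    """AP for one query: mean precision at the rank of each relevant candidate."""
    ranked = sorted(zip(scores, relevant), key=lambda p: p[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, is_relevant) in enumerate(ranked, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(queries) -> float:
    """ASSUMED reranking metric: MAP over (scores, relevance labels) per query."""
    return sum(average_precision(s, r) for s, r in queries) / len(queries)
```

For a real run, `scores` would be the cosine similarities between a query embedding and each candidate's embedding. Spearman is rank-based, so STS results depend only on the ordering of similarities, not on their absolute values.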

(The evaluation script is currently available at github.com/netapy/mteb.)


(A large version is coming soon.)