---
base_model: aubmindlab/bert-base-arabertv02
datasets:
- akhooli/arabic-triplets-1m-curated-sims-len
language:
- ar
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- transformers.js
- transformers
- sentence-similarity
- feature-extraction
- dataset_size:75000
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
- mteb
model-index:
- name: Omartificial-Intelligence-Space/Arabert-matro-v4
  results:
  - dataset:
      config: ar
      name: MTEB MIRACLRetrievalHardNegatives (ar)
      revision: 95c8db7d4a6e9c1d8a60601afd63d553ae20a2eb
      split: dev
      type: mteb/miracl-hard-negatives
    metrics:
    - type: main_score
      value: 62.616
    task:
      type: Retrieval
  - dataset:
      config: ara-ara
      name: MTEB MLQARetrieval (ara-ara)
      revision: 397ed406c1a7902140303e7faf60fff35b58d285
      split: test
      type: facebook/mlqa
    metrics:
    - type: main_score
      value: 67.56
    task:
      type: Retrieval
  - dataset:
      config: ar
      name: MTEB MintakaRetrieval (ar)
      revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e
      split: test
      type: jinaai/mintakaqa
    metrics:
    - type: main_score
      value: 20.059
    task:
      type: Retrieval
  - dataset:
      config: default
      name: MTEB SadeemQuestionRetrieval (default)
      revision: 3cb0752b182e5d5d740df547748b06663c8e0bd9
      split: test
      type: sadeem-ai/sadeem-ar-eval-retrieval-questions
    metrics:
    - type: main_score
      value: 64.662
    task:
      type: Retrieval
  - dataset:
      config: ar-ar
      name: MTEB STS17 (ar-ar)
      revision: faeb762787bd10488a50c8b5be4a3b82e411949c
      split: test
      type: mteb/sts17-crosslingual-sts
    metrics:
    - type: cosine_pearson
      value: 84.66883392015258
    - type: cosine_spearman
      value: 85.30520907141938
    - type: euclidean_pearson
      value: 82.04306779342852
    - type: euclidean_spearman
      value: 84.58744201847996
    - type: main_score
      value: 85.30520907141938
    - type: manhattan_pearson
      value: 82.08829357724328
    - type: manhattan_spearman
      value: 84.49254541383544
    task:
      type: STS
license: apache-2.0
---

# Arabic-Triplet-Matryoshka-V2-Model

- This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02).

- It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.


- This model is trained on 1M samples from the [akhooli/arabic-triplets-1m-curated-sims-len](https://huggingface.co/datasets/akhooli/arabic-triplets-1m-curated-sims-len) dataset.

 
- It was trained for 3 epochs, with a final training loss of 0.718 (using MatryoshkaLoss).
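
The points above (768-dimensional embeddings, trained with MatryoshkaLoss) can be sketched in code. This is a minimal, hedged example rather than an official usage snippet: the repository id is assumed from the `model-index` name in this card's metadata, the helper names `embed`, `truncate_and_normalize`, and `cosine_similarity` are hypothetical, and the card does not list which nested dimensions were used in training, so any truncation size you pick is an assumption to validate on your own data.

```python
import math
from typing import List, Sequence

# Assumption: repository id taken from the model-index name in this card's metadata.
MODEL_ID = "Omartificial-Intelligence-Space/Arabert-matro-v4"


def truncate_and_normalize(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale them to unit L2 norm.

    Matryoshka-style training aims to make such prefixes usable on their own.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = [float(x) for x in vec[:dim]]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def embed(sentences: List[str], dim: int = 768) -> List[List[float]]:
    """Encode sentences with the model, then truncate each vector to `dim`."""
    # Imported lazily so the pure-Python helpers above work without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    vectors = model.encode(sentences)  # NumPy array, one row per sentence
    return [truncate_and_normalize(row.tolist(), dim) for row in vectors]
```

As an illustration, `a, b = embed(["الطقس جميل اليوم", "الجو رائع هذا اليوم"], dim=256)` followed by `cosine_similarity(a, b)` would compare two Arabic sentences using only the first 256 components. The value 256 is illustrative only. Smaller prefixes trade some accuracy for lower storage and faster search.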


## Citation

If you use the Arabic Matryoshka Embeddings Model, please cite it as follows:

```bibtex
@misc{nacar2024enhancingsemanticsimilarityunderstanding,
      title={Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning},
      author={Omer Nacar and Anis Koubaa},
      year={2024},
      eprint={2407.21139},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.21139},
}
```