---
base_model: Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka
datasets:
- Omartificial-Intelligence-Space/Arabic-stsb
language:
- ar
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:947818
- loss:SoftmaxLoss
- loss:CosineSimilarityLoss
widget:
- source_sentence: امرأة تكتب شيئاً
  sentences:
  - مراهق يتحدث إلى فتاة عبر كاميرا الإنترنت
  - امرأة تقطع البصل الأخضر.
  - مجموعة من كبار السن يتظاهرون حول طاولة الطعام.
- source_sentence: تتشكل النجوم في مناطق تكوين النجوم، والتي تنشأ نفسها من السحب الجزيئية.
  sentences:
  - لاعب كرة السلة على وشك تسجيل نقاط لفريقه.
  - المقال التالي مأخوذ من نسختي من "أطلس البطريق الجديد للتاريخ الوسطى"
  - قد يكون من الممكن أن يوجد نظام شمسي مثل نظامنا خارج المجرة
- source_sentence: تحت السماء الزرقاء مع الغيوم البيضاء، يصل طفل لمس مروحة طائرة واقفة على حقل من العشب.
  sentences:
  - امرأة تحمل كأساً
  - طفل يحاول لمس مروحة طائرة
  - اثنان من عازبين عن الشرب يستعدون للعشاء
- source_sentence: رجل في منتصف العمر يحلق لحيته في غرفة ذات جدران بيضاء والتي لا تبدو كحمام
  sentences:
  - فتى يخطط اسمه على مكتبه
  - رجل ينام
  - المرأة وحدها وهي نائمة في غرفة نومها
- source_sentence: الكلب البني مستلقي على جانبه على سجادة بيج، مع جسم أخضر في المقدمة.
  sentences:
  - شخص طويل القامة
  - المرأة تنظر من النافذة.
  - لقد مات الكلب
model-index:
- name: SentenceTransformer based on Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka
  results:
  - dataset:
      config: ar
      name: MTEB MIRACLRetrievalHardNegatives (ar)
      revision: 95c8db7d4a6e9c1d8a60601afd63d553ae20a2eb
      split: dev
      type: mteb/miracl-hard-negatives
    metrics:
    - type: main_score
      value: 17.751
    task:
      type: Retrieval
  - dataset:
      config: ara-ara
      name: MTEB MLQARetrieval (ara-ara)
      revision: 397ed406c1a7902140303e7faf60fff35b58d285
      split: test
      type: facebook/mlqa
    metrics:
    - type: main_score
      value: 58.026
    task:
      type: Retrieval
  - dataset:
      config: ar
      name: MTEB MintakaRetrieval (ar)
      revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e
      split: test
      type: jinaai/mintakaqa
    metrics:
    - type: main_score
      value: 17.121
    task:
      type: Retrieval
  - dataset:
      config: default
      name: MTEB SadeemQuestionRetrieval (default)
      revision: 3cb0752b182e5d5d740df547748b06663c8e0bd9
      split: test
      type: sadeem-ai/sadeem-ar-eval-retrieval-questions
    metrics:
    - type: main_score
      value: 59.306
    task:
      type: Retrieval
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev
      type: sts-dev
    metrics:
    - type: pearson_cosine
      value: 0.8383581637565862
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8389373148442993
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8247947413553784
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8329104956151686
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.8249963167509389
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8336591462431132
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.8071855574990106
      name: Pearson Dot
    - type: spearman_dot
      value: 0.8097706351791779
      name: Spearman Dot
    - type: pearson_max
      value: 0.8383581637565862
      name: Pearson Max
    - type: spearman_max
      value: 0.8389373148442993
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.7907507025363603
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.7893080660475024
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.7923222026451455
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.7946838339078852
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.7903690631114766
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.793426368251902
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.7404285389360442
      name: Pearson Dot
    - type: spearman_dot
      value: 0.7353599094850335
      name: Spearman Dot
    - type: pearson_max
      value: 0.7923222026451455
      name: Pearson Max
    - type: spearman_max
      value: 0.7946838339078852
      name: Spearman Max
---

# GATE-AraBert-v0

This is a General Arabic Text Embedding model trained with Sentence Transformers in a multi-task setup. It is trained on the AllNLI and STS datasets.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2](https://huggingface.co/Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Datasets:**
    - [all-nli](https://huggingface.co/datasets/Omartificial-Intelligence-Space/Arabic-NLi-Pair-Class)
    - [sts](https://huggingface.co/datasets/Omartificial-Intelligence-Space/arabic-stsb)
- **Language:** ar

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Omartificial-Intelligence-Space/GATE-AraBert-v0")

# Run inference
sentences = [
    'الكلب البني مستلقي على جانبه على سجادة بيج، مع جسم أخضر في المقدمة.',
    'لقد مات الكلب',
    'شخص طويل القامة',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Evaluation

### Metrics

#### Semantic Similarity
* Dataset: `sts-dev`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.8384     |
| **spearman_cosine** | **0.8389** |
| pearson_manhattan   | 0.8248     |
| spearman_manhattan  | 0.8329     |
| pearson_euclidean   | 0.825      |
| spearman_euclidean  | 0.8337     |
| pearson_dot         | 0.8072     |
| spearman_dot        | 0.8098     |
| pearson_max         | 0.8384     |
| spearman_max        | 0.8389     |

#### Semantic Similarity
* Dataset: `sts-test`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.7908     |
| **spearman_cosine** | **0.7893** |
| pearson_manhattan   | 0.7923     |
| spearman_manhattan  | 0.7947     |
| pearson_euclidean   | 0.7904     |
| spearman_euclidean  | 0.7934     |
| pearson_dot         | 0.7404     |
| spearman_dot        | 0.7354     |
| pearson_max         | 0.7923     |
| spearman_max        | 0.7947     |
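
The `pearson_cosine` and `spearman_cosine` figures reported for `sts-dev` and `sts-test` are correlations between the cosine similarity of each sentence pair's embeddings and the human-assigned gold score. The sketch below illustrates that calculation in plain Python. It is not the `EmbeddingSimilarityEvaluator` implementation, and the two-dimensional vectors and gold scores are made-up toy values standing in for real model embeddings and STS labels.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: each tuple stands in for the embeddings of one
# sentence pair, and `gold` for that pair's human similarity score.
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),
    ([1.0, 0.0], [0.5, 0.5]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 1.0], [1.0, 0.8]),
]
gold = [4.8, 3.0, 0.2, 4.5]

predicted = [cosine(u, v) for u, v in pairs]
print("pearson_cosine:", round(pearson(predicted, gold), 4))
print("spearman_cosine:", round(spearman(predicted, gold), 4))
```

Spearman depends only on the ordering of the predicted similarities, so it is less sensitive than Pearson to how the raw cosine values are scaled. With real data, `predicted` would come from the cosine similarity of the two `model.encode(...)` outputs for each pair.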