SentenceTransformer based on abdoelsayed/AraDPR

This is a sentence-transformers model finetuned from abdoelsayed/AraDPR. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: abdoelsayed/AraDPR
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 tokens
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hatemestinbejaia/KDAraDPR2_initialversion0")
# Run inference
sentences = [
    'تحديد المسح',
    'المسح أو مسح الأراضي هو تقنية ومهنة وعلم تحديد المواقع الأرضية أو ثلاثية الأبعاد للنقاط والمسافات والزوايا بينها . يطلق على أخصائي مسح الأراضي اسم مساح الأراضي .',
    'إجمالي المحطات . تعد المحطات الإجمالية واحدة من أكثر أدوات المسح شيوعا المستخدمة اليوم . وهي تتألف من جهاز ثيودوليت إلكتروني ومكون إلكتروني لقياس المسافة ( EDM ) . تتوفر أيضا محطات روبوتية كاملة تتيح التشغيل لشخص واحد من خلال التحكم في الجهاز باستخدام جهاز التحكم عن بعد . تاريخ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Reranking

Metric Value
map 0.547
mrr@10 0.5489
ndcg@10 0.6231

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • gradient_accumulation_steps: 8
  • learning_rate: 7e-05
  • warmup_ratio: 0.07
  • fp16: True
  • half_precision_backend: amp
  • load_best_model_at_end: True
  • fp16_backend: amp

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 8
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 7e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.07
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: amp
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: amp
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss loss map
0.0512 2000 0.0019 0.0045 0.4548
0.1024 4000 0.0011 0.0039 0.4988
0.1536 6000 0.001 0.0034 0.4871
0.2048 8000 0.0009 0.0032 0.4811
0.256 10000 0.0009 0.0032 0.4641
0.3072 12000 0.0008 0.0028 0.4540
0.3584 14000 0.0007 0.0027 0.4918
0.4096 16000 0.0007 0.0024 0.5039
0.4608 18000 0.0006 0.0024 0.5051
0.512 20000 0.0006 0.0021 0.4772
0.5632 22000 0.0006 0.0021 0.5110
0.6144 24000 0.0005 0.0020 0.5286
0.6656 26000 0.0005 0.0020 0.5217
0.7168 28000 0.0005 0.0018 0.5193
0.768 30000 0.0005 0.0018 0.5152
0.8192 32000 0.0005 0.0017 0.5322
0.8704 34000 0.0004 0.0016 0.5296
0.9216 36000 0.0004 0.0016 0.5266
0.9728 38000 0.0004 0.0015 0.5244
1.024 40000 0.0004 0.0014 0.5251
1.0752 42000 0.0003 0.0014 0.5202
1.1264 44000 0.0003 0.0014 0.5089
1.1776 46000 0.0003 0.0013 0.5030
1.2288 48000 0.0003 0.0013 0.5184
1.28 50000 0.0003 0.0012 0.5267
1.3312 52000 0.0003 0.0012 0.5386
1.3824 54000 0.0003 0.0012 0.5254
1.4336 56000 0.0003 0.0012 0.5378
1.4848 58000 0.0003 0.0011 0.5324
1.536 60000 0.0003 0.0011 0.5364
1.5872 62000 0.0003 0.0011 0.5412
1.6384 64000 0.0003 0.0010 0.5339
1.6896 66000 0.0003 0.0010 0.5452
1.7408 68000 0.0003 0.0010 0.5557
1.792 70000 0.0002 0.001 0.5619
1.8432 72000 0.0002 0.0010 0.5512
1.8944 74000 0.0002 0.0010 0.5434
1.9456 76000 0.0002 0.0009 0.5367
1.9968 78000 0.0002 0.0009 0.5497
2.048 80000 0.0002 0.0009 0.5459
2.0992 82000 0.0002 0.0009 0.5616
2.1504 84000 0.0002 0.0009 0.5573
2.2016 86000 0.0002 0.0009 0.5526
2.2528 88000 0.0002 0.0008 0.5557
2.304 90000 0.0002 0.0008 0.5470
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.11.9
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 1.2.0
  • Datasets: 3.0.1
  • Tokenizers: 0.20.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MarginMSELoss

@misc{hofstätter2021improving,
    title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
    author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
    year={2021},
    eprint={2010.02666},
    archivePrefix={arXiv},
    primaryClass={cs.IR}
}
Downloads last month
26
Safetensors
Model size
178M params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for hatemestinbejaia/mmarco-Arabic-AraDPR-bi-encoder-KD-v1

Base model

abdoelsayed/AraDPR
Finetuned
(2)
this model

Evaluation results