CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a Cross Encoder model finetuned from microsoft/MiniLM-L12-H384-uncased on the ms-marco-shuffled dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-MiniLM-L12-H384-margin-mse")
# Get scores for pairs of texts
pairs = [
    ['where is joplin airport', 'Joplin Regional Airport. Joplin Regional Airport (IATA: JLN, ICAO: KJLN, FAA LID: JLN) is a city-owned airport four miles north of Joplin, in Jasper County, Missouri. It has airline service subsidized by the Essential Air Service program. Airline flights and general aviation are in separate terminals.'],
    ['where is the pd on your glasses frame', "Pupillary Distance (PD) You'll need to know your PD if you want to order glasses from EyeBuyDirect. Don't worry if your glasses prescription doesn't include your PD, we can show you how to measure it by yourself. How to measure your pd"],
    ['what year did oldsmobile stop production', 'Oldsmobile was not the problem, it was GM that made oldmobiles but they stopped making them in 2004 and the reason is that Oldsmobiles did not bring in enough money for GM or … (General Motors) to be happy so they stopped. but if you ask me i think any car that lasted 106 year is good enough and is a good car to keep selling.'],
    ['how many sisters did barbie have', "1 Kelly/Chelsea Roberts (1995-2009–present) This character is of toddler age, and is a sister to Barbie, Skipper, and Stacie. 2  Originally the baby of the family (replaced by her younger sister Krissy Roberts in 1999), she also has three older sisters: Barbie, Skipper, and Stacie. Skipper is Barbie's younger sister. 2  She was first introduced with blue eyes and a variety of hair colors like blonde and brown. 3  She is a main character in the Barbie: Life in the Dreamhouse series. 4  In the series, she has been remodeled as a teenager with brown hair and a purple streak."],
    ['who discovered achondroplasia dwarfism', "For several years, Dr. Wasmuth and his team had suspected that the gene, FGFR3, was responsible for a defect that causes Huntington's disease, a neurological disorder. But they found no link. They took another look after other researchers suggested that the same chromosome region might harbor the achondroplasia gene."],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'where is joplin airport',
    [
        'Joplin Regional Airport. Joplin Regional Airport (IATA: JLN, ICAO: KJLN, FAA LID: JLN) is a city-owned airport four miles north of Joplin, in Jasper County, Missouri. It has airline service subsidized by the Essential Air Service program. Airline flights and general aviation are in separate terminals.',
        "Pupillary Distance (PD) You'll need to know your PD if you want to order glasses from EyeBuyDirect. Don't worry if your glasses prescription doesn't include your PD, we can show you how to measure it by yourself. How to measure your pd",
        'Oldsmobile was not the problem, it was GM that made oldmobiles but they stopped making them in 2004 and the reason is that Oldsmobiles did not bring in enough money for GM or … (General Motors) to be happy so they stopped. but if you ask me i think any car that lasted 106 year is good enough and is a good car to keep selling.',
        "1 Kelly/Chelsea Roberts (1995-2009–present) This character is of toddler age, and is a sister to Barbie, Skipper, and Stacie. 2  Originally the baby of the family (replaced by her younger sister Krissy Roberts in 1999), she also has three older sisters: Barbie, Skipper, and Stacie. Skipper is Barbie's younger sister. 2  She was first introduced with blue eyes and a variety of hair colors like blonde and brown. 3  She is a main character in the Barbie: Life in the Dreamhouse series. 4  In the series, she has been remodeled as a teenager with brown hair and a purple streak.",
        "For several years, Dr. Wasmuth and his team had suspected that the gene, FGFR3, was responsible for a defect that causes Huntington's disease, a neurological disorder. But they found no link. They took another look after other researchers suggested that the same chromosome region might harbor the achondroplasia gene.",
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
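
The list returned by rank pairs each candidate's index with its score, ordered from most to least relevant. As a rough sketch of that ordering step, assuming you already hold raw scores such as those returned by predict, the helper below rebuilds the same kind of structure in plain Python. The name rank_from_scores is hypothetical and the scores are made-up values, not model output.

```python
from typing import Dict, List, Optional, Sequence, Union


def rank_from_scores(
    scores: Sequence[float], top_k: Optional[int] = None
) -> List[Dict[str, Union[int, float]]]:
    """Sort candidate indices by descending score (hypothetical helper)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    return [{"corpus_id": i, "score": float(scores[i])} for i in order]


# Made-up scores for five candidate passages; real values come from the model.
example_scores = [9.1, -3.4, -7.8, -6.2, -5.0]
ranking = rank_from_scores(example_scores, top_k=3)
print(ranking)
```

In a retrieve-then-rerank setup, the candidates would typically come from a cheaper first-stage retriever, and only the top few reranked results would be passed on.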

Evaluation

Metrics

Cross Encoder Reranking

Metric    NanoMSMARCO        NanoNFCorpus       NanoNQ
map       0.6114 (+0.1219)   0.3561 (+0.0857)   0.6775 (+0.2568)
mrr@10    0.6022 (+0.1247)   0.5900 (+0.0902)   0.6893 (+0.2626)
ndcg@10   0.6673 (+0.1269)   0.4034 (+0.0783)   0.7330 (+0.2324)
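
For readers unfamiliar with these metrics, the following is a minimal sketch of how reciprocal rank at 10, NDCG at 10, and average precision (the per-query quantity behind map) are commonly defined for binary relevance labels. It is an illustration only, not the evaluator that produced the numbers above; the function names and the toy label list are assumptions.

```python
import math
from typing import List


def mrr_at_k(labels: List[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k."""
    for rank, rel in enumerate(labels[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(labels: List[int], k: int = 10) -> float:
    """DCG of the given order divided by the DCG of the ideal order."""
    def dcg(rels: List[int]) -> float:
        return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(rels, start=1))

    ideal = dcg(sorted(labels, reverse=True)[:k])
    return dcg(labels[:k]) / ideal if ideal > 0 else 0.0


def average_precision(labels: List[int]) -> float:
    """Mean of the precision values at each rank holding a relevant item."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy relevance labels in ranked order (1 = relevant), purely illustrative.
ranked_labels = [0, 1, 0, 1, 0]
print(mrr_at_k(ranked_labels), ndcg_at_k(ranked_labels), average_precision(ranked_labels))
```

Dataset-level values such as those in the table are then averages of these per-query quantities over all queries.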

Cross Encoder Nano BEIR

Metric    Value
map       0.5484 (+0.1548)
mrr@10    0.6272 (+0.1592)
ndcg@10   0.6012 (+0.1459)
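
The Nano BEIR figures appear to be the unweighted mean of the three per-dataset values reported under Cross Encoder Reranking. A quick arithmetic check, using only the rounded numbers printed in this card (so agreement is expected only to within rounding):

```python
# Rounded per-dataset values copied from the Cross Encoder Reranking table
# (order: NanoMSMARCO, NanoNFCorpus, NanoNQ).
per_dataset = {
    "map": [0.6114, 0.3561, 0.6775],
    "mrr@10": [0.6022, 0.5900, 0.6893],
    "ndcg@10": [0.6673, 0.4034, 0.7330],
}
# Values reported in the Cross Encoder Nano BEIR table.
reported = {"map": 0.5484, "mrr@10": 0.6272, "ndcg@10": 0.6012}

means = {name: sum(vals) / len(vals) for name, vals in per_dataset.items()}
for name, mean in means.items():
    print(f"{name}: mean={mean:.4f} reported={reported[name]:.4f}")
```

The map row differs in the fourth decimal place, which is consistent with the reported value having been averaged from unrounded inputs.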

Training Details

Training Dataset

ms-marco-shuffled

  • Dataset: ms-marco-shuffled at 0e80192
  • Size: 39,780,704 training samples
  • Columns: score, query, positive, and negative
  • Approximate statistics based on the first 1000 samples:
              score   query              positive            negative
    type      float   string             string              string
    min       -4.89   12 characters      71 characters       82 characters
    mean      13.57   33.75 characters   349.99 characters   337.52 characters
    max       22.32   141 characters     1000 characters     928 characters
  • Samples:
    • score: 6.012716511885325
      query: what body part does gases, such as oxygen and carbon dioxide, pass into or out of the blood?
      positive: As blood passes through your lungs, oxygen moves into the blood while carbon dioxide moves out of the blood into the lungs. An ABG test uses blood drawn from an artery, where the oxygen and carbon dioxide levels can be measured before they enter body tissues. An ABG measures: 1 Partial pressure of oxygen (PaO2).
      negative: Answers. Best Answer: The respiratory system takes in oxygen from the atmosphere and moves that oxygen into the bloodstream. The circulatory system then carries the oxygen to all the cells in the body and picks up carbon dioxide waste which it returns to the lungs.Carbon dioxide diffuses from the blood into the lungs and it is then exhaled into the atmosphere.he circulatory system then carries the oxygen to all the cells in the body and picks up carbon dioxide waste which it returns to the lungs.
    • score: 5.666825115680695
      query: what does iron deficiency do
      positive: Iron-deficiency anemia is the most common type of anemia. It happens when you do not have enough iron in your body. Iron deficiency is usually due to blood loss but may occasionally be due to poor absorption of iron. Pregnancy and childbirth consume a great deal of iron and thus can result in pregnancy-related anemia.
      negative: color vision deficiency see color vision deficiency. deficiency disease a condition due to dietary or metabolic deficiency, including all diseases caused by an insufficient supply of essential nutrients.iron deficiency deficiency of iron in the system, as from blood loss, low dietary iron, or a disease condition that inhibits iron uptake.See iron and iron deficiency anemia.olor vision deficiency see color vision deficiency. deficiency disease a condition due to dietary or metabolic deficiency, including all diseases caused by an insufficient supply of essential nutrients.
    • score: 14.512734095255535
      query: cost of tavrmasoposed to open heart surgery
      positive: Several factors come into play when you’re trying to figure out how much you’re going to have to pay for an open heart surgery. The two biggest factors are what kind of open heart surgery you're having how good your insurance is. A heart transplant runs more than $700,000, significantly more than most annual salaries. Other open heart surgeries are in the neighborhood of $325,000. Much of the expense is not only the four hour long surgery, but also the testing, the anesthesia, and the medication and aftercare that are all part of the package.
      negative: Foods You Can Eat After Heart Bypass. Healthy foods provide multiple benefits following heart bypass surgery. Heart bypass surgery, also called coronary bypass surgery, is performed to restore blood flow to your heart when a section of an artery in your heart is blocked.
  • Loss: MarginMSELoss with these parameters:
    {
        "activation_fct": "torch.nn.modules.linear.Identity"
    }
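
As background on the loss: Margin MSE, as described by Hofstätter et al. (see the Citation section), trains the student so that the difference between its scores for the positive and the negative passage matches a teacher's margin. The sketch below assumes the score column holds that teacher margin (teacher score for the positive minus teacher score for the negative), which this card does not state explicitly. The student scores are made-up numbers and margin_mse is a hypothetical helper, not the library's implementation.

```python
from typing import Sequence


def margin_mse(
    student_pos: Sequence[float],
    student_neg: Sequence[float],
    teacher_margin: Sequence[float],
) -> float:
    """Mean squared error between student and teacher score margins."""
    errors = [
        ((sp - sn) - tm) ** 2
        for sp, sn, tm in zip(student_pos, student_neg, teacher_margin)
    ]
    return sum(errors) / len(errors)


# Teacher margins rounded from the three training samples listed above.
teacher = [6.01, 5.67, 14.51]
# Made-up student scores for the (query, positive) and (query, negative) pairs.
pos_scores = [4.0, 3.0, 8.0]
neg_scores = [-1.0, -2.0, -5.0]

loss = margin_mse(pos_scores, neg_scores, teacher)
print(round(loss, 4))
```

With the Identity activation listed in the parameters, the raw model outputs would presumably be used as scores without any squashing, which fits targets that range well outside [0, 1].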
    

Evaluation Dataset

ms-marco-shuffled

  • Dataset: ms-marco-shuffled at 0e80192
  • Size: 39,780,704 evaluation samples
  • Columns: score, query, positive, and negative
  • Approximate statistics based on the first 1000 samples:
              score   query              positive            negative
    type      float   string             string              string
    min       -1.57   10 characters      64 characters       56 characters
    mean      13.57   34.47 characters   345.45 characters   341.89 characters
    max       22.36   109 characters     963 characters      947 characters
  • Samples:
    • score: 16.928720156351726
      query: where is joplin airport
      positive: Joplin Regional Airport. Joplin Regional Airport (IATA: JLN, ICAO: KJLN, FAA LID: JLN) is a city-owned airport four miles north of Joplin, in Jasper County, Missouri. It has airline service subsidized by the Essential Air Service program. Airline flights and general aviation are in separate terminals.
      negative: Hoskins Airport. If you’re flying from or into Hoskins airport or simply collecting someone from their flight to Hoskins, discover all the latest information you need from Hoskins airport. Find directions, airport information and local weather for Hoskins airport and details of airlines that fly to and from Hoskins.
    • score: 15.824924786885578
      query: where is the pd on your glasses frame
      positive: Pupillary Distance (PD) You'll need to know your PD if you want to order glasses from EyeBuyDirect. Don't worry if your glasses prescription doesn't include your PD, we can show you how to measure it by yourself. How to measure your pd
      negative: exists and is an alternate of . Mahwah PD in NJ makes 121k after 6 years, Bergenfield PD makes 117k after 5 years and there are endless PD'S that smash the base pay of SCPD. Mahwah PD in NJ makes 121k after 6 years, Bergenfield PD makes 117k after 5 years and there are endless PD'S that smash the base pay of SCPD.
    • score: 18.074473301569622
      query: what year did oldsmobile stop production
      positive: Oldsmobile was not the problem, it was GM that made oldmobiles but they stopped making them in 2004 and the reason is that Oldsmobiles did not bring in enough money for GM or … (General Motors) to be happy so they stopped. but if you ask me i think any car that lasted 106 year is good enough and is a good car to keep selling.
      negative: Cinsaut vines. Known as Ottavianello, there is one tiny DOC devoted to Cinsaut-Ostuni Ottavianello, with a total production of less than 1000 cases a year.However, Cinsaut has long been used in Apulian blends and has also begun to attract the attention of winemakers interested in reviving old varieties.insaut vines. Known as Ottavianello, there is one tiny DOC devoted to Cinsaut-Ostuni Ottavianello, with a total production of less than 1000 cases a year.
  • Loss: MarginMSELoss with these parameters:
    {
        "activation_fct": "torch.nn.modules.linear.Identity"
    }
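
The character-length statistics reported for both datasets can be reproduced in spirit with a few lines of Python. This is a sketch over a tiny made-up sample, not the code that generated the card; the rows and the length_stats helper are assumptions.

```python
from statistics import mean
from typing import Dict, List


def length_stats(rows: List[Dict[str, str]], column: str, limit: int = 1000) -> Dict[str, float]:
    """Min, mean, and max character length of a text column over the first `limit` rows."""
    lengths = [len(row[column]) for row in rows[:limit]]
    return {"min": min(lengths), "mean": round(mean(lengths), 2), "max": max(lengths)}


# Two illustrative rows; the real dataset has score, query, positive, and negative columns.
rows = [
    {"query": "where is joplin airport", "positive": "Joplin Regional Airport."},
    {"query": "what does iron deficiency do", "positive": "Iron-deficiency anemia."},
]
stats = length_stats(rows, "query")
print(stats)
```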
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 8e-06
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • dataloader_num_workers: 4
  • load_best_model_at_end: True
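
To make the interaction of learning_rate, warmup_ratio, and the linear scheduler (lr_scheduler_type: linear in the full list) concrete, here is a minimal sketch of linear warmup followed by linear decay to zero. The total step count is an assumption, roughly what the training logs suggest for one epoch, and linear_schedule_lr is a hypothetical helper rather than the library's scheduler.

```python
import math


def linear_schedule_lr(
    step: int,
    total_steps: int,
    base_lr: float = 8e-6,
    warmup_ratio: float = 0.1,
) -> float:
    """Learning rate at `step` for linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 31_000  # assumed; the card does not state the exact step count
for step in (0, 1_550, 3_100, 17_050, 31_000):
    print(step, f"{linear_schedule_lr(step, TOTAL_STEPS):.2e}")
```

Under these assumptions the rate climbs to its 8e-06 peak over roughly the first 3,100 steps, which lines up with the steep drop in training loss early in the logs.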

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 8e-06
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step   Training Loss  Validation Loss  NanoMSMARCO_ndcg@10  NanoNFCorpus_ndcg@10  NanoNQ_ndcg@10    NanoBEIR_mean_ndcg@10
-1      -1     -              -                0.0255 (-0.5150)     0.3351 (+0.0101)      0.0539 (-0.4467)  0.1382 (-0.3172)
0.0000  1      197.7525       -                -                    -                     -                 -
0.0322  1000   189.9111       -                -                    -                     -                 -
0.0643  2000   100.2999       -                -                    -                     -                 -
0.0965  3000   33.4914        -                -                    -                     -                 -
0.1286  4000   10.2638        -                -                    -                     -                 -
0.1608  5000   7.333          6.1981           0.6326 (+0.0922)     0.4145 (+0.0894)      0.6989 (+0.1983)  0.5820 (+0.1266)
0.1930  6000   6.2212         -                -                    -                     -                 -
0.2251  7000   5.6437         -                -                    -                     -                 -
0.2573  8000   5.3485         -                -                    -                     -                 -
0.2894  9000   5.0373         -                -                    -                     -                 -
0.3216  10000  4.7753         4.3763           0.6565 (+0.1161)     0.4161 (+0.0910)      0.7294 (+0.2288)  0.6007 (+0.1453)
0.3538  11000  4.5805         -                -                    -                     -                 -
0.3859  12000  4.4494         -                -                    -                     -                 -
0.4181  13000  4.3038         -                -                    -                     -                 -
0.4502  14000  4.2497         -                -                    -                     -                 -
0.4824  15000  4.116          4.0312           0.6673 (+0.1269)     0.4034 (+0.0783)      0.7330 (+0.2324)  0.6012 (+0.1459)
0.5146  16000  4.0779         -                -                    -                     -                 -
0.5467  17000  4.0045         -                -                    -                     -                 -
0.5789  18000  3.8951         -                -                    -                     -                 -
0.6111  19000  3.8733         -                -                    -                     -                 -
0.6432  20000  3.7693         3.7577           0.6624 (+0.1220)     0.4052 (+0.0802)      0.7282 (+0.2276)  0.5986 (+0.1432)
0.6754  21000  3.794          -                -                    -                     -                 -
0.7075  22000  3.6753         -                -                    -                     -                 -
0.7397  23000  3.6859         -                -                    -                     -                 -
0.7719  24000  3.6511         -                -                    -                     -                 -
0.8040  25000  3.6294         3.6983           0.6507 (+0.1103)     0.4054 (+0.0804)      0.7291 (+0.2284)  0.5951 (+0.1397)
0.8362  26000  3.6437         -                -                    -                     -                 -
0.8683  27000  3.549          -                -                    -                     -                 -
0.9005  28000  3.529          -                -                    -                     -                 -
0.9327  29000  3.535          -                -                    -                     -                 -
0.9648  30000  3.5088         3.6602           0.6574 (+0.1170)     0.4052 (+0.0801)      0.7230 (+0.2223)  0.5952 (+0.1398)
0.9970  31000  3.472          -                -                    -                     -                 -
-1      -1     -              -                0.6673 (+0.1269)     0.4034 (+0.0783)      0.7330 (+0.2324)  0.6012 (+0.1459)
  • The saved checkpoint is the row at step 15000 (epoch 0.4824); its metrics match the final evaluation row.
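
One way to read the log is to look for the evaluation step with the highest NanoBEIR mean ndcg@10. The sketch below does that over the evaluation rows copied from the table. It assumes this mean was the selection metric for load_best_model_at_end, which the card does not state.

```python
# (step, validation loss, NanoBEIR_mean_ndcg@10) copied from the evaluation rows above.
eval_rows = [
    (5000, 6.1981, 0.5820),
    (10000, 4.3763, 0.6007),
    (15000, 4.0312, 0.6012),
    (20000, 3.7577, 0.5986),
    (25000, 3.6983, 0.5951),
    (30000, 3.6602, 0.5952),
]

best_by_ndcg = max(eval_rows, key=lambda row: row[2])
best_by_loss = min(eval_rows, key=lambda row: row[1])
print("highest mean ndcg@10 at step", best_by_ndcg[0])
print("lowest validation loss at step", best_by_loss[0])
```

The two criteria disagree here: mean ndcg@10 peaks at step 15000, whose metrics equal the final row of the log, while validation loss keeps falling through step 30000.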

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 3.5.0.dev0
  • Transformers: 4.49.0.dev0
  • PyTorch: 2.6.0.dev20241112+cu121
  • Accelerate: 1.2.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0
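
To compare a local environment against the versions above, a small check with the standard library's importlib.metadata can help. The keys below are the usual PyPI distribution names, which is an assumption; packages that are not installed are reported as None rather than raising.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name.
card_versions = {
    "sentence-transformers": "3.5.0.dev0",
    "transformers": "4.49.0.dev0",
    "torch": "2.6.0.dev20241112+cu121",
    "accelerate": "1.2.0",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}

installed = {}
for name in card_versions:
    try:
        installed[name] = version(name)
    except PackageNotFoundError:
        installed[name] = None  # not installed locally

for name, wanted in card_versions.items():
    print(f"{name}: card={wanted} installed={installed[name]}")
```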

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MarginMSELoss

@misc{hofstätter2021improving,
    title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
    author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
    year={2021},
    eprint={2010.02666},
    archivePrefix={arXiv},
    primaryClass={cs.IR}
}