CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
This is a Cross Encoder model finetuned from microsoft/MiniLM-L12-H384-uncased on the ms-marco-shuffled dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Cross Encoder
- Base model: microsoft/MiniLM-L12-H384-uncased
- Maximum Sequence Length: 512 tokens
- Number of Output Labels: 1 label
- Training Dataset: ms-marco-shuffled
Model Sources
- Documentation: Sentence Transformers Documentation
- Documentation: Cross Encoder Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Cross Encoders on Hugging Face
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import CrossEncoder
# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-MiniLM-L12-H384-margin-mse")
# Get scores for pairs of texts
pairs = [
['where is joplin airport', 'Joplin Regional Airport. Joplin Regional Airport (IATA: JLN, ICAO: KJLN, FAA LID: JLN) is a city-owned airport four miles north of Joplin, in Jasper County, Missouri. It has airline service subsidized by the Essential Air Service program. Airline flights and general aviation are in separate terminals.'],
['where is the pd on your glasses frame', "Pupillary Distance (PD) You'll need to know your PD if you want to order glasses from EyeBuyDirect. Don't worry if your glasses prescription doesn't include your PD, we can show you how to measure it by yourself. How to measure your pd"],
['what year did oldsmobile stop production', 'Oldsmobile was not the problem, it was GM that made oldmobiles but they stopped making them in 2004 and the reason is that Oldsmobiles did not bring in enough money for GM or … (General Motors) to be happy so they stopped. but if you ask me i think any car that lasted 106 year is good enough and is a good car to keep selling.'],
['how many sisters did barbie have', "1 Kelly/Chelsea Roberts (1995-2009–present) This character is of toddler age, and is a sister to Barbie, Skipper, and Stacie. 2 Originally the baby of the family (replaced by her younger sister Krissy Roberts in 1999), she also has three older sisters: Barbie, Skipper, and Stacie. Skipper is Barbie's younger sister. 2 She was first introduced with blue eyes and a variety of hair colors like blonde and brown. 3 She is a main character in the Barbie: Life in the Dreamhouse series. 4 In the series, she has been remodeled as a teenager with brown hair and a purple streak."],
['who discovered achondroplasia dwarfism', "For several years, Dr. Wasmuth and his team had suspected that the gene, FGFR3, was responsible for a defect that causes Huntington's disease, a neurological disorder. But they found no link. They took another look after other researchers suggested that the same chromosome region might harbor the achondroplasia gene."],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)
# Or rank different texts based on similarity to a single text
ranks = model.rank(
'where is joplin airport',
[
'Joplin Regional Airport. Joplin Regional Airport (IATA: JLN, ICAO: KJLN, FAA LID: JLN) is a city-owned airport four miles north of Joplin, in Jasper County, Missouri. It has airline service subsidized by the Essential Air Service program. Airline flights and general aviation are in separate terminals.',
"Pupillary Distance (PD) You'll need to know your PD if you want to order glasses from EyeBuyDirect. Don't worry if your glasses prescription doesn't include your PD, we can show you how to measure it by yourself. How to measure your pd",
'Oldsmobile was not the problem, it was GM that made oldmobiles but they stopped making them in 2004 and the reason is that Oldsmobiles did not bring in enough money for GM or … (General Motors) to be happy so they stopped. but if you ask me i think any car that lasted 106 year is good enough and is a good car to keep selling.',
"1 Kelly/Chelsea Roberts (1995-2009–present) This character is of toddler age, and is a sister to Barbie, Skipper, and Stacie. 2 Originally the baby of the family (replaced by her younger sister Krissy Roberts in 1999), she also has three older sisters: Barbie, Skipper, and Stacie. Skipper is Barbie's younger sister. 2 She was first introduced with blue eyes and a variety of hair colors like blonde and brown. 3 She is a main character in the Barbie: Life in the Dreamhouse series. 4 In the series, she has been remodeled as a teenager with brown hair and a purple streak.",
"For several years, Dr. Wasmuth and his team had suspected that the gene, FGFR3, was responsible for a defect that causes Huntington's disease, a neurological disorder. But they found no link. They took another look after other researchers suggested that the same chromosome region might harbor the achondroplasia gene.",
]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
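The shape of the `rank` output can be illustrated without downloading the model. The following is a minimal sketch, assuming that ranking amounts to scoring every (query, document) pair and sorting by descending score; `rank_documents` and `toy_overlap_score` are hypothetical names introduced here for illustration, and the token-overlap scorer is only a stand-in for the Cross Encoder, not part of the library.

```python
from typing import Callable, Dict, List, Union


def rank_documents(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Dict[str, Union[int, float]]]:
    """Score every (query, document) pair and sort by descending score."""
    scored = [
        {"corpus_id": idx, "score": float(score_fn(query, doc))}
        for idx, doc in enumerate(documents)
    ]
    return sorted(scored, key=lambda item: item["score"], reverse=True)


def toy_overlap_score(query: str, document: str) -> float:
    """Hypothetical stand-in scorer: fraction of query tokens found in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(document.lower().split())
    return len(query_tokens & doc_tokens) / max(len(query_tokens), 1)


docs = [
    "Pupillary Distance (PD) is needed to order glasses.",
    "Joplin Regional Airport is a city-owned airport four miles north of Joplin.",
]
ranking = rank_documents("where is joplin airport", docs, toy_overlap_score)

# Each entry keeps the index of the document in the original list.
for entry in ranking:
    print(entry["corpus_id"], round(entry["score"], 2), docs[entry["corpus_id"]])
```

In practice the stand-in scorer would be replaced by the model's own pairwise scores; the `corpus_id` field is what lets you map each result back to the input list.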
Evaluation
Metrics
Cross Encoder Reranking
- Datasets: NanoMSMARCO, NanoNFCorpus and NanoNQ
- Evaluated with CERerankingEvaluator
Metric | NanoMSMARCO | NanoNFCorpus | NanoNQ |
---|---|---|---|
map | 0.6114 (+0.1219) | 0.3561 (+0.0857) | 0.6775 (+0.2568) |
mrr@10 | 0.6022 (+0.1247) | 0.5900 (+0.0902) | 0.6893 (+0.2626) |
ndcg@10 | 0.6673 (+0.1269) | 0.4034 (+0.0783) | 0.7330 (+0.2324) |
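For readers unfamiliar with the three metrics in the table, here is a minimal sketch of how they are commonly computed for a single query with binary relevance labels. It is not the evaluator's implementation: the function names are hypothetical, and it assumes the ideal ranking and the count of relevant documents are taken from the candidate list itself.

```python
import math
from typing import List


def mrr_at_k(relevance: List[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def average_precision(relevance: List[int]) -> float:
    """Mean of precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def ndcg_at_k(relevance: List[int], k: int = 10) -> float:
    """Discounted cumulative gain of the top k, normalised by the ideal ordering."""
    def dcg(rels: List[int]) -> float:
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0


# Relevance labels of the documents in reranked order (1 = relevant).
reranked = [0, 1, 0, 1]
print(mrr_at_k(reranked), average_precision(reranked), ndcg_at_k(reranked))
```

The reported figures would then be averages of such per-query values over each dataset.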
Cross Encoder Nano BEIR
- Dataset: NanoBEIR_mean
- Evaluated with CENanoBEIREvaluator
Metric | Value |
---|---|
map | 0.5484 (+0.1548) |
mrr@10 | 0.6272 (+0.1592) |
ndcg@10 | 0.6012 (+0.1459) |
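The NanoBEIR_mean values appear consistent with an unweighted average of the three per-dataset results reported above. The short check below works under that assumption; small differences in the last digit are expected because the per-dataset inputs are already rounded to four decimals.

```python
# Per-dataset values copied from the Cross Encoder Reranking table
# (order: NanoMSMARCO, NanoNFCorpus, NanoNQ).
per_dataset = {
    "map": [0.6114, 0.3561, 0.6775],
    "mrr@10": [0.6022, 0.5900, 0.6893],
    "ndcg@10": [0.6673, 0.4034, 0.7330],
}

# Values copied from the Cross Encoder Nano BEIR table.
reported_mean = {"map": 0.5484, "mrr@10": 0.6272, "ndcg@10": 0.6012}

# Assumption: NanoBEIR_mean is the plain arithmetic mean over the datasets.
computed_mean = {name: sum(vals) / len(vals) for name, vals in per_dataset.items()}

for name, value in computed_mean.items():
    print(f"{name}: computed {value:.4f}, reported {reported_mean[name]:.4f}")
```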
Training Details
Training Dataset
ms-marco-shuffled
- Dataset: ms-marco-shuffled at 0e80192
- Size: 39,780,704 training samples
- Columns: score, query, positive, and negative
- Approximate statistics based on the first 1000 samples:

 | score | query | positive | negative |
---|---|---|---|---|
type | float | string | string | string |
details | min: -4.89, mean: 13.57, max: 22.32 | min: 12 characters, mean: 33.75 characters, max: 141 characters | min: 71 characters, mean: 349.99 characters, max: 1000 characters | min: 82 characters, mean: 337.52 characters, max: 928 characters |
- Samples:

score | query | positive | negative |
---|---|---|---|
6.012716511885325 | what body part does gases, such as oxygen and carbon dioxide, pass into or out of the blood? | As blood passes through your lungs, oxygen moves into the blood while carbon dioxide moves out of the blood into the lungs. An ABG test uses blood drawn from an artery, where the oxygen and carbon dioxide levels can be measured before they enter body tissues. An ABG measures: 1 Partial pressure of oxygen (PaO2). | Answers. Best Answer: The respiratory system takes in oxygen from the atmosphere and moves that oxygen into the bloodstream. The circulatory system then carries the oxygen to all the cells in the body and picks up carbon dioxide waste which it returns to the lungs.Carbon dioxide diffuses from the blood into the lungs and it is then exhaled into the atmosphere.he circulatory system then carries the oxygen to all the cells in the body and picks up carbon dioxide waste which it returns to the lungs. |
5.666825115680695 | what does iron deficiency do | Iron-deficiency anemia is the most common type of anemia. It happens when you do not have enough iron in your body. Iron deficiency is usually due to blood loss but may occasionally be due to poor absorption of iron. Pregnancy and childbirth consume a great deal of iron and thus can result in pregnancy-related anemia. | color vision deficiency see color vision deficiency. deficiency disease a condition due to dietary or metabolic deficiency, including all diseases caused by an insufficient supply of essential nutrients.iron deficiency deficiency of iron in the system, as from blood loss, low dietary iron, or a disease condition that inhibits iron uptake.See iron and iron deficiency anemia.olor vision deficiency see color vision deficiency. deficiency disease a condition due to dietary or metabolic deficiency, including all diseases caused by an insufficient supply of essential nutrients. |
14.512734095255535 | cost of tavrmasoposed to open heart surgery | Several factors come into play when you’re trying to figure out how much you’re going to have to pay for an open heart surgery. The two biggest factors are what kind of open heart surgery you're having how good your insurance is. A heart transplant runs more than $700,000, significantly more than most annual salaries. Other open heart surgeries are in the neighborhood of $325,000. Much of the expense is not only the four hour long surgery, but also the testing, the anesthesia, and the medication and aftercare that are all part of the package. | Foods You Can Eat After Heart Bypass. Healthy foods provide multiple benefits following heart bypass surgery. Heart bypass surgery, also called coronary bypass surgery, is performed to restore blood flow to your heart when a section of an artery in your heart is blocked. |
- Loss: MarginMSELoss with these parameters: { "activation_fct": "torch.nn.modules.linear.Identity" }
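To make the training objective concrete, here is a minimal sketch of a margin-MSE loss in the sense of the Hofstätter et al. paper cited below. It is not the library's implementation. It assumes that the `score` column holds the teacher's margin (teacher score for the positive minus teacher score for the negative), and that, with an identity activation, the model's raw outputs are used directly; `margin_mse_loss` and its argument names are hypothetical.

```python
from typing import Sequence


def margin_mse_loss(
    pos_scores: Sequence[float],
    neg_scores: Sequence[float],
    teacher_margins: Sequence[float],
) -> float:
    """Mean squared error between the student margin and the teacher margin.

    pos_scores / neg_scores: student scores for (query, positive) and
    (query, negative) pairs. teacher_margins: assumed to be the dataset's
    `score` column, i.e. the teacher's positive-minus-negative margin.
    """
    squared_errors = [
        ((pos - neg) - margin) ** 2
        for pos, neg, margin in zip(pos_scores, neg_scores, teacher_margins)
    ]
    return sum(squared_errors) / len(squared_errors)


# Toy batch of two triplets with made-up student scores.
loss = margin_mse_loss(
    pos_scores=[8.0, 5.0],
    neg_scores=[2.0, 1.0],
    teacher_margins=[6.0, 5.0],
)
print(loss)
```

Because only the difference between the two scores is supervised, the absolute scale of the model's outputs is not pinned down by this objective; only the gap between positive and negative is.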
Evaluation Dataset
ms-marco-shuffled
- Dataset: ms-marco-shuffled at 0e80192
- Size: 39,780,704 evaluation samples
- Columns: score, query, positive, and negative
- Approximate statistics based on the first 1000 samples:

 | score | query | positive | negative |
---|---|---|---|---|
type | float | string | string | string |
details | min: -1.57, mean: 13.57, max: 22.36 | min: 10 characters, mean: 34.47 characters, max: 109 characters | min: 64 characters, mean: 345.45 characters, max: 963 characters | min: 56 characters, mean: 341.89 characters, max: 947 characters |
- Samples:

score | query | positive | negative |
---|---|---|---|
16.928720156351726 | where is joplin airport | Joplin Regional Airport. Joplin Regional Airport (IATA: JLN, ICAO: KJLN, FAA LID: JLN) is a city-owned airport four miles north of Joplin, in Jasper County, Missouri. It has airline service subsidized by the Essential Air Service program. Airline flights and general aviation are in separate terminals. | Hoskins Airport. If you’re flying from or into Hoskins airport or simply collecting someone from their flight to Hoskins, discover all the latest information you need from Hoskins airport. Find directions, airport information and local weather for Hoskins airport and details of airlines that fly to and from Hoskins. |
15.824924786885578 | where is the pd on your glasses frame | Pupillary Distance (PD) You'll need to know your PD if you want to order glasses from EyeBuyDirect. Don't worry if your glasses prescription doesn't include your PD, we can show you how to measure it by yourself. How to measure your pd | exists and is an alternate of . Mahwah PD in NJ makes 121k after 6 years, Bergenfield PD makes 117k after 5 years and there are endless PD'S that smash the base pay of SCPD. Mahwah PD in NJ makes 121k after 6 years, Bergenfield PD makes 117k after 5 years and there are endless PD'S that smash the base pay of SCPD. |
18.074473301569622 | what year did oldsmobile stop production | Oldsmobile was not the problem, it was GM that made oldmobiles but they stopped making them in 2004 and the reason is that Oldsmobiles did not bring in enough money for GM or … (General Motors) to be happy so they stopped. but if you ask me i think any car that lasted 106 year is good enough and is a good car to keep selling. | Cinsaut vines. Known as Ottavianello, there is one tiny DOC devoted to Cinsaut-Ostuni Ottavianello, with a total production of less than 1000 cases a year.However, Cinsaut has long been used in Apulian blends and has also begun to attract the attention of winemakers interested in reviving old varieties.insaut vines. Known as Ottavianello, there is one tiny DOC devoted to Cinsaut-Ostuni Ottavianello, with a total production of less than 1000 cases a year. |
- Loss: MarginMSELoss with these parameters: { "activation_fct": "torch.nn.modules.linear.Identity" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- learning_rate: 8e-06
- num_train_epochs: 1
- warmup_ratio: 0.1
- seed: 12
- bf16: True
- dataloader_num_workers: 4
- load_best_model_at_end: True
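The learning-rate settings above (peak rate 8e-06, warmup ratio 0.1, and the `linear` scheduler type listed among all hyperparameters) can be pictured with a small sketch. This is an illustration of a linear warmup followed by linear decay, not the trainer's own code; the function name is hypothetical, and the total step count of 31,000 is a round figure assumed from the training logs rather than an exact value.

```python
def linear_schedule_with_warmup(
    step: int,
    total_steps: int,
    peak_lr: float = 8e-06,
    warmup_ratio: float = 0.1,
) -> float:
    """Learning rate at `step`: linear ramp up to peak_lr, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 to the peak rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from the peak rate down to 0.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 31_000  # assumed, approximate length of the single epoch

for step in (0, 1_550, 3_100, 15_000, 31_000):
    print(step, linear_schedule_with_warmup(step, TOTAL_STEPS))
```

Under these assumptions the rate peaks around step 3,100 and has decayed to roughly half its peak near the middle of the decay phase.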
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 8e-06
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 12
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 4
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_ndcg@10 | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10 | NanoBEIR_mean_ndcg@10 |
---|---|---|---|---|---|---|---|
-1 | -1 | - | - | 0.0255 (-0.5150) | 0.3351 (+0.0101) | 0.0539 (-0.4467) | 0.1382 (-0.3172) |
0.0000 | 1 | 197.7525 | - | - | - | - | - |
0.0322 | 1000 | 189.9111 | - | - | - | - | - |
0.0643 | 2000 | 100.2999 | - | - | - | - | - |
0.0965 | 3000 | 33.4914 | - | - | - | - | - |
0.1286 | 4000 | 10.2638 | - | - | - | - | - |
0.1608 | 5000 | 7.333 | 6.1981 | 0.6326 (+0.0922) | 0.4145 (+0.0894) | 0.6989 (+0.1983) | 0.5820 (+0.1266) |
0.1930 | 6000 | 6.2212 | - | - | - | - | - |
0.2251 | 7000 | 5.6437 | - | - | - | - | - |
0.2573 | 8000 | 5.3485 | - | - | - | - | - |
0.2894 | 9000 | 5.0373 | - | - | - | - | - |
0.3216 | 10000 | 4.7753 | 4.3763 | 0.6565 (+0.1161) | 0.4161 (+0.0910) | 0.7294 (+0.2288) | 0.6007 (+0.1453) |
0.3538 | 11000 | 4.5805 | - | - | - | - | - |
0.3859 | 12000 | 4.4494 | - | - | - | - | - |
0.4181 | 13000 | 4.3038 | - | - | - | - | - |
0.4502 | 14000 | 4.2497 | - | - | - | - | - |
**0.4824** | **15000** | **4.116** | **4.0312** | **0.6673 (+0.1269)** | **0.4034 (+0.0783)** | **0.7330 (+0.2324)** | **0.6012 (+0.1459)** |
0.5146 | 16000 | 4.0779 | - | - | - | - | - |
0.5467 | 17000 | 4.0045 | - | - | - | - | - |
0.5789 | 18000 | 3.8951 | - | - | - | - | - |
0.6111 | 19000 | 3.8733 | - | - | - | - | - |
0.6432 | 20000 | 3.7693 | 3.7577 | 0.6624 (+0.1220) | 0.4052 (+0.0802) | 0.7282 (+0.2276) | 0.5986 (+0.1432) |
0.6754 | 21000 | 3.794 | - | - | - | - | - |
0.7075 | 22000 | 3.6753 | - | - | - | - | - |
0.7397 | 23000 | 3.6859 | - | - | - | - | - |
0.7719 | 24000 | 3.6511 | - | - | - | - | - |
0.8040 | 25000 | 3.6294 | 3.6983 | 0.6507 (+0.1103) | 0.4054 (+0.0804) | 0.7291 (+0.2284) | 0.5951 (+0.1397) |
0.8362 | 26000 | 3.6437 | - | - | - | - | - |
0.8683 | 27000 | 3.549 | - | - | - | - | - |
0.9005 | 28000 | 3.529 | - | - | - | - | - |
0.9327 | 29000 | 3.535 | - | - | - | - | - |
0.9648 | 30000 | 3.5088 | 3.6602 | 0.6574 (+0.1170) | 0.4052 (+0.0801) | 0.7230 (+0.2223) | 0.5952 (+0.1398) |
0.9970 | 31000 | 3.472 | - | - | - | - | - |
-1 | -1 | - | - | 0.6673 (+0.1269) | 0.4034 (+0.0783) | 0.7330 (+0.2324) | 0.6012 (+0.1459) |
- The bold row denotes the saved checkpoint.
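The evaluation rows in the log can be compared programmatically to see which step matches the final reported metrics. The sketch below copies the validation loss and NanoBEIR_mean_ndcg@10 values from the table; the inference that the checkpoint was chosen by the NanoBEIR metric rather than by validation loss is an assumption drawn from the numbers, not something the log states.

```python
# (step, validation_loss, NanoBEIR_mean_ndcg@10) copied from the training log.
eval_rows = [
    (5000, 6.1981, 0.5820),
    (10000, 4.3763, 0.6007),
    (15000, 4.0312, 0.6012),
    (20000, 3.7577, 0.5986),
    (25000, 3.6983, 0.5951),
    (30000, 3.6602, 0.5952),
]

# Final NanoBEIR_mean_ndcg@10 from the last row of the log.
final_ndcg = 0.6012

best_by_ndcg = max(eval_rows, key=lambda row: row[2])
best_by_loss = min(eval_rows, key=lambda row: row[1])

print("best by NanoBEIR ndcg@10:", best_by_ndcg[0])
print("best by validation loss:", best_by_loss[0])
print("matches final metrics:", best_by_ndcg[2] == final_ndcg)
```

Validation loss keeps falling through step 30000 while the NanoBEIR score peaks at step 15000, so the two selection criteria would pick different checkpoints.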
Framework Versions
- Python: 3.11.10
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.6.0.dev20241112+cu121
- Accelerate: 1.2.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MarginMSELoss
@misc{hofstätter2021improving,
title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
year={2021},
eprint={2010.02666},
archivePrefix={arXiv},
primaryClass={cs.IR}
}
Model tree for tomaarsen/reranker-MiniLM-L12-H384-margin-mse
Base model
microsoft/MiniLM-L12-H384-uncased