
SentenceTransformer

This is a sentence-transformers model. It maps sentences & paragraphs to a 512-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 512 dimensions
  • Similarity Function: Cosine Similarity
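
The maximum sequence length and output dimensionality can be confirmed programmatically once the model is loaded; a minimal sketch (the model ID is taken from the usage section below):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("pankajrajdeo/182500_bioformer_8L")
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 512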

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 512, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
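
The Pooling layer averages the token embeddings produced by the Transformer into a single 512-dimensional vector per input (mean pooling, per the flags above). A rough, illustrative sketch of what that step computes in plain PyTorch (the real implementation is sentence_transformers.models.Pooling):

import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 512); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).float()    # zero out padding positions
    summed = (token_embeddings * mask).sum(dim=1)  # sum embeddings of real tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)       # number of real tokens per input
    return summed / counts                         # (batch, 512) sentence embeddings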

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pankajrajdeo/182500_bioformer_8L")
# Run inference
sentences = [
    'vägtrafikolyckor',        # Swedish: "road traffic accidents"
    'accidente vial',          # Spanish: "road accident"
    'trimeresurus andersoni',  # species name: Anderson's pit viper
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 512]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
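
Since semantic search is among the intended uses, the same embeddings can rank a small corpus against a query; a minimal sketch (the query and corpus strings are made-up illustrations, not from the training data):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("pankajrajdeo/182500_bioformer_8L")

corpus = ["road traffic accidents", "serum albumin", "steroid esterase"]  # illustrative terms
query = "traffic collision"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode(query)

# model.similarity applies the model's similarity function (cosine, per above)
scores = model.similarity(query_embedding, corpus_embeddings)  # shape: [1, 3]
best = scores.argmax().item()
print(corpus[best], scores[0, best].item())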

Training Details

Training Dataset

Unnamed Dataset

  • Size: 9,358,675 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
              anchor          positive        negative
    type      string          string          string
    min       6 tokens        3 tokens        3 tokens
    mean      12.84 tokens    15.45 tokens    14.75 tokens
    max       23 tokens       187 tokens      91 tokens
  • Samples:
    anchor: (131)i-makroaggregerat albumin
      positive: macroagrégats d'albumine humaine marquée à l'iode 131
      negative: 1-acylglycerophosphorylinositol
    anchor: (131)i-makroaggregerat albumin
      positive: albumin, radio-iodinated serum
      negative: allo-aromadendrane-10alpha,14-diol
    anchor: (131)i-makroaggregerat albumin
      positive: serum albumin, radio iodinated
      negative: acquired zygomatic hyperplasia
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
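
With TripletDistanceMetric.EUCLIDEAN and a margin of 5, each triplet contributes max(d(a, p) - d(a, n) + 5, 0) to the loss, where d is Euclidean distance: anchors are pulled toward positives and pushed at least 5 units away from negatives. A minimal sketch of constructing the same loss object in Sentence Transformers:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import TripletLoss, TripletDistanceMetric

model = SentenceTransformer("pankajrajdeo/182500_bioformer_8L")
# Euclidean distance with margin 5, matching the parameters listed above
train_loss = TripletLoss(
    model=model,
    distance_metric=TripletDistanceMetric.EUCLIDEAN,
    triplet_margin=5,
)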
    

Evaluation Dataset

Unnamed Dataset

  • Size: 820,102 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
              anchor          positive        negative
    type      string          string          string
    min       3 tokens        3 tokens        3 tokens
    mean      10.54 tokens    13.21 tokens    14.98 tokens
    max       20 tokens       183 tokens      322 tokens
  • Samples:
    anchor: 15-ketosteryloleathydrolase
      positive: steroid esterase, lipoidal
      negative: glutamic acid-lysine-tyrosine terpolymer
    anchor: 15-ketosteryloleathydrolase
      positive: hydrolase, cholesterol ester
      negative: unionicola parvipora
    anchor: 15-ketosteryloleathydrolase
      positive: acylhydrolase, sterol ester
      negative: mayamaea fossalis var. fossalis
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
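
These non-default values map directly onto SentenceTransformerTrainingArguments; a sketch assuming a placeholder output directory (every other value is taken from the list above, and the full defaults follow below):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="bioformer-8l-triplet",  # hypothetical path, not from the card
    eval_strategy="steps",
    per_device_train_batch_size=128,
    learning_rate=2e-5,
    num_train_epochs=10,
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
)

Together with the model, the two triplet datasets, and the TripletLoss above, these arguments would be passed to SentenceTransformerTrainer to reproduce this setup.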

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss
0.0137 1000 2.6865 -
0.0274 2000 1.4053 -
0.0410 3000 0.9222 -
0.0547 4000 0.7162 -
0.0684 5000 0.6036 -
0.0821 6000 0.5245 -
0.0957 7000 0.4665 -
0.1094 8000 0.4215 -
0.1231 9000 0.3931 -
0.1368 10000 0.3661 -
0.1504 11000 0.348 -
0.1641 12000 0.3241 -
0.1778 13000 0.3108 -
0.1915 14000 0.2943 -
0.2052 15000 0.2817 -
0.2188 16000 0.2653 -
0.2325 17000 0.2562 -
0.2462 18000 0.2529 -
0.2599 19000 0.2438 -
0.2735 20000 0.2359 -
0.2872 21000 0.2237 -
0.3009 22000 0.2207 -
0.3146 23000 0.2143 -
0.3283 24000 0.2141 -
0.3419 25000 0.2024 -
0.3556 26000 0.196 -
0.3693 27000 0.1951 -
0.3830 28000 0.19 -
0.3966 29000 0.1864 -
0.4103 30000 0.1866 -
0.4240 31000 0.1797 -
0.4377 32000 0.1805 -
0.4513 33000 0.1681 -
0.4650 34000 0.1712 -
0.4787 35000 0.1698 -
0.4924 36000 0.1619 -
0.4992 36500 - 0.1407
0.5061 37000 0.1652 -
0.5197 38000 0.1622 -
0.5334 39000 0.1603 -
0.5471 40000 0.1518 -
0.5608 41000 0.1488 -
0.5744 42000 0.1531 -
0.5881 43000 0.1472 -
0.6018 44000 0.1454 -
0.6155 45000 0.1473 -
0.6291 46000 0.1411 -
0.6428 47000 0.1389 -
0.6565 48000 0.1375 -
0.6702 49000 0.1393 -
0.6839 50000 0.1366 -
0.6975 51000 0.134 -
0.7112 52000 0.1331 -
0.7249 53000 0.1323 -
0.7386 54000 0.1309 -
0.7522 55000 0.1254 -
0.7659 56000 0.1298 -
0.7796 57000 0.1244 -
0.7933 58000 0.1254 -
0.8069 59000 0.1205 -
0.8206 60000 0.1213 -
0.8343 61000 0.1226 -
0.8480 62000 0.1187 -
0.8617 63000 0.1158 -
0.8753 64000 0.1171 -
0.8890 65000 0.1137 -
0.9027 66000 0.1172 -
0.9164 67000 0.1169 -
0.9300 68000 0.1137 -
0.9437 69000 0.1145 -
0.9574 70000 0.1127 -
0.9711 71000 0.1126 -
0.9848 72000 0.1126 -
0.9984 73000 0.1078 0.0997
1.0121 74000 0.0999 -
1.0258 75000 0.1001 -
1.0395 76000 0.0962 -
1.0531 77000 0.0984 -
1.0668 78000 0.0982 -
1.0805 79000 0.098 -
1.0942 80000 0.0964 -
1.1078 81000 0.0964 -
1.1215 82000 0.0949 -
1.1352 83000 0.0929 -
1.1489 84000 0.0914 -
1.1626 85000 0.0918 -
1.1762 86000 0.0916 -
1.1899 87000 0.0891 -
1.2036 88000 0.0921 -
1.2173 89000 0.0925 -
1.2309 90000 0.091 -
1.2446 91000 0.0875 -
1.2583 92000 0.0898 -
1.2720 93000 0.0856 -
1.2856 94000 0.0866 -
1.2993 95000 0.0843 -
1.3130 96000 0.0848 -
1.3267 97000 0.0872 -
1.3404 98000 0.0853 -
1.3540 99000 0.0898 -
1.3677 100000 0.0831 -
1.3814 101000 0.0819 -
1.3951 102000 0.0842 -
1.4087 103000 0.083 -
1.4224 104000 0.0824 -
1.4361 105000 0.0802 -
1.4498 106000 0.0834 -
1.4634 107000 0.0833 -
1.4771 108000 0.0815 -
1.4908 109000 0.079 -
1.4976 109500 - 0.0820
1.5045 110000 0.0809 -
1.5182 111000 0.0784 -
1.5318 112000 0.0767 -
1.5455 113000 0.0782 -
1.5592 114000 0.0799 -
1.5729 115000 0.0787 -
1.5865 116000 0.0798 -
1.6002 117000 0.0821 -
1.6139 118000 0.0771 -
1.6276 119000 0.0758 -
1.6413 120000 0.0789 -
1.6549 121000 0.0777 -
1.6686 122000 0.0755 -
1.6823 123000 0.0774 -
1.6960 124000 0.0748 -
1.7096 125000 0.077 -
1.7233 126000 0.0755 -
1.7370 127000 0.0749 -
1.7507 128000 0.0718 -
1.7643 129000 0.0753 -
1.7780 130000 0.0728 -
1.7917 131000 0.0704 -
1.8054 132000 0.0719 -
1.8191 133000 0.0711 -
1.8327 134000 0.0713 -
1.8464 135000 0.0695 -
1.8601 136000 0.0716 -
1.8738 137000 0.0691 -
1.8874 138000 0.0692 -
1.9011 139000 0.0744 -
1.9148 140000 0.0726 -
1.9285 141000 0.0682 -
1.9421 142000 0.0695 -
1.9558 143000 0.0723 -
1.9695 144000 0.0711 -
1.9832 145000 0.0692 -
1.9969 146000 0.0694 0.0704
2.0105 147000 0.0572 -
2.0242 148000 0.0545 -
2.0379 149000 0.0549 -
2.0516 150000 0.0552 -
2.0652 151000 0.0551 -
2.0789 152000 0.0559 -
2.0926 153000 0.0582 -
2.1063 154000 0.0587 -
2.1199 155000 0.0529 -
2.1336 156000 0.059 -
2.1473 157000 0.0534 -
2.1610 158000 0.0547 -
2.1747 159000 0.0543 -
2.1883 160000 0.0558 -
2.2020 161000 0.0548 -
2.2157 162000 0.0534 -
2.2294 163000 0.0548 -
2.2430 164000 0.0546 -
2.2567 165000 0.053 -
2.2704 166000 0.0557 -
2.2841 167000 0.0541 -
2.2978 168000 0.0527 -
2.3114 169000 0.0542 -
2.3251 170000 0.0529 -
2.3388 171000 0.0554 -
2.3525 172000 0.054 -
2.3661 173000 0.0506 -
2.3798 174000 0.054 -
2.3935 175000 0.0525 -
2.4072 176000 0.0542 -
2.4208 177000 0.0546 -
2.4345 178000 0.0516 -
2.4482 179000 0.053 -
2.4619 180000 0.0542 -
2.4756 181000 0.0538 -
2.4892 182000 0.0536 -
2.4961 182500 - 0.0655

Framework Versions

  • Python: 3.9.16
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 1.0.0
  • Datasets: 3.0.1
  • Tokenizers: 0.20.0
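
To approximate this environment, the listed versions can be pinned at install time; a hedged sketch (the PyTorch CUDA build, 2.4.1+cu121 here, depends on the platform and package index):

pip install sentence-transformers==3.1.1 transformers==4.45.2 torch==2.4.1 accelerate==1.0.0 datasets==3.0.1 tokenizers==0.20.0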

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Safetensors

  • Model size: 42.5M params
  • Tensor type: F32