SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-mpnet-base-v2. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Model Size: 109M parameters (F32 safetensors)
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: https://sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
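
For reference, the pooling and normalization steps above are straightforward to replicate outside Sentence Transformers. The following is a minimal sketch (not the recommended loading path), assuming plain transformers and the base all-mpnet-base-v2 checkpoint for illustration:

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-mpnet-base-v2")
encoder = AutoModel.from_pretrained("sentence-transformers/all-mpnet-base-v2")

batch = tokenizer(["log.presetBrightnessPoint"], padding=True, truncation=True,
                  max_length=384, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)

# (1) Pooling: mean over non-padding tokens (pooling_mode_mean_tokens=True)
mask = batch["attention_mask"].unsqueeze(-1).float()
pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: unit length, so the dot product equals cosine similarity
embedding = F.normalize(pooled, p=2, dim=1)
print(embedding.shape)  # torch.Size([1, 768])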

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("SQAI/parent-column-model")
# Run inference
sentences = [
    'log.alarmFault.waveringLightEmission',
    'log.presetBrightnessPoint',
    'log.maximumWattageBoundary',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
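
Since the embeddings are unit-normalized and the similarity function is cosine, the model can be used directly for semantic search over field names. A small sketch, reusing the model object from above (the candidate names here are illustrative, not taken from the training data):

# Rank candidate field names against a query field name
query_embedding = model.encode(["log.alarmFault.waveringLightEmission"])
candidates = ["log.presetBrightnessPoint", "log.maximumWattageBoundary"]
candidate_embeddings = model.encode(candidates)

scores = model.similarity(query_embedding, candidate_embeddings)  # shape (1, 2)
best = candidates[int(scores.argmax())]
print(best, float(scores.max()))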

Training Details

Training Dataset

Unnamed Dataset

  • Size: 70,000 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
                 sentence1             sentence2             score
    type         string                string                float
    details      min: 5 tokens         min: 5 tokens         min: -0.0
                 mean: 10.81 tokens    mean: 10.1 tokens     mean: 0.11
                 max: 18 tokens        max: 17 tokens        max: 0.99
  • Samples:
    sentence1                    sentence2                             score
    log.temperatureMaximumLimit  schedule.daysWhenScheduleIsEffective  0.006032609194517136
    device.DeviceTimeZone        maintenance.maintenanceModifications  0.011996420472860337
    log.alarmFault.highAmps      log.currentLowerBoundary              0.20761280847788094
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
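
CoSENTLoss trains the pairwise cosine similarities to rank in the same order as the gold scores: whenever one pair has a higher gold score than another, its cosine similarity is pushed above the other's. A minimal, hypothetical training sketch under these parameters, with a toy stand-in for the dataset (the real run used the hyperparameters listed further below):

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CoSENTLoss

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Toy stand-in for the 70,000-pair dataset; the columns must match the card:
# sentence1, sentence2, and a float score.
train_dataset = Dataset.from_dict({
    "sentence1": ["log.alarmFault.highAmps"],
    "sentence2": ["log.currentLowerBoundary"],
    "score": [0.21],
})

# scale=20.0 and the default pairwise_cos_sim match the parameters above
loss = CoSENTLoss(model, scale=20.0)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()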
    

Evaluation Dataset

Unnamed Dataset

  • Size: 70,000 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
                 sentence1             sentence2             score
    type         string                string                float
    details      min: 5 tokens         min: 5 tokens         min: -0.0
                 mean: 10.81 tokens    mean: 10.1 tokens     mean: 0.11
                 max: 18 tokens        max: 17 tokens        max: 0.99
  • Samples:
    sentence1                    sentence2                             score
    log.temperatureMaximumLimit  schedule.daysWhenScheduleIsEffective  0.006032609194517136
    device.DeviceTimeZone        maintenance.maintenanceModifications  0.011996420472860337
    log.alarmFault.highAmps      log.currentLowerBoundary              0.20761280847788094
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
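
These settings map one-to-one onto SentenceTransformerTrainingArguments. A minimal sketch, where output_dir is a placeholder and everything else comes from the list above:

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=10,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)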

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step     Training Loss    Validation Loss
0.2286   1000     4.9688           4.1188
0.4571   2000     4.0956           3.9955
0.6857   3000     4.0295           3.8972
0.9143   4000     3.9616           3.8387
1.1429   5000     3.9073           3.7972
1.3714   6000     3.8188           3.7559
1.6000   7000     3.7536           3.5798
1.8286   8000     3.6843           3.6076
2.0571   9000     3.6231           3.5363
2.2857   10000    3.5492           3.4779
2.5143   11000    3.5423           3.4188
2.7429   12000    3.4868           3.4221
2.9714   13000    3.4593           3.2962
3.2000   14000    3.3957           3.3086
3.4286   15000    3.3801           3.2652
3.6571   16000    3.3501           3.2527
3.8857   17000    3.3117           3.2055
4.1143   18000    3.2396           3.1950
4.3429   19000    3.2424           3.1900
4.5714   20000    3.2185           3.1467
4.8000   21000    3.2173           3.1315
5.0286   22000    3.2119           3.1175
5.2571   23000    3.1583           3.0700
5.4857   24000    3.1634           3.0862
5.7143   25000    3.1538           3.0367
5.9429   26000    3.1187           3.0292
6.1712   27000    3.0703           3.0349
6.3998   28000    3.0925           3.0017
6.6283   29000    3.0179           2.9847
6.8569   30000    3.0331           2.9622
7.0855   31000    3.0784           2.9761
7.3141   32000    3.0484           2.9501
7.5426   33000    3.0138           2.9397
7.7712   34000    2.9935           2.9322
7.9998   35000    2.9912           2.9247
8.2283   36000    2.9852           2.9069
8.4569   37000    2.9460           2.9162
8.6855   38000    2.9503           2.9038
8.9141   39000    2.9759           2.8972
9.1426   40000    2.9413           2.8893
9.3712   41000    2.9330           2.8878
9.5998   42000    2.9180           2.8747
9.8283   43000    2.9427           2.8708

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.4.1
  • Transformers: 4.48.0
  • PyTorch: 2.5.0+cu121
  • Accelerate: 1.0.1
  • Datasets: 3.0.2
  • Tokenizers: 0.21.0
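
To approximate this environment, the listed releases can be pinned at install time (a sketch; the PyTorch 2.5.0+cu121 build additionally requires the matching CUDA wheel index):

pip install sentence-transformers==3.4.1 transformers==4.48.0 accelerate==1.0.1 datasets==3.0.2 tokenizers==0.21.0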

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}