SentenceTransformer based on thenlper/gte-base

This is a sentence-transformers model finetuned from thenlper/gte-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: thenlper/gte-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
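
For reference, an equivalent module stack can be assembled by hand from the Sentence Transformers building blocks. This is a minimal sketch of the architecture above, not the training code used for this model:

from sentence_transformers import SentenceTransformer, models

# BERT backbone with the card's maximum sequence length.
word_embedding = models.Transformer("thenlper/gte-base", max_seq_length=512)

# Mean pooling over token embeddings -> one 768-dimensional sentence vector.
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),
    pooling_mode_mean_tokens=True,
)

# L2-normalization so dot product and cosine similarity coincide.
model = SentenceTransformer(modules=[word_embedding, pooling, models.Normalize()])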

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("neel2306/RE-cp-costgen")
# Run inference
sentences = [
    'Lubricating And Similar Oils Not From Petroleum Refineries',
    'Synthetic lubricants',
    'Crude oil',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
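
Because the model is trained on (anchor, positive, negative) triples of industry and cost categories, a natural usage pattern is to rank candidate labels against a query by cosine score. A minimal sketch reusing the sentences above (the extra "Office supplies" candidate is made up for illustration):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("neel2306/RE-cp-costgen")

query = "Lubricating And Similar Oils Not From Petroleum Refineries"
candidates = ["Synthetic lubricants", "Crude oil", "Office supplies"]

# Cosine scores between the query and each candidate, shape (1, 3).
scores = model.similarity(model.encode([query]), model.encode(candidates))
best = scores.argmax().item()
print(candidates[best])  # expected to prefer "Synthetic lubricants"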

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,439 training samples
  • Columns: anchor, positives, and negatives
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 3 tokens, mean: 9.72 tokens, max: 34 tokens
    • positives: string; min: 3 tokens, mean: 5.96 tokens, max: 15 tokens
    • negatives: string; min: 3 tokens, mean: 5.0 tokens, max: 11 tokens
  • Samples:
    • anchor: "Other Metal Valve and Pipe Fitting Manufacturing"; positives: "Pipe fittings"; negatives: "Rubber gaskets"
    • anchor: "Fluid Power Pump and Motor Manufacturing: Miscellaneous Receipts"; positives: "Pneumatic motors"; negatives: "Gear pumps"
    • anchor: "Maintenance and Repair for Commercial Machinery"; positives: "Labor costs for maintenance technicians"; negatives: "Office supplies for administrative tasks"
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
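
For context, a triplet dataset with these columns pairs with MultipleNegativesRankingLoss roughly as follows. A minimal sketch, assuming a Hugging Face Dataset with the column names above; the single row shown is copied from the samples for illustration:

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("thenlper/gte-base")

# Columns must match the names used for training: anchor, positives, negatives.
train_dataset = Dataset.from_dict({
    "anchor": ["Other Metal Valve and Pipe Fitting Manufacturing"],
    "positives": ["Pipe fittings"],
    "negatives": ["Rubber gaskets"],
})

# scale=20.0 and cosine similarity match the parameters listed above (cos_sim is the default).
loss = MultipleNegativesRankingLoss(model, scale=20.0)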
    

Evaluation Dataset

Unnamed Dataset

  • Size: 480 evaluation samples
  • Columns: anchor, positives, and negatives
  • Approximate statistics based on the first 480 samples:
    • anchor: string; min: 3 tokens, mean: 10.4 tokens, max: 34 tokens
    • positives: string; min: 3 tokens, mean: 5.97 tokens, max: 14 tokens
    • negatives: string; min: 3 tokens, mean: 5.09 tokens, max: 14 tokens
  • Samples:
    • anchor: "Other Metal Ore Mining"; positives: "Aluminum ore processing"; negatives: "Metal alloy production"
    • anchor: "Bituminous Coal And Lignite Surface Mining: Processed Bituminous Coal And Lignite From Surface Operations"; positives: "Processed Bituminous Coal"; negatives: "Anthracite Coal"
    • anchor: "Roofing Contractors"; positives: "Labor costs for roofing installation"; negatives: "Foundation construction costs"
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
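
The evaluation split drives the validation loss reported in the training logs below. It could also be wired into a TripletEvaluator to track triplet accuracy; a sketch assuming eval_dataset is a Hugging Face Dataset with the columns above (this is not necessarily how the model was evaluated):

from sentence_transformers.evaluation import TripletEvaluator

evaluator = TripletEvaluator(
    anchors=eval_dataset["anchor"],
    positives=eval_dataset["positives"],
    negatives=eval_dataset["negatives"],
    name="costgen-eval",
)
print(evaluator(model))  # e.g. {'costgen-eval_cosine_accuracy': ...}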
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • num_train_epochs: 15
  • warmup_ratio: 0.1
  • batch_sampler: no_duplicates
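
Under these settings, a run can be reproduced approximately with the Sentence Transformers v3 trainer API. A minimal sketch, assuming the model, loss, and datasets from the sections above; output_dir is a placeholder:

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="RE-cp-costgen",                   # placeholder path
    num_train_epochs=15,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_ratio=0.1,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,    # avoid duplicate texts within a batch
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()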

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch  Step  Training Loss  Validation Loss
0.1389 50 0.955 0.8155
0.2778 100 0.8643 0.6782
0.4167 150 0.6977 0.5452
0.5556 200 0.5738 0.4514
0.6944 250 0.3365 0.5229
0.8333 300 0.3888 0.4742
0.9722 350 0.4754 0.3900
1.1111 400 0.4109 0.4337
1.25 450 0.3081 0.3950
1.3889 500 0.3282 0.3345
1.5278 550 0.2371 0.3538
1.6667 600 0.1282 0.4055
1.8056 650 0.1091 0.5044
1.9444 700 0.2137 0.4423
2.0833 750 0.1169 0.4840
2.2222 800 0.1076 0.4867
2.3611 850 0.1669 0.4859
2.5 900 0.074 0.4873
2.6389 950 0.0519 0.4409
2.7778 1000 0.0257 0.4604
2.9167 1050 0.0749 0.4678
3.0556 1100 0.0393 0.4564
3.1944 1150 0.0454 0.4301
3.3333 1200 0.062 0.4882
3.4722 1250 0.0645 0.4434
3.6111 1300 0.0115 0.4296
3.75 1350 0.0172 0.4398
3.8889 1400 0.0429 0.4396
4.0278 1450 0.0115 0.4482
4.1667 1500 0.0141 0.4597
4.3056 1550 0.0032 0.4776
4.4444 1600 0.0288 0.4693
4.5833 1650 0.006 0.4990
4.7222 1700 0.0222 0.4693
4.8611 1750 0.0016 0.4755
5.0 1800 0.0016 0.4367
5.1389 1850 0.0084 0.3789
5.2778 1900 0.0013 0.3689
5.4167 1950 0.0554 0.3591
5.5556 2000 0.0022 0.3691
5.6944 2050 0.0019 0.3776
5.8333 2100 0.0008 0.3802
5.9722 2150 0.0006 0.3799
6.1111 2200 0.0007 0.3688
6.25 2250 0.0003 0.3635
6.3889 2300 0.0125 0.3526
6.5278 2350 0.0034 0.3338
6.6667 2400 0.0003 0.3482
6.8056 2450 0.0149 0.3730
6.9444 2500 0.0004 0.3932
7.0833 2550 0.0003 0.3977
7.2222 2600 0.0007 0.3915
7.3611 2650 0.0112 0.3923
7.5 2700 0.0006 0.3938
7.6389 2750 0.0002 0.3986
7.7778 2800 0.0005 0.3946
7.9167 2850 0.0003 0.3944
8.0556 2900 0.0002 0.3996
8.1944 2950 0.0001 0.4032
8.3333 3000 0.0001 0.4018
8.4722 3050 0.0119 0.3811
8.6111 3100 0.0001 0.3826
8.75 3150 0.0001 0.3844
8.8889 3200 0.0002 0.3893
9.0278 3250 0.0001 0.3942
9.1667 3300 0.0001 0.3963
9.3056 3350 0.0001 0.3965
9.4444 3400 0.0144 0.3766
9.5833 3450 0.0002 0.3792
9.7222 3500 0.0001 0.3830
9.8611 3550 0.0001 0.3870
10.0 3600 0.0002 0.3909
10.1389 3650 0.0001 0.3939
10.2778 3700 0.0001 0.3943
10.4167 3750 0.0103 0.3896
10.5556 3800 0.0001 0.3906
10.6944 3850 0.0001 0.3929
10.8333 3900 0.0001 0.3957
10.9722 3950 0.0001 0.3969
11.1111 4000 0.0001 0.4016
11.25 4050 0.0001 0.4012
11.3889 4100 0.0049 0.4058
11.5278 4150 0.0002 0.4117
11.6667 4200 0.0001 0.4121
11.8056 4250 0.0001 0.4131
11.9444 4300 0.0001 0.4140
12.0833 4350 0.0001 0.4145
12.2222 4400 0.0001 0.4145
12.3611 4450 0.0085 0.4135
12.5 4500 0.0001 0.4112
12.6389 4550 0.0001 0.4119
12.7778 4600 0.0001 0.4127
12.9167 4650 0.0001 0.4140
13.0556 4700 0.0001 0.4174
13.1944 4750 0.0001 0.4182
13.3333 4800 0.0001 0.4187
13.4722 4850 0.0051 0.4184
13.6111 4900 0.0001 0.4183
13.75 4950 0.0001 0.4190
13.8889 5000 0.0001 0.4195
14.0278 5050 0.0001 0.4199
14.1667 5100 0.0002 0.4177
14.3056 5150 0.0001 0.4177
14.4444 5200 0.0066 0.4153
14.5833 5250 0.0001 0.4155
14.7222 5300 0.0001 0.4155
14.8611 5350 0.0001 0.4155
15.0 5400 0.0001 0.4156

Framework Versions

  • Python: 3.12.6
  • Sentence Transformers: 3.1.0
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cpu
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}