
SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
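
The three modules above can be read as a pipeline: the BERT encoder produces one vector per token, the pooling layer keeps only the first ([CLS]) token's vector (`pooling_mode_cls_token: True`), and `Normalize()` rescales that vector to unit length. The following is a minimal NumPy sketch of the last two steps, using random numbers in place of real encoder output. The function name `cls_pool_and_normalize` is illustrative and not part of the library.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Map (batch, seq_len, dim) token vectors to (batch, dim) unit vectors."""
    # CLS pooling: keep only the vector of the first token in each sequence.
    cls = token_embeddings[:, 0, :]
    # L2 normalization: divide each vector by its Euclidean length.
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)

# Stand-in for encoder output: 2 sequences, 5 tokens each, 768 dimensions.
rng = np.random.default_rng(0)
fake_encoder_output = rng.normal(size=(2, 5, 768))

sentence_vectors = cls_pool_and_normalize(fake_encoder_output)
print(sentence_vectors.shape)
print(np.linalg.norm(sentence_vectors, axis=1))
```

Because every output vector has unit length, the cosine similarity of two embeddings equals their dot product.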

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Nutanix/bge-base-mbpp")
# Run inference
sentences = [
    'Write a function to find sum and average of first n natural numbers.',
    'def sum_average(number):\r\n total = 0\r\n for value in range(1, number + 1):\r\n    total = total + value\r\n average = total / number\r\n return (total,average)',
    'def long_words(n, str):\r\n    word_len = []\r\n    txt = str.split(" ")\r\n    for x in txt:\r\n        if len(x) > n:\r\n            word_len.append(x)\r\n    return word_len\t',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
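
For semantic search, a typical next step is to rank candidate code snippets against a natural-language query. The sketch below shows that ranking on toy 3-dimensional vectors so it runs without downloading the model. With the real model, `query_vec` would be one row of the array returned by `model.encode` and `candidate_vecs` the remaining rows. The helper `rank_by_cosine` is a hypothetical name, not a library function.

```python
import numpy as np

def rank_by_cosine(query_vec: np.ndarray, candidate_vecs: np.ndarray):
    """Return (candidate_index, cosine_score) pairs, best match first."""
    # Normalize both sides so the dot product is the cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)  # descending by score
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors standing in for real embeddings.
query = np.array([1.0, 0.0, 0.0])
candidates = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # nearly parallel to the query
    [-1.0, 0.0, 0.0],  # opposite to the query
])

ranking = rank_by_cosine(query, candidates)
for index, score in ranking:
    print(index, round(score, 4))
```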

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.9971
dot_accuracy 0.0028
manhattan_accuracy 0.9961
euclidean_accuracy 0.9971
max_accuracy 0.9971
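
As a reading aid, triplet accuracy is conventionally the fraction of (anchor, positive, negative) triplets in which the anchor is closer to the positive than to the negative under a given measure. The sketch below implements that conventional definition on toy 2-dimensional data. It is an illustration under that assumption, not the evaluator's actual code, and the function name `triplet_accuracy` is hypothetical.

```python
import numpy as np

def triplet_accuracy(anchors, positives, negatives):
    """Fraction of triplets where the positive is closer to the anchor."""
    def unit(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    a, p, n = unit(anchors), unit(positives), unit(negatives)
    # Cosine: higher similarity means closer.
    cos_ok = np.sum(a * p, axis=1) > np.sum(a * n, axis=1)
    # Euclidean and Manhattan: smaller distance means closer.
    euc_ok = (np.linalg.norm(anchors - positives, axis=1)
              < np.linalg.norm(anchors - negatives, axis=1))
    man_ok = (np.abs(anchors - positives).sum(axis=1)
              < np.abs(anchors - negatives).sum(axis=1))
    return {
        "cosine": float(np.mean(cos_ok)),
        "euclidean": float(np.mean(euc_ok)),
        "manhattan": float(np.mean(man_ok)),
    }

# Two toy triplets: the first is ranked correctly, the second is not.
anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
positives = np.array([[0.9, 0.1], [0.2, 0.8]])
negatives = np.array([[0.0, 1.0], [0.1, 0.9]])

result = triplet_accuracy(anchors, positives, negatives)
print(result)
```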

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • bf16: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss sts-dev_max_accuracy
0.0050 100 4.3364 -
0.0101 200 4.122 -
0.0151 300 4.0825 -
0.0202 400 4.0381 -
0.0252 500 4.015 -
0.0302 600 3.9996 -
0.0353 700 3.9567 -
0.0403 800 3.9593 -
0.0453 900 3.9456 -
0.0504 1000 3.938 -
0.0554 1100 3.933 -
0.0605 1200 3.905 -
0.0655 1300 3.906 -
0.0705 1400 3.9073 -
0.0756 1500 3.9193 -
0.0806 1600 3.9016 -
0.0857 1700 3.8899 -
0.0907 1800 3.9 -
0.0957 1900 3.8983 -
0.1008 2000 3.876 -
0.1058 2100 3.9001 -
0.1109 2200 3.8818 -
0.1159 2300 3.8788 -
0.1209 2400 3.8815 -
0.1260 2500 3.8664 -
0.1310 2600 3.854 -
0.1360 2700 3.8674 -
0.1411 2800 3.8525 -
0.1461 2900 3.8733 -
0.1512 3000 3.8538 -
0.1562 3100 3.8348 -
0.1612 3200 3.8378 -
0.1663 3300 3.8504 -
0.1713 3400 3.8409 -
0.1764 3500 3.8436 -
0.1814 3600 3.8422 -
0.1864 3700 3.8629 -
0.1915 3800 3.8589 -
0.1965 3900 3.8572 -
0.2016 4000 3.8309 -
0.2066 4100 3.8465 -
0.2116 4200 3.8311 -
0.2167 4300 3.8124 -
0.2217 4400 3.8412 -
0.2267 4500 3.8228 -
0.2318 4600 3.8012 -
0.2368 4700 3.8185 -
0.2419 4800 3.8242 -
0.2469 4900 3.7917 -
0.2519 5000 3.8022 -
0.2570 5100 3.7991 -
0.2620 5200 3.7943 -
0.2671 5300 3.7874 -
0.2721 5400 3.7987 -
0.2771 5500 3.7982 -
0.2822 5600 3.7789 -
0.2872 5700 3.7837 -
0.2923 5800 3.7762 -
0.2973 5900 3.7854 -
0.3023 6000 3.7719 -
0.3074 6100 3.7925 -
0.3124 6200 3.7795 -
0.3174 6300 3.7725 -
0.3225 6400 3.7897 -
0.3275 6500 3.773 -
0.3326 6600 3.7803 -
0.3376 6700 3.7476 -
0.3426 6800 3.7585 -
0.3477 6900 3.7426 -
0.3527 7000 3.7529 -
0.3578 7100 3.7745 -
0.3628 7200 3.7771 -
0.3678 7300 3.7598 -
0.3729 7400 3.7428 -
0.3779 7500 3.7409 -
0.3829 7600 3.7569 -
0.3880 7700 3.7517 -
0.3930 7800 3.7484 -
0.3981 7900 3.7415 -
0.4031 8000 3.7228 -
0.4081 8100 3.7569 -
0.4132 8200 3.7421 -
0.4182 8300 3.7233 -
0.4233 8400 3.72 -
0.4283 8500 3.7431 -
0.4333 8600 3.7258 -
0.4384 8700 3.73 -
0.4434 8800 3.7286 -
0.4485 8900 3.7487 -
0.4535 9000 3.7359 -
0.4585 9100 3.7387 -
0.4636 9200 3.7135 -
0.4686 9300 3.7219 -
0.4736 9400 3.7189 -
0.4787 9500 3.7234 -
0.4837 9600 3.7333 -
0.4888 9700 3.7027 -
0.4938 9800 3.7358 -
0.4988 9900 3.6959 -
0.5039 10000 3.7051 -
0.5089 10100 3.7205 -
0.5140 10200 3.711 -
0.5190 10300 3.6898 -
0.5240 10400 3.7103 -
0.5291 10500 3.695 -
0.5341 10600 3.7108 -
0.5392 10700 3.7226 -
0.5442 10800 3.7004 -
0.5492 10900 3.736 -
0.5543 11000 3.7135 -
0.5593 11100 3.7148 -
0.5643 11200 3.7285 -
0.5694 11300 3.694 -
0.5744 11400 3.6913 -
0.5795 11500 3.69 -
0.5845 11600 3.7249 -
0.5895 11700 3.6907 -
0.5946 11800 3.7135 -
0.5996 11900 3.7172 -
0.6047 12000 3.7087 -
0.6097 12100 3.7045 -
0.6147 12200 3.7043 -
0.6198 12300 3.693 -
0.6248 12400 3.6982 -
0.6298 12500 3.6922 -
0.6349 12600 3.6857 -
0.6399 12700 3.6834 -
0.6450 12800 3.7052 -
0.6500 12900 3.6935 -
0.6550 13000 3.6736 -
0.6601 13100 3.7026 -
0.6651 13200 3.6846 -
0.6702 13300 3.704 -
0.6752 13400 3.6818 -
0.6802 13500 3.7075 -
0.6853 13600 3.6688 -
0.6903 13700 3.6933 -
0.6954 13800 3.6971 -
0.7004 13900 3.6785 -
0.7054 14000 3.7088 -
0.7105 14100 3.7127 -
0.7155 14200 3.6996 -
0.7205 14300 3.6901 -
0.7256 14400 3.6914 -
0.7306 14500 3.6659 -
0.7357 14600 3.6859 -
0.7407 14700 3.68 -
0.7457 14800 3.6874 -
0.7508 14900 3.6854 -
0.7558 15000 3.671 -
0.7609 15100 3.6909 -
0.7659 15200 3.7014 -
0.7709 15300 3.6828 -
0.7760 15400 3.6773 -
0.7810 15500 3.6863 -
0.7861 15600 3.6892 -
0.7911 15700 3.6864 -
0.7961 15800 3.6586 -
0.8012 15900 3.6639 -
0.8062 16000 3.6843 -
0.8112 16100 3.6865 -
0.8163 16200 3.678 -
0.8213 16300 3.6825 -
0.8264 16400 3.7068 -
0.8314 16500 3.6886 -
0.8364 16600 3.6905 -
0.8415 16700 3.6905 -
0.8465 16800 3.6677 -
0.8516 16900 3.684 -
0.8566 17000 3.6872 -
0.8616 17100 3.6849 -
0.8667 17200 3.662 -
0.8717 17300 3.6887 -
0.8768 17400 3.6999 -
0.8818 17500 3.6916 -
0.8868 17600 3.6853 -
0.8919 17700 3.6971 -
0.8969 17800 3.6846 -
0.9019 17900 3.6701 -
0.9070 18000 3.6911 -
0.9120 18100 3.7021 -
0.9171 18200 3.6851 -
0.9221 18300 3.6924 -
0.9271 18400 3.6644 -
0.9322 18500 3.6674 -
0.9372 18600 3.6962 -
0.9423 18700 3.6759 -
0.9473 18800 3.6839 -
0.9523 18900 3.6822 -
0.9574 19000 3.6947 -
0.9624 19100 3.6589 -
0.9674 19200 3.6817 -
0.9725 19300 3.6754 -
0.9775 19400 3.6947 -
0.9826 19500 3.6785 -
0.9876 19600 3.6776 -
0.9926 19700 3.6791 -
0.9977 19800 3.6795 -
1.0 19846 - 0.9971
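
The Epoch column in the log is the step number divided by the total number of steps (19846 for the single epoch). The short sketch below reproduces a few logged values and derives an upper bound on the number of training samples seen. The bound assumes a single device and the listed `gradient_accumulation_steps` of 1. The actual dataset size is not stated in this card, so this is only an estimate.

```python
TOTAL_STEPS = 19846    # final step in the log (epoch 1.0)
PER_DEVICE_BATCH = 16  # per_device_train_batch_size from the hyperparameters

def epoch_fraction(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Fraction of the epoch completed at a given step, rounded as in the log."""
    return round(step / total_steps, 4)

for step in (100, 200, 19800):
    print(step, epoch_fraction(step))

# Upper bound on samples seen in one epoch, assuming one device and no
# gradient accumulation; the final batch may be smaller than 16.
max_samples = TOTAL_STEPS * PER_DEVICE_BATCH
print(max_samples)
```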

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.40.0
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.33.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification}, 
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}