Fine tuning poc1

This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
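
The last two modules above take the first ([CLS]) token's hidden state as the sentence vector and then scale it to unit length. A minimal NumPy sketch of those two steps follows, assuming random arrays as stand-ins for the BertModel output; the function name `cls_pool_and_normalize` is hypothetical and is not part of the library.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Map (batch, seq_len, hidden) token states to (batch, hidden) unit vectors."""
    # pooling_mode_cls_token=True: keep only the first token of each sequence
    cls = token_embeddings[:, 0, :]
    # Normalize(): divide each vector by its L2 norm (guard against zero norm)
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)

# Stand-in for the transformer output: 3 sentences, 10 tokens, 768 hidden units
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(3, 10, 768))
sentence_embeddings = cls_pool_and_normalize(hidden_states)
print(sentence_embeddings.shape)
```

Because every output vector has unit length, cosine similarity between two embeddings reduces to a plain dot product.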

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("cferreiragonz/bge-base-fastdds-questions")
# Run inference
sentences = [
    '  For further information about Fast DDS build system dependencies\n  regarding QNX 7.1, please refer to the Fast DDS Build system\n  dependencies section.',
    'What is required for installing eProsima Fast DDS on a QNX 7.1 target from sources?',
    'What is the primary purpose of the SubscriberListener class in terms of handling state changes on a Subscriber?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
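
For semantic search, the similarity scores are typically used to rank candidate passages against a query. The sketch below shows that ranking step with small hand-written vectors in place of real embeddings; `top_k_by_cosine` is a hypothetical helper, not a Sentence Transformers function.

```python
import numpy as np

def top_k_by_cosine(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k documents most similar to the query."""
    # Normalize both sides so the dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)[:k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]

# Toy 3-dimensional vectors standing in for 768-dimensional embeddings
docs = np.array([[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]])
query = np.array([0.9, 0.1, 0.0])
ranked = top_k_by_cosine(query, docs, k=2)
print(ranked)
```

With the real model, `query` and `docs` would presumably come from `model.encode(...)`, as in the snippet above.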

Evaluation

Metrics

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.3054
cosine_accuracy@3 0.4988
cosine_accuracy@5 0.5548
cosine_accuracy@10 0.627
cosine_precision@1 0.3054
cosine_precision@3 0.1663
cosine_precision@5 0.111
cosine_precision@10 0.0627
cosine_recall@1 0.3054
cosine_recall@3 0.4988
cosine_recall@5 0.5548
cosine_recall@10 0.627
cosine_ndcg@10 0.4668
cosine_mrr@10 0.4156
cosine_map@100 0.4222

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.2984
cosine_accuracy@3 0.4848
cosine_accuracy@5 0.5431
cosine_accuracy@10 0.627
cosine_precision@1 0.2984
cosine_precision@3 0.1616
cosine_precision@5 0.1086
cosine_precision@10 0.0627
cosine_recall@1 0.2984
cosine_recall@3 0.4848
cosine_recall@5 0.5431
cosine_recall@10 0.627
cosine_ndcg@10 0.461
cosine_mrr@10 0.4083
cosine_map@100 0.4149

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.3007
cosine_accuracy@3 0.4802
cosine_accuracy@5 0.5221
cosine_accuracy@10 0.6061
cosine_precision@1 0.3007
cosine_precision@3 0.1601
cosine_precision@5 0.1044
cosine_precision@10 0.0606
cosine_recall@1 0.3007
cosine_recall@3 0.4802
cosine_recall@5 0.5221
cosine_recall@10 0.6061
cosine_ndcg@10 0.4522
cosine_mrr@10 0.4033
cosine_map@100 0.4114

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.289
cosine_accuracy@3 0.4615
cosine_accuracy@5 0.5175
cosine_accuracy@10 0.5944
cosine_precision@1 0.289
cosine_precision@3 0.1538
cosine_precision@5 0.1035
cosine_precision@10 0.0594
cosine_recall@1 0.289
cosine_recall@3 0.4615
cosine_recall@5 0.5175
cosine_recall@10 0.5944
cosine_ndcg@10 0.4393
cosine_mrr@10 0.39
cosine_map@100 0.3981

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.2564
cosine_accuracy@3 0.4149
cosine_accuracy@5 0.4779
cosine_accuracy@10 0.5548
cosine_precision@1 0.2564
cosine_precision@3 0.1383
cosine_precision@5 0.0956
cosine_precision@10 0.0555
cosine_recall@1 0.2564
cosine_recall@3 0.4149
cosine_recall@5 0.4779
cosine_recall@10 0.5548
cosine_ndcg@10 0.4019
cosine_mrr@10 0.3535
cosine_map@100 0.3622
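
In every table above, accuracy@k equals recall@k, and precision@k equals accuracy@k divided by k (for example, 0.627 / 10 = 0.0627). That pattern is consistent with each query having exactly one relevant passage, although the card does not state this. Under that assumption, the metrics can be sketched as follows; the function name and the toy ranks are illustrative only.

```python
import math

def ir_metrics_single_relevant(ranks, k):
    """Compute IR metrics when each query has exactly one relevant document.

    ranks: 1-based rank of the relevant document per query, or None if it
    was not retrieved at all.
    """
    n = len(ranks)
    hit_ranks = [r for r in ranks if r is not None and r <= k]
    accuracy = len(hit_ranks) / n  # share of queries with a hit in the top k
    return {
        "accuracy": accuracy,
        "recall": accuracy,  # one relevant doc, so a hit means full recall
        "precision": len(hit_ranks) / (n * k),  # at most 1 of k results is relevant
        "mrr": sum(1.0 / r for r in hit_ranks) / n,
        # Ideal DCG is 1 (relevant doc at rank 1), so NDCG is the mean discounted gain
        "ndcg": sum(1.0 / math.log2(1 + r) for r in hit_ranks) / n,
    }

toy_ranks = [1, 3, None, 2]  # hypothetical ranks for four queries
metrics = ir_metrics_single_relevant(toy_ranks, k=3)
print(metrics)
```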

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • fp16: True
  • tf32: False
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
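
To make these settings concrete, the sketch below works out the effective batch size and the learning rate implied by a cosine schedule with linear warmup. It assumes a single device and 60 optimizer steps in total (the last step reported in the training logs), which gives 6 warmup steps at warmup_ratio 0.1. The function `lr_at_step` is a hypothetical re-implementation for illustration and is not the Transformers scheduler itself.

```python
import math

def lr_at_step(step, total_steps=60, warmup_steps=6, base_lr=2e-5):
    """Learning rate under linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# per_device_train_batch_size * gradient_accumulation_steps (one device assumed)
effective_batch_size = 16 * 16
print("effective batch size:", effective_batch_size)

for step in (0, 3, 6, 33, 60):
    print(step, lr_at_step(step))
```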

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: False
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step  Training Loss  dim_128_cosine_map@100  dim_256_cosine_map@100  dim_512_cosine_map@100  dim_64_cosine_map@100  dim_768_cosine_map@100
0.6639  10    5.0047         -                       -                       -                       -                      -
0.9959  15    -              0.3624                  0.3806                  0.3842                  0.3318                 0.3864
1.3278  20    3.3543         -                       -                       -                       -                      -
1.9917  30    2.5931         0.3886                  0.4016                  0.4103                  0.3603                 0.4153
2.6556  40    2.1763         -                       -                       -                       -                      -
2.9876  45    -              0.3966                  0.4126                  0.4156                  0.3623                 0.4205
3.3195  50    2.0242         -                       -                       -                       -                      -
3.9834  60    1.9003         0.3981                  0.4114                  0.4149                  0.3622                 0.4222
  • The bold row denotes the saved checkpoint.
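
The dim_768 through dim_64 columns report retrieval quality when only the first N embedding dimensions are kept, which is how Matryoshka-trained embeddings are usually shortened. A minimal sketch of that truncation follows, assuming random unit vectors as stand-ins for real embeddings; `truncate_and_renormalize` is a hypothetical helper.

```python
import numpy as np

def truncate_and_renormalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` dimensions and rescale each row to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# Stand-in for three 768-dimensional, unit-length sentence embeddings
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

for dim in (768, 512, 256, 128, 64):
    small = truncate_and_renormalize(full, dim)
    print(dim, small.shape)
```

Recent Sentence Transformers releases also accept a `truncate_dim` argument when loading a model, which may be a simpler route than truncating by hand.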

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2
  • Accelerate: 0.30.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 109M params (Safetensors, tensor type F32)