metadata
language:
  - en
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - loss:OnlineContrastiveLoss
base_model: sentence-transformers/stsb-distilbert-base
metrics:
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - dot_accuracy
  - dot_accuracy_threshold
  - dot_f1
  - dot_f1_threshold
  - dot_precision
  - dot_recall
  - dot_ap
  - manhattan_accuracy
  - manhattan_accuracy_threshold
  - manhattan_f1
  - manhattan_f1_threshold
  - manhattan_precision
  - manhattan_recall
  - manhattan_ap
  - euclidean_accuracy
  - euclidean_accuracy_threshold
  - euclidean_f1
  - euclidean_f1_threshold
  - euclidean_precision
  - euclidean_recall
  - euclidean_ap
  - max_accuracy
  - max_accuracy_threshold
  - max_f1
  - max_f1_threshold
  - max_precision
  - max_recall
  - max_ap
  - average_precision
  - f1
  - precision
  - recall
  - threshold
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
  - dot_accuracy@1
  - dot_accuracy@3
  - dot_accuracy@5
  - dot_accuracy@10
  - dot_precision@1
  - dot_precision@3
  - dot_precision@5
  - dot_precision@10
  - dot_recall@1
  - dot_recall@3
  - dot_recall@5
  - dot_recall@10
  - dot_ndcg@10
  - dot_mrr@10
  - dot_map@100
widget:
  - source_sentence: Why did he go MIA?
    sentences:
      - Why did Yahoo kill Konfabulator?
      - Why do people get angry with me?
      - What are the best waterproof guns?
  - source_sentence: Who is a soulmate?
    sentences:
      - Is she the “One”?
      - Who is Pakistan's biggest enemy?
      - Will smoking weed help with my anxiety?
  - source_sentence: Is this poem good?
    sentences:
      - Is my poem any good?
      - How can I become a good speaker?
      - What is feminism?
  - source_sentence: Who invented Yoga?
    sentences:
      - How was yoga invented?
      - Who owns this number 3152150252?
      - What is Dynamics CRM Services?
  - source_sentence: Is stretching bad?
    sentences:
      - Is stretching good for you?
      - If i=0; what will i=i++ do to i?
      - What is the Output of this C program ?
pipeline_tag: sentence-similarity
co2_eq_emissions:
  emissions: 15.707175691967695
  energy_consumed: 0.040409299905757354
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.202
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
  - name: SentenceTransformer based on sentence-transformers/stsb-distilbert-base
    results:
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: quora duplicates
          type: quora-duplicates
        metrics:
          - type: cosine_accuracy
            value: 0.86
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.8104104995727539
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.8250591016548463
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.7247534394264221
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.7347368421052631
            name: Cosine Precision
          - type: cosine_recall
            value: 0.9407008086253369
            name: Cosine Recall
          - type: cosine_ap
            value: 0.887247904332921
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.828
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 157.35491943359375
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.7898550724637681
            name: Dot F1
          - type: dot_f1_threshold
            value: 145.7113037109375
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.7155361050328227
            name: Dot Precision
          - type: dot_recall
            value: 0.8814016172506739
            name: Dot Recall
          - type: dot_ap
            value: 0.8369433397850002
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.868
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 208.00347900390625
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.8307692307692308
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 208.00347900390625
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.7921760391198044
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.8733153638814016
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.8868217413983182
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.867
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 9.269388198852539
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.8301404853128991
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 9.525729179382324
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.7888349514563107
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.876010781671159
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.8884154240019244
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.868
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 208.00347900390625
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.8307692307692308
            name: Max F1
          - type: max_f1_threshold
            value: 208.00347900390625
            name: Max F1 Threshold
          - type: max_precision
            value: 0.7921760391198044
            name: Max Precision
          - type: max_recall
            value: 0.9407008086253369
            name: Max Recall
          - type: max_ap
            value: 0.8884154240019244
            name: Max Ap
      - task:
          type: paraphrase-mining
          name: Paraphrase Mining
        dataset:
          name: quora duplicates dev
          type: quora-duplicates-dev
        metrics:
          - type: average_precision
            value: 0.534436244125929
            name: Average Precision
          - type: f1
            value: 0.5447997274541295
            name: F1
          - type: precision
            value: 0.5311002514589362
            name: Precision
          - type: recall
            value: 0.5592246590398161
            name: Recall
          - type: threshold
            value: 0.8626040816307068
            name: Threshold
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy@1
            value: 0.928
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.9712
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9782
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9874
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.928
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.4151333333333334
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.26656
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.14166
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.7993523853760618
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.9341884771405065
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9560896250710075
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9766088525134997
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.9516150309696244
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.9509392857142857
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.9390263696194139
            name: Cosine Map@100
          - type: dot_accuracy@1
            value: 0.8926
            name: Dot Accuracy@1
          - type: dot_accuracy@3
            value: 0.9518
            name: Dot Accuracy@3
          - type: dot_accuracy@5
            value: 0.9658
            name: Dot Accuracy@5
          - type: dot_accuracy@10
            value: 0.9768
            name: Dot Accuracy@10
          - type: dot_precision@1
            value: 0.8926
            name: Dot Precision@1
          - type: dot_precision@3
            value: 0.40273333333333333
            name: Dot Precision@3
          - type: dot_precision@5
            value: 0.26076
            name: Dot Precision@5
          - type: dot_precision@10
            value: 0.13882
            name: Dot Precision@10
          - type: dot_recall@1
            value: 0.7679620996617761
            name: Dot Recall@1
          - type: dot_recall@3
            value: 0.9105756956997251
            name: Dot Recall@3
          - type: dot_recall@5
            value: 0.9402185219519044
            name: Dot Recall@5
          - type: dot_recall@10
            value: 0.9623418143294613
            name: Dot Recall@10
          - type: dot_ndcg@10
            value: 0.9263520741106431
            name: Dot Ndcg@10
          - type: dot_mrr@10
            value: 0.9243020634920638
            name: Dot Mrr@10
          - type: dot_map@100
            value: 0.9094019438194247
            name: Dot Map@100

SentenceTransformer based on sentence-transformers/stsb-distilbert-base

This is a sentence-transformers model finetuned from sentence-transformers/stsb-distilbert-base on the sentence-transformers/quora-duplicates dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/stsb-distilbert-base
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Training Dataset: sentence-transformers/quora-duplicates
  • Language: en

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
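
The Pooling module above has only pooling_mode_mean_tokens enabled, so a sentence embedding is the average of the token embeddings, with padding positions left out. The following is a minimal sketch of that idea in plain Python; the helper name mean_pool and the 2-dimensional toy vectors are illustrative assumptions, not part of the library.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of real tokens, skipping padding (mask == 0)."""
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                sums[i] += value
    # Avoid dividing by zero if every position is padding.
    count = max(count, 1)
    return [total / count for total in sums]


# Toy input: three token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)
```

In the real model each token vector has 768 dimensions and at most 128 tokens are kept per input.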

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/stsb-distilbert-base-ocl")
# Run inference
sentences = [
    'Is stretching bad?',
    'Is stretching good for you?',
    'If i=0; what will i=i++ do to i?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the pairwise similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
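
A similarity score can be turned into a duplicate / not-duplicate decision by comparing it with the cosine accuracy threshold reported in the evaluation (0.8104). The sketch below shows this in plain Python. The helper names cosine_similarity and is_duplicate are hypothetical, and the 3-dimensional toy vectors stand in for real 768-dimensional embeddings returned by model.encode.

```python
import math

# Taken from the cosine_accuracy_threshold reported in this model card.
COSINE_ACCURACY_THRESHOLD = 0.8104


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_duplicate(a, b, threshold=COSINE_ACCURACY_THRESHOLD):
    """Treat a pair as duplicates when its similarity reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy vectors (assumption): replace with rows of model.encode(...) in practice.
question_a = [1.0, 0.0, 1.0]
question_b = [1.0, 0.1, 0.9]
question_c = [0.0, 1.0, 0.0]
print(is_duplicate(question_a, question_b))
print(is_duplicate(question_a, question_c))
```

The threshold was tuned on the evaluation split of quora-duplicates, so it may need re-tuning for other data.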

Evaluation

Metrics

Binary Classification

Metric Value
cosine_accuracy 0.86
cosine_accuracy_threshold 0.8104
cosine_f1 0.8251
cosine_f1_threshold 0.7248
cosine_precision 0.7347
cosine_recall 0.9407
cosine_ap 0.8872
dot_accuracy 0.828
dot_accuracy_threshold 157.3549
dot_f1 0.7899
dot_f1_threshold 145.7113
dot_precision 0.7155
dot_recall 0.8814
dot_ap 0.8369
manhattan_accuracy 0.868
manhattan_accuracy_threshold 208.0035
manhattan_f1 0.8308
manhattan_f1_threshold 208.0035
manhattan_precision 0.7922
manhattan_recall 0.8733
manhattan_ap 0.8868
euclidean_accuracy 0.867
euclidean_accuracy_threshold 9.2694
euclidean_f1 0.8301
euclidean_f1_threshold 9.5257
euclidean_precision 0.7888
euclidean_recall 0.876
euclidean_ap 0.8884
max_accuracy 0.868
max_accuracy_threshold 208.0035
max_f1 0.8308
max_f1_threshold 208.0035
max_precision 0.7922
max_recall 0.9407
max_ap 0.8884
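
Each *_precision, *_recall and *_f1 value above is computed after thresholding the similarity scores at the matching *_f1_threshold. The sketch below illustrates that computation on a handful of made-up scores and labels (an assumption for illustration only), and then checks that the reported cosine precision and recall combine into the reported cosine F1.

```python
def precision_recall_f1(scores, labels, threshold):
    """Binary metrics for 'score >= threshold means duplicate'."""
    predictions = [score >= threshold for score in scores]
    tp = sum(1 for p, y in zip(predictions, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predictions, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up similarity scores and gold labels (1 = duplicate).
scores = [0.95, 0.80, 0.75, 0.60, 0.30]
labels = [1, 1, 0, 1, 0]
p, r, f1 = precision_recall_f1(scores, labels, threshold=0.7248)
print(p, r, f1)

# Consistency check on the values reported in the metadata.
reported_precision = 0.7347368421052631
reported_recall = 0.9407008086253369
reported_f1 = (2 * reported_precision * reported_recall
               / (reported_precision + reported_recall))
print(round(reported_f1, 4))
```

Note that the max_* rows take the best value of each metric across the four similarity functions, so max_precision (from Manhattan) and max_recall (from cosine) come from different operating points.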

Paraphrase Mining

Metric Value
average_precision 0.5344
f1 0.5448
precision 0.5311
recall 0.5592
threshold 0.8626

Information Retrieval

Metric Value
cosine_accuracy@1 0.928
cosine_accuracy@3 0.9712
cosine_accuracy@5 0.9782
cosine_accuracy@10 0.9874
cosine_precision@1 0.928
cosine_precision@3 0.4151
cosine_precision@5 0.2666
cosine_precision@10 0.1417
cosine_recall@1 0.7994
cosine_recall@3 0.9342
cosine_recall@5 0.9561
cosine_recall@10 0.9766
cosine_ndcg@10 0.9516
cosine_mrr@10 0.9509
cosine_map@100 0.939
dot_accuracy@1 0.8926
dot_accuracy@3 0.9518
dot_accuracy@5 0.9658
dot_accuracy@10 0.9768
dot_precision@1 0.8926
dot_precision@3 0.4027
dot_precision@5 0.2608
dot_precision@10 0.1388
dot_recall@1 0.768
dot_recall@3 0.9106
dot_recall@5 0.9402
dot_recall@10 0.9623
dot_ndcg@10 0.9264
dot_mrr@10 0.9243
dot_map@100 0.9094
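
The @k metrics above are averages over queries. For a single query they can be computed as in the sketch below, which assumes the usual definitions (accuracy@k: at least one relevant hit in the top k; precision@k: hits divided by k; recall@k: hits divided by the number of relevant documents). The function name and the document ids are hypothetical.

```python
def ir_metrics_at_k(ranked_ids, relevant_ids, k):
    """Accuracy, precision, recall and reciprocal rank within the top k."""
    top_k = ranked_ids[:k]
    hits = [doc_id in relevant_ids for doc_id in top_k]
    num_hits = sum(hits)
    reciprocal_rank = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            reciprocal_rank = 1.0 / rank
            break
    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant_ids),
        "mrr": reciprocal_rank,
    }


# Toy ranking for one query with two relevant documents.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d3", "d1"}
at_1 = ir_metrics_at_k(ranked, relevant, k=1)
at_3 = ir_metrics_at_k(ranked, relevant, k=3)
print(at_1)
print(at_3)
```

This also shows why precision@10 is low while recall@10 is high: when a query has only a few relevant documents, most of the ten slots cannot be hits, so precision@k is bounded well below 1 even for a perfect ranking.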

Training Details

Training Dataset

sentence-transformers/quora-duplicates

  • Dataset: sentence-transformers/quora-duplicates at 451a485
  • Size: 100,000 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 6 tokens, mean 15.5 tokens, max 45 tokens
    • sentence2 (string): min 6 tokens, mean 15.46 tokens, max 78 tokens
    • label (int): 0: ~64.10%, 1: ~35.90%
  • Samples:
    • sentence1: What are the best ecommerce blogs to do guest posts on about SEO to gain new clients?
      sentence2: Interested in being a guest blogger for an ecommerce marketing blog?
      label: 0
    • sentence1: How do I learn Informatica online training?
      sentence2: What is Informatica online training?
      label: 0
    • sentence1: What effects does marijuana use have on the flu?
      sentence2: What effects does Marijuana use have on the common cold?
      label: 0
  • Loss: OnlineContrastiveLoss
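
The idea behind OnlineContrastiveLoss is to compute the contrastive loss only on the hard pairs in each batch: positives that are farther apart than the closest negative, and negatives that are closer than the farthest positive. The sketch below works on precomputed pair distances in plain Python. It assumes cosine distance and a margin of 0.5, which are believed to be the library defaults but are not stated in this card, and it simplifies the edge case where a batch contains only one class.

```python
def online_contrastive_loss(distances, labels, margin=0.5):
    """Contrastive loss restricted to hard positive and hard negative pairs."""
    positives = [d for d, y in zip(distances, labels) if y == 1]
    negatives = [d for d, y in zip(distances, labels) if y == 0]
    if not positives or not negatives:
        # Simplification (assumption): with one class missing, keep every pair.
        hard_positives, hard_negatives = positives, negatives
    else:
        # Negatives that sit closer than the farthest positive are "hard".
        hard_negatives = [d for d in negatives if d < max(positives)]
        # Positives that sit farther than the closest negative are "hard".
        hard_positives = [d for d in positives if d > min(negatives)]
    positive_loss = sum(d ** 2 for d in hard_positives)
    negative_loss = sum(max(0.0, margin - d) ** 2 for d in hard_negatives)
    return positive_loss + negative_loss


# Toy batch: distance of each (sentence1, sentence2) pair and its label.
distances = [0.1, 0.6, 0.2, 0.9]
labels = [1, 1, 0, 0]
loss = online_contrastive_loss(distances, labels)
print(loss)
```

Here only the positive at distance 0.6 and the negative at distance 0.2 contribute; the easy pairs at 0.1 and 0.9 are ignored.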

Evaluation Dataset

sentence-transformers/quora-duplicates

  • Dataset: sentence-transformers/quora-duplicates at 451a485
  • Size: 1,000 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 6 tokens, mean 15.82 tokens, max 46 tokens
    • sentence2 (string): min 6 tokens, mean 15.91 tokens, max 72 tokens
    • label (int): 0: ~62.90%, 1: ~37.10%
  • Samples:
    • sentence1: How should I prepare for JEE Mains 2017?
      sentence2: How do I prepare for the JEE 2016?
      label: 0
    • sentence1: What is the gate exam?
      sentence2: What is the GATE exam in engineering?
      label: 0
    • sentence1: Where do IRS officers get posted?
      sentence2: Does IRS Officers get posted abroad?
      label: 0
  • Loss: OnlineContrastiveLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: False
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: None
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch | Step | Training Loss | Validation Loss | cosine_map@100 | quora-duplicates-dev_average_precision | quora-duplicates_max_ap
0 | 0 | - | - | 0.9235 | 0.4200 | 0.7276
0.0640 | 100 | 2.5123 | - | - | - | -
0.1280 | 200 | 2.0534 | - | - | - | -
0.1599 | 250 | - | 1.7914 | 0.9127 | 0.4082 | 0.8301
0.1919 | 300 | 1.9505 | - | - | - | -
0.2559 | 400 | 1.9836 | - | - | - | -
0.3199 | 500 | 1.8462 | 1.5923 | 0.9190 | 0.4445 | 0.8688
0.3839 | 600 | 1.7734 | - | - | - | -
0.4479 | 700 | 1.7918 | - | - | - | -
0.4798 | 750 | - | 1.5461 | 0.9291 | 0.4943 | 0.8707
0.5118 | 800 | 1.6157 | - | - | - | -
0.5758 | 900 | 1.7244 | - | - | - | -
0.6398 | 1000 | 1.7322 | 1.5294 | 0.9309 | 0.5048 | 0.8808
0.7038 | 1100 | 1.6825 | - | - | - | -
0.7678 | 1200 | 1.6823 | - | - | - | -
0.7997 | 1250 | - | 1.4812 | 0.9351 | 0.5126 | 0.8865
0.8317 | 1300 | 1.5707 | - | - | - | -
0.8957 | 1400 | 1.6145 | - | - | - | -
0.9597 | 1500 | 1.5795 | 1.4705 | 0.9390 | 0.5344 | 0.8884

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.040 kWh
  • Carbon Emitted: 0.016 kg of CO2
  • Hours Used: 0.202 hours
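
Two derived quantities follow from these figures: the average power draw during training and the carbon intensity of the electricity used. The sketch below computes both from the unrounded values in the card metadata, assuming the emissions value there is in grams of CO2-equivalent (consistent with the 0.016 kg shown above). The results are approximate because hours_used is itself rounded.

```python
emissions_g = 15.707175691967695     # co2_eq_emissions.emissions, in grams
energy_kwh = 0.040409299905757354    # co2_eq_emissions.energy_consumed
hours_used = 0.202                   # co2_eq_emissions.hours_used

# Average power over the run, for everything CodeCarbon tracked.
average_power_w = energy_kwh / hours_used * 1000

# Grams of CO2-equivalent emitted per kWh consumed.
carbon_intensity_g_per_kwh = emissions_g / energy_kwh

print(round(average_power_w), "W")
print(round(carbon_intensity_g_per_kwh, 1), "g CO2eq/kWh")
```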

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.0.0.dev0
  • Transformers: 4.41.0.dev0
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.26.1
  • Datasets: 2.18.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}