SentenceTransformer based on dbourget/pb-small-10e-tsdae6e-philsim-cosine-6e-beatai-cosine-50e

This is a sentence-transformers model finetuned from dbourget/pb-small-10e-tsdae6e-philsim-cosine-6e-beatai-cosine-50e. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
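
The Pooling module above has only pooling_mode_cls_token enabled, so the sentence embedding is taken from the first ([CLS]) token rather than averaged over all tokens. The following is a minimal pure-Python sketch of that idea, not the library's implementation. The helper names (cls_pooling, mean_pooling) and the 2-dimensional toy vectors are assumptions made for illustration; the real model produces 1024-dimensional token vectors.

```python
from typing import List, Sequence

Vector = List[float]


def cls_pooling(token_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Use the first token's vector ([CLS] in BERT-style inputs) as the sentence embedding."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings: Sequence[Sequence[float]],
                 attention_mask: Sequence[int]) -> Vector:
    """Average the vectors of non-padding tokens (shown only for contrast)."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    dim = len(token_embeddings[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Hypothetical toy input: 3 tokens, 2 dimensions each; the last token is padding.
toy_tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
toy_mask = [1, 1, 0]

cls_vector = cls_pooling(toy_tokens)              # what this model's pooling config selects
mean_vector = mean_pooling(toy_tokens, toy_mask)  # what mean pooling would give instead
print(cls_vector)
print(mean_vector)
```

With CLS pooling the other token vectors affect the result only through the Transformer's attention layers, not through the pooling step itself.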

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dbourget/pb-small-10e-tsdae6e-philsim-cosine-6e-beatai-cosine-80e")
# Run inference
sentences = [
    'scientific revolutions',
    'paradigm shifts',
    'scientific realism',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
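
The 3 x 3 result holds one score for every pair of input sentences. Assuming the model uses the library's default cosine similarity (the card does not state a different similarity function), each entry can be sketched in plain Python as below. The toy vectors are made up for illustration and are not output of this model, and most_similar is a hypothetical helper.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities; entry [i][j] compares embedding i with embedding j."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


def most_similar(matrix: Sequence[Sequence[float]], index: int) -> int:
    """Index of the highest-scoring other item for the given row."""
    row = matrix[index]
    candidates = [j for j in range(len(row)) if j != index]
    return max(candidates, key=lambda j: row[j])


# Hypothetical 2-dimensional stand-ins for three sentence embeddings.
toy_embeddings = [[3.0, 4.0], [4.0, 3.0], [0.0, 5.0]]
toy_similarities = similarity_matrix(toy_embeddings)
print(most_similar(toy_similarities, 0))
```

The same ranking step underlies semantic search: embed the query, score it against the candidate embeddings, and sort by score.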

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.814
dot_accuracy 0.2273
manhattan_accuracy 0.8199
euclidean_accuracy 0.8157
max_accuracy 0.8199
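
Each accuracy above is assumed to be the fraction of (anchor, positive, negative) triplets in which the anchor scores closer to the positive than to the negative under the named measure, with max_accuracy being the largest of the four. The sketch below is a simplified reimplementation of that idea, not the evaluator's code, and the toy triplets are invented. It also illustrates that dot product can disagree with cosine when vector norms differ; it is not a diagnosis of the dot_accuracy value reported for this model.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u: Vector, v: Vector) -> float:
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def euclidean_distance(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_accuracy(triplets: Sequence[Triplet],
                     score: Callable[[Vector, Vector], float],
                     higher_is_closer: bool) -> float:
    """Fraction of triplets where the anchor is closer to the positive than to the negative."""
    correct = 0
    for anchor, positive, negative in triplets:
        pos, neg = score(anchor, positive), score(anchor, negative)
        if (pos > neg) if higher_is_closer else (pos < neg):
            correct += 1
    return correct / len(triplets)


# Hypothetical triplets: in the first, the negative has a much larger norm than the positive.
toy_triplets = [
    ([1.0, 0.0], [1.0, 0.1], [3.0, 3.0]),
    ([0.0, 1.0], [0.0, 2.0], [1.0, 0.0]),
]

cosine_acc = triplet_accuracy(toy_triplets, cosine_similarity, higher_is_closer=True)
dot_acc = triplet_accuracy(toy_triplets, dot, higher_is_closer=True)
euclidean_acc = triplet_accuracy(toy_triplets, euclidean_distance, higher_is_closer=False)
print(cosine_acc, dot_acc, euclidean_acc)
```

In the first toy triplet the large-norm negative wins under dot product even though its direction is further from the anchor, which is why the toy dot accuracy is lower than the cosine accuracy.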

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 138
  • per_device_eval_batch_size: 138
  • learning_rate: 5e-07
  • weight_decay: 0.01
  • num_train_epochs: 30
  • lr_scheduler_type: constant
  • bf16: True
  • dataloader_drop_last: True
  • resume_from_checkpoint: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 138
  • per_device_eval_batch_size: 138
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-07
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 30
  • max_steps: -1
  • lr_scheduler_type: constant
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: 2
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: True
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss beatai-dev_cosine_accuracy
0 0 - - 0.7904
0.1471 10 0.0721 - -
0.2941 20 0.0708 - -
0.4412 30 0.0736 - -
0.5882 40 0.0704 - -
0.7353 50 0.0732 0.0971 0.7929
0.8824 60 0.0716 - -
1.0294 70 0.0665 - -
1.1765 80 0.0698 - -
1.3235 90 0.0699 - -
1.4706 100 0.0691 0.0968 0.7912
1.6176 110 0.0687 - -
1.7647 120 0.0701 - -
1.9118 130 0.0689 - -
2.0588 140 0.0696 - -
2.2059 150 0.071 0.0966 0.7929
2.3529 160 0.078 - -
2.5 170 0.0675 - -
2.6471 180 0.065 - -
2.7941 190 0.0684 - -
2.9412 200 0.0689 0.0963 0.7938
3.0882 210 0.0736 - -
3.2353 220 0.0684 - -
3.3824 230 0.0669 - -
3.5294 240 0.0688 - -
3.6765 250 0.0678 0.0959 0.7963
3.8235 260 0.0682 - -
3.9706 270 0.0678 - -
4.1176 280 0.0686 - -
4.2647 290 0.0664 - -
4.4118 300 0.0703 0.0957 0.7980
4.5588 310 0.065 - -
4.7059 320 0.0719 - -
4.8529 330 0.0685 - -
5.0 340 0.0639 - -
5.1471 350 0.0667 0.0957 0.7971
5.2941 360 0.0661 - -
5.4412 370 0.0678 - -
5.5882 380 0.0725 - -
5.7353 390 0.0655 - -
5.8824 400 0.0649 0.0953 0.7980
6.0294 410 0.0661 - -
6.1765 420 0.0662 - -
6.3235 430 0.0671 - -
6.4706 440 0.0698 - -
6.6176 450 0.0636 0.0951 0.7980
6.7647 460 0.0644 - -
6.9118 470 0.0633 - -
7.0588 480 0.0679 - -
7.2059 490 0.067 - -
7.3529 500 0.0713 0.0948 0.7963
7.5 510 0.0677 - -
7.6471 520 0.0666 - -
7.7941 530 0.065 - -
7.9412 540 0.0665 - -
8.0882 550 0.0656 0.0946 0.7963
8.2353 560 0.0649 - -
8.3824 570 0.0649 - -
8.5294 580 0.0653 - -
8.6765 590 0.0648 - -
8.8235 600 0.0622 0.0944 0.7946
8.9706 610 0.0689 - -
9.1176 620 0.0711 - -
9.2647 630 0.0611 - -
9.4118 640 0.0697 - -
9.5588 650 0.0645 0.0942 0.7963
9.7059 660 0.0639 - -
9.8529 670 0.0643 - -
10.0 680 0.0644 - -
10.1471 690 0.0599 - -
10.2941 700 0.0723 0.0940 0.7955
10.4412 710 0.0652 - -
10.5882 720 0.0646 - -
10.7353 730 0.0602 - -
10.8824 740 0.0644 - -
11.0294 750 0.066 0.0938 0.7971
11.1765 760 0.0624 - -
11.3235 770 0.0652 - -
11.4706 780 0.0649 - -
11.6176 790 0.0624 - -
11.7647 800 0.0626 0.0937 0.7988
11.9118 810 0.0635 - -
12.0588 820 0.0643 - -
12.2059 830 0.0663 - -
12.3529 840 0.0641 - -
12.5 850 0.0614 0.0933 0.8005
12.6471 860 0.0613 - -
12.7941 870 0.0648 - -
12.9412 880 0.065 - -
13.0882 890 0.0589 - -
13.2353 900 0.0632 0.0931 0.7997
13.3824 910 0.0649 - -
13.5294 920 0.0612 - -
13.6765 930 0.0634 - -
13.8235 940 0.0637 - -
13.9706 950 0.0626 0.0930 0.7997
14.1176 960 0.0593 - -
14.2647 970 0.0662 - -
14.4118 980 0.0644 - -
14.5588 990 0.0582 - -
14.7059 1000 0.0626 0.0927 0.8013
14.8529 1010 0.0605 - -
15.0 1020 0.0615 - -
15.1471 1030 0.0676 - -
15.2941 1040 0.0633 - -
15.4412 1050 0.06 0.0927 0.8047
15.5882 1060 0.0572 - -
15.7353 1070 0.0579 - -
15.8824 1080 0.0594 - -
16.0294 1090 0.063 - -
16.1765 1100 0.0581 0.0927 0.8030
16.3235 1110 0.0564 - -
16.4706 1120 0.0632 - -
16.6176 1130 0.065 - -
16.7647 1140 0.0602 - -
16.9118 1150 0.0581 0.0926 0.8039
17.0588 1160 0.0623 - -
17.2059 1170 0.06 - -
17.3529 1180 0.0562 - -
17.5 1190 0.0627 - -
17.6471 1200 0.056 0.0924 0.8013
17.7941 1210 0.0586 - -
17.9412 1220 0.0576 - -
18.0882 1230 0.056 - -
18.2353 1240 0.0611 - -
18.3824 1250 0.0551 0.0922 0.8047
18.5294 1260 0.058 - -
18.6765 1270 0.0571 - -
18.8235 1280 0.0616 - -
18.9706 1290 0.0599 - -
19.1176 1300 0.0604 0.0920 0.8081
19.2647 1310 0.0633 - -
19.4118 1320 0.0573 - -
19.5588 1330 0.0549 - -
19.7059 1340 0.0591 - -
19.8529 1350 0.0585 0.0918 0.8089
20.0 1360 0.057 - -
20.1471 1370 0.057 - -
20.2941 1380 0.0625 - -
20.4412 1390 0.0589 - -
20.5882 1400 0.0577 0.0918 0.8098
20.7353 1410 0.0583 - -
20.8824 1420 0.0567 - -
21.0294 1430 0.0619 - -
21.1765 1440 0.0572 - -
21.3235 1450 0.0594 0.0917 0.8123
21.4706 1460 0.0567 - -
21.6176 1470 0.0611 - -
21.7647 1480 0.0533 - -
21.9118 1490 0.0595 - -
22.0588 1500 0.0521 0.0913 0.8114
22.2059 1510 0.0586 - -
22.3529 1520 0.0603 - -
22.5 1530 0.0601 - -
22.6471 1540 0.0567 - -
22.7941 1550 0.0551 0.0911 0.8114
22.9412 1560 0.0542 - -
23.0882 1570 0.057 - -
23.2353 1580 0.0541 - -
23.3824 1590 0.0586 - -
23.5294 1600 0.0573 0.0912 0.8106
23.6765 1610 0.0543 - -
23.8235 1620 0.0578 - -
23.9706 1630 0.0563 - -
24.1176 1640 0.0549 - -
24.2647 1650 0.0549 0.0909 0.8140
24.4118 1660 0.056 - -
24.5588 1670 0.0599 - -
24.7059 1680 0.0543 - -
24.8529 1690 0.0547 - -
25.0 1700 0.0575 0.0906 0.8114
25.1471 1710 0.0544 - -
25.2941 1720 0.0574 - -
25.4412 1730 0.0565 - -
25.5882 1740 0.0587 - -
25.7353 1750 0.0559 0.0905 0.8157
25.8824 1760 0.0551 - -
26.0294 1770 0.0569 - -
26.1765 1780 0.0516 - -
26.3235 1790 0.0561 - -
26.4706 1800 0.0567 0.0906 0.8165
26.6176 1810 0.0599 - -
26.7647 1820 0.0577 - -
26.9118 1830 0.0532 - -
27.0588 1840 0.0554 - -
27.2059 1850 0.0579 0.0906 0.8123
27.3529 1860 0.0532 - -
27.5 1870 0.0493 - -
27.6471 1880 0.0552 - -
27.7941 1890 0.0532 - -
27.9412 1900 0.0569 0.0904 0.8089
28.0882 1910 0.0568 - -
28.2353 1920 0.052 - -
28.3824 1930 0.0555 - -
28.5294 1940 0.0563 - -
28.6765 1950 0.0555 0.0903 0.8140
28.8235 1960 0.0535 - -
28.9706 1970 0.0525 - -
29.1176 1980 0.0566 - -
29.2647 1990 0.0562 - -
29.4118 2000 0.0547 0.0902 0.8140
29.5588 2010 0.0495 - -
29.7059 2020 0.0532 - -
29.8529 2030 0.0553 - -
30.0 2040 0.0544 - -
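
The log and the hyperparameters together allow a rough back-of-the-envelope check of the training set size. This is a sketch under stated assumptions: training on a single device (the card does not say how many were used), gradient_accumulation_steps of 1 and dataloader_drop_last enabled, as listed above. The variable names are illustrative only.

```python
# Values taken from the hyperparameter list and the last row of the training log.
batch_size = 138      # per_device_train_batch_size
num_epochs = 30       # num_train_epochs
total_steps = 2040    # final step in the log, reached at epoch 30.0

# Optimizer steps per epoch (assumes one device and no gradient accumulation).
steps_per_epoch = total_steps // num_epochs

# Examples consumed per epoch; with drop_last, a final partial batch is discarded.
examples_per_epoch = steps_per_epoch * batch_size

# The full training set therefore holds between these two counts (inclusive).
min_dataset_size = examples_per_epoch
max_dataset_size = examples_per_epoch + batch_size - 1

# Consistency check against the log: 10 steps should correspond to about 0.1471 epochs.
epoch_fraction_per_10_steps = round(10 / steps_per_epoch, 4)

print(steps_per_epoch, examples_per_epoch, min_dataset_size, max_dataset_size)
print(epoch_fraction_per_10_steps)
```

Under these assumptions the estimate is 68 steps per epoch and roughly 9,400 to 9,500 training examples; with more than one device the dataset would be proportionally larger.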

Framework Versions

  • Python: 3.8.18
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.1
  • PyTorch: 1.13.1+cu117
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.20.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Safetensors

  • Model size: 109M params
  • Tensor type: F32