
SentenceTransformer based on dbourget/philai-embeddings-2.0

This is a sentence-transformers model finetuned from dbourget/philai-embeddings-2.0. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: dbourget/philai-embeddings-2.0
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
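The card names cosine similarity as the model's similarity function. As a minimal sketch of what that computes, the pure-Python function below divides the dot product of two vectors by the product of their lengths. The 4-dimensional vectors are made up for illustration and stand in for real 1024-dimensional embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Toy vectors (hypothetical), standing in for 1024-dimensional embeddings.
a = [1.0, 2.0, 0.0, 1.0]
b = [2.0, 4.0, 0.0, 2.0]  # same direction as a, twice the length
c = [0.0, 0.0, 3.0, 0.0]  # orthogonal to a
print(cosine_similarity(a, b))  # ≈ 1.0 (up to float rounding)
print(cosine_similarity(a, c))  # 0.0
```

Because only the angle matters, a vector and a scaled copy of it score about 1.0; this scale invariance is the main difference from a raw dot product.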

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
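In the Pooling module above, only pooling_mode_cls_token is True, so the sentence embedding is the output vector of the first ([CLS]) token rather than an average over all tokens. The sketch below illustrates the difference on a made-up 4-token sequence with a toy hidden size of 3; mean pooling is included only for contrast and is disabled in this model's configuration.

```python
def cls_pooling(token_embeddings):
    """Use the first token's ([CLS]) output vector as the sentence embedding."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings, attention_mask):
    """Average the non-padding token vectors (shown for contrast only)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:  # skip padding positions
            for i, x in enumerate(vec):
                total[i] += x
            count += 1
    return [t / count for t in total]


# Hypothetical transformer output: [CLS], two word pieces, one [PAD].
tokens = [
    [0.5, -1.0, 2.0],  # [CLS]
    [1.0, 1.0, 1.0],
    [3.0, 0.0, -1.0],
    [9.0, 9.0, 9.0],   # padding, ignored by mean pooling
]
mask = [1, 1, 1, 0]
print(cls_pooling(tokens))         # [0.5, -1.0, 2.0]
print(mean_pooling(tokens, mask))  # [1.5, 0.0, 0.6666666666666666]
```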

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dbourget/pb-small-10e-tsdae6e-philsim-cosine-6e-beatai-30e")
# Run inference
sentences = [
    'scientific revolutions',
    'paradigm shifts',
    'scientific realism',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
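The 3×3 matrix returned by model.similarity holds a score for every pair of input sentences, with self-similarities on the diagonal. A minimal sketch of paraphrase mining over such a matrix ranks the off-diagonal pairs. The scores below are hypothetical placeholders, not outputs of this model.

```python
def most_similar_pairs(sentences, similarities, top_k=None):
    """Rank distinct sentence pairs by similarity, highest first.

    `similarities` is a square matrix (nested lists) where entry [i][j]
    is the similarity between sentences[i] and sentences[j].
    """
    pairs = []
    n = len(sentences)
    for i in range(n):
        for j in range(i + 1, n):  # skip the diagonal and mirrored duplicates
            pairs.append((similarities[i][j], sentences[i], sentences[j]))
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs if top_k is None else pairs[:top_k]


sentences = ["scientific revolutions", "paradigm shifts", "scientific realism"]
# Hypothetical scores, NOT actual outputs of this model.
similarities = [
    [1.00, 0.80, 0.55],
    [0.80, 1.00, 0.40],
    [0.55, 0.40, 1.00],
]
for score, s1, s2 in most_similar_pairs(sentences, similarities):
    print(f"{score:.2f}  {s1!r} <-> {s2!r}")
```

With the real model, similarities.tolist() converts the returned tensor into nested lists that this function accepts. For larger collections, sentence_transformers.util.paraphrase_mining performs this kind of ranking in chunks.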

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.8215
dot_accuracy 0.2449
manhattan_accuracy 0.835
euclidean_accuracy 0.8342
max_accuracy 0.835
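Each accuracy above is the share of (anchor, positive, negative) evaluation triplets in which the anchor is judged closer to the positive than to the negative under one measure, and max_accuracy is the best of the four. The pure-Python sketch below computes these figures in the way Sentence Transformers' TripletEvaluator describes them. The two triplets are invented to show that a raw dot product, which grows with vector length, can rank a triplet differently from cosine similarity.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cosine(u, v):
    return _dot(u, v) / (math.sqrt(_dot(u, u)) * math.sqrt(_dot(v, v)))


def _euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def _manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def triplet_accuracies(triplets):
    """Fraction of (anchor, positive, negative) triplets ranked correctly.

    Similarities (cosine, dot) must be higher for the positive;
    distances (euclidean, manhattan) must be lower for the positive.
    """
    hits = {"cosine": 0, "dot": 0, "euclidean": 0, "manhattan": 0}
    for a, p, n in triplets:
        hits["cosine"] += _cosine(a, p) > _cosine(a, n)
        hits["dot"] += _dot(a, p) > _dot(a, n)
        hits["euclidean"] += _euclidean(a, p) < _euclidean(a, n)
        hits["manhattan"] += _manhattan(a, p) < _manhattan(a, n)
    acc = {f"{k}_accuracy": v / len(triplets) for k, v in hits.items()}
    acc["max_accuracy"] = max(acc.values())
    return acc


# Hypothetical toy triplets (anchor, positive, negative).
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [5.0, 5.0]),  # long negative wins on raw dot product
    ([0.0, 1.0], [0.2, 1.0], [1.0, 0.0]),
]
print(triplet_accuracies(toy_triplets))  # dot_accuracy is 0.5, the others 1.0
```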

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 138
  • per_device_eval_batch_size: 138
  • learning_rate: 1e-06
  • weight_decay: 0.01
  • num_train_epochs: 20
  • lr_scheduler_type: constant
  • bf16: True
  • dataloader_drop_last: True
  • resume_from_checkpoint: True
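The loss function is not part of the hyperparameter list, but the card cites the triplet loss (Hermans et al., 2017) under Citation. Assuming training used Sentence Transformers' TripletLoss, each triplet contributes max(d(a, p) - d(a, n) + margin, 0), averaged over the batch. The sketch below uses Euclidean distance and a margin of 5, which are that class's defaults; the card does not record the values actually used, so both are assumptions here.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchors, positives, negatives, margin=5.0, distance=euclidean):
    """Mean of max(d(a, p) - d(a, n) + margin, 0) over a batch of triplets.

    margin=5.0 with Euclidean distance mirrors Sentence Transformers'
    TripletLoss defaults; whether this model used them is an assumption.
    """
    losses = [
        max(distance(a, p) - distance(a, n) + margin, 0.0)
        for a, p, n in zip(anchors, positives, negatives)
    ]
    return sum(losses) / len(losses)


# Hypothetical 2-dimensional batch of two triplets.
anchors = [[0.0, 0.0], [0.0, 0.0]]
positives = [[3.0, 4.0], [1.0, 0.0]]  # distances 5 and 1 from the anchor
negatives = [[6.0, 8.0], [0.0, 3.0]]  # distances 10 and 3 from the anchor
print(triplet_loss(anchors, positives, negatives))  # 1.5, from losses 0.0 and 3.0
```

A triplet stops contributing once the negative is at least `margin` farther from the anchor than the positive, which is why the first triplet adds nothing.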

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 138
  • per_device_eval_batch_size: 138
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-06
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 20
  • max_steps: -1
  • lr_scheduler_type: constant
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: 2
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: True
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
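The optimizer settings above (optim: adamw_torch, learning_rate 1e-06, weight_decay 0.01, adam_beta1 0.9, adam_beta2 0.999, adam_epsilon 1e-08, lr_scheduler_type constant, warmup_steps 0) determine the per-parameter update rule. The sketch below applies that AdamW rule, following the update documented for PyTorch's torch.optim.AdamW, to a single scalar parameter. It omits gradient clipping (max_grad_norm 1.0) and bf16 mixed precision, and is an illustration rather than the training code.

```python
import math


def adamw_constant_lr(param, grads, lr=1e-6, weight_decay=0.01,
                      beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply AdamW updates to one scalar parameter with a constant learning rate.

    Decoupled weight decay: the parameter is shrunk by lr * weight_decay
    each step, independently of the gradient-based Adam update.
    """
    m = v = 0.0
    for step, g in enumerate(grads, start=1):
        param *= 1 - lr * weight_decay       # decoupled weight decay
        m = beta1 * m + (1 - beta1) * g      # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta1 ** step)      # bias corrections
        v_hat = v / (1 - beta2 ** step)
        # lr is the same at every step: constant schedule, no warmup.
        param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param


# Hypothetical parameter value and gradients.
print(adamw_constant_lr(1.0, [0.5, 0.5, 0.5]))  # roughly 1 - 3e-6
```

With a constant gradient, the bias-corrected ratio m_hat / sqrt(v_hat) stays near 1, so each step moves the parameter by about the learning rate; at 1e-06 this makes for very gradual fine-tuning.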

Training Logs

Epoch Step Training Loss Validation Loss beatai-dev_max_accuracy
0 0 - - 0.8308
0.1471 10 1.056 - -
0.2941 20 1.0992 - -
0.4412 30 1.1678 - -
0.5882 40 1.1586 - -
0.7353 50 1.1777 2.0793 0.8291
0.8824 60 1.1344 - -
1.0294 70 1.0578 - -
1.1765 80 1.0981 - -
1.3235 90 1.1216 - -
1.4706 100 1.0436 2.0826 0.8283
1.6176 110 1.0422 - -
1.7647 120 1.0857 - -
1.9118 130 1.0502 - -
2.0588 140 1.0363 - -
2.2059 150 1.081 2.0763 0.8316
2.3529 160 1.1764 - -
2.5 170 1.0393 - -
2.6471 180 0.9586 - -
2.7941 190 1.0537 - -
2.9412 200 1.0313 2.0645 0.8325
3.0882 210 1.0401 - -
3.2353 220 1.0389 - -
3.3824 230 1.0225 - -
3.5294 240 1.0131 - -
3.6765 250 0.9565 2.0705 0.8308
3.8235 260 1.0059 - -
3.9706 270 0.9629 - -
4.1176 280 0.9546 - -
4.2647 290 0.989 - -
4.4118 300 1.0573 2.0514 0.8375
4.5588 310 0.894 - -
4.7059 320 1.0082 - -
4.8529 330 0.969 - -
5.0 340 0.9187 - -
5.1471 350 0.9034 2.0663 0.8350
5.2941 360 0.9043 - -
5.4412 370 0.9517 - -
5.5882 380 1.0272 - -
5.7353 390 0.95 - -
5.8824 400 0.8288 2.0400 0.8367
6.0294 410 0.9809 - -
6.1765 420 0.8776 - -
6.3235 430 0.9744 - -
6.4706 440 0.9982 - -
6.6176 450 0.9076 2.0429 0.8350
6.7647 460 0.8792 - -
6.9118 470 0.787 - -
7.0588 480 0.9506 - -
7.2059 490 0.927 - -
7.3529 500 0.9464 2.0487 0.8316
7.5 510 0.886 - -
7.6471 520 0.9142 - -
7.7941 530 0.8741 - -
7.9412 540 0.8703 - -
8.0882 550 0.8947 2.0411 0.8333
8.2353 560 0.8742 - -
8.3824 570 0.8083 - -
8.5294 580 0.9134 - -
8.6765 590 0.8197 - -
8.8235 600 0.8253 2.0272 0.8367
8.9706 610 0.8665 - -
9.1176 620 0.8853 - -
9.2647 630 0.7566 - -
9.4118 640 0.9101 - -
9.5588 650 0.801 2.0243 0.8350
9.7059 660 0.8551 - -
9.8529 670 0.8748 - -
10.0 680 0.9798 - -
10.1471 690 1.0544 - -
10.2941 700 1.2077 2.0128 0.8367
10.4412 710 1.0386 - -
10.5882 720 1.0508 - -
10.7353 730 1.0063 - -
10.8824 740 1.0758 - -
11.0294 750 1.1552 2.0031 0.8367
11.1765 760 1.0259 - -
11.3235 770 1.0724 - -
11.4706 780 1.0524 - -
11.6176 790 0.9957 - -
11.7647 800 1.0697 2.0022 0.8367
11.9118 810 1.0544 - -
12.0588 820 1.0762 - -
12.2059 830 1.0858 - -
12.3529 840 1.0418 - -
12.5 850 1.0041 1.9936 0.8392
12.6471 860 0.998 - -
12.7941 870 1.0737 - -
12.9412 880 1.0637 - -
13.0882 890 0.9689 - -
13.2353 900 1.001 1.9818 0.8392
13.3824 910 1.0418 - -
13.5294 920 1.0097 - -
13.6765 930 1.0244 - -
13.8235 940 1.0383 - -
13.9706 950 1.034 1.9798 0.8367
14.1176 960 0.9609 - -
14.2647 970 1.049 - -
14.4118 980 1.0012 - -
14.5588 990 0.9008 - -
14.7059 1000 1.0131 1.9741 0.8384
14.8529 1010 0.9714 - -
15.0 1020 0.9987 - -
15.1471 1030 1.1139 - -
15.2941 1040 1.005 - -
15.4412 1050 0.9074 1.9761 0.8359
15.5882 1060 0.9298 - -
15.7353 1070 0.9335 - -
15.8824 1080 0.9445 - -
16.0294 1090 1.0087 - -
16.1765 1100 0.9187 1.9679 0.8384
16.3235 1110 0.8502 - -
16.4706 1120 0.9924 - -
16.6176 1130 0.9982 - -
16.7647 1140 0.9643 - -
16.9118 1150 0.9491 1.9727 0.8333
17.0588 1160 0.9801 - -
17.2059 1170 0.9374 - -
17.3529 1180 0.8309 - -
17.5 1190 0.9524 - -
17.6471 1200 0.886 1.9797 0.8350
17.7941 1210 0.9026 - -
17.9412 1220 0.8859 - -
18.0882 1230 0.8745 - -
18.2353 1240 0.9474 - -
18.3824 1250 0.878 1.9737 0.8342
18.5294 1260 0.8372 - -
18.6765 1270 0.833 - -
18.8235 1280 0.9648 - -
18.9706 1290 0.918 - -
19.1176 1300 0.9588 1.9669 0.8359
19.2647 1310 1.0334 - -
19.4118 1320 0.8347 - -
19.5588 1330 0.828 - -
19.7059 1340 0.9117 - -
19.8529 1350 0.9123 1.9666 0.8350
20.0 1360 0.8538 - -
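The Epoch column is Step divided by the number of optimizer steps per epoch: step 1360 is epoch 20.0, so each epoch had 68 steps, and 10 / 68 ≈ 0.1471 matches the value logged at step 10. Assuming a single device and gradient_accumulation_steps of 1 (as listed), dataloader_drop_last: True means each epoch contains only full batches of 138, which bounds the training-set size as sketched below. The helper names are hypothetical.

```python
import math


def steps_per_epoch(num_examples, batch_size, drop_last=True):
    """Batches (optimizer steps, without accumulation) per epoch on one device."""
    if drop_last:
        return num_examples // batch_size  # the incomplete last batch is discarded
    return math.ceil(num_examples / batch_size)


def training_set_size_range(steps, batch_size):
    """Dataset sizes consistent with `steps` full batches when drop_last=True."""
    low = steps * batch_size
    return low, low + batch_size - 1


# From the log: step 1360 corresponds to epoch 20.0.
per_epoch = 1360 // 20
print(per_epoch)                                # 68
print(round(10 / per_epoch, 4))                 # 0.1471
print(training_set_size_range(per_epoch, 138))  # (9384, 9521)
```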

Framework Versions

  • Python: 3.8.18
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.0
  • PyTorch: 1.13.1+cu117
  • Accelerate: 0.34.2
  • Datasets: 3.0.1
  • Tokenizers: 0.20.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Safetensors

  • Model size: 109M params
  • Tensor type: F32