
SentenceTransformer

This is a sentence-transformers model. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. It was produced in four stages:

  1. bert-base-uncased was pretrained on a large corpus of open-access philosophy text.
  2. That model was further trained with TSDAE for 6 epochs on a subset of sentences from this corpus (a training sketch follows this list).
  3. The resulting model was fine-tuned with a cosine-similarity objective on the private "philsim" dataset.
  4. That model was then fine-tuned with a cosine-similarity objective on the beatai-philosophy dataset.
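
The TSDAE stage (step 2) can be reproduced with the sentence-transformers training utilities. The snippet below is only a minimal sketch: the actual corpus subset and hyperparameters for this model are not published here, the starting checkpoint should be the domain-adapted BERT from step 1 (plain bert-base-uncased is used as a stand-in), and the example sentences are placeholders.

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses
from sentence_transformers.datasets import DenoisingAutoEncoderDataset

# Placeholder sentences standing in for the (unpublished) philosophy corpus subset
train_sentences = [
    "Scientific revolutions involve paradigm shifts.",
    "Knowledge is often analysed as justified true belief.",
]

# CLS pooling on top of the encoder, matching the architecture shown later in this card
word_embedding_model = models.Transformer("bert-base-uncased", max_seq_length=512)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# TSDAE: the dataset adds noise by deleting tokens (uses nltk internally),
# and the loss trains the encoder to let a tied decoder reconstruct the original sentence
train_dataset = DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True)
train_loss = losses.DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=6,  # 6 TSDAE epochs, per step 2 above
    weight_decay=0,
)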

Model internal name: pb-small-10e-tsdae6e-philsim-cosine-6e-beatai-20e

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity (a quick check of these properties is shown below)
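
Assuming the model is loaded under the repository name used later in this card, these properties can be checked directly:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("dbourget/philai-embeddings-2.0")
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 1024
print(model.similarity_fn_name)                  # "cosine"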

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
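
The same two-module stack (a BERT encoder followed by CLS-token pooling) can be assembled explicitly from sentence-transformers building blocks. This is a minimal sketch based only on the configuration printed above; in practice, loading the published model directly (as in the Usage section below) is equivalent and simpler.

from sentence_transformers import SentenceTransformer, models

# Encoder module: the underlying BertModel with a 512-token window
transformer = models.Transformer("dbourget/philai-embeddings-2.0", max_seq_length=512)

# Pooling module: use the [CLS] token embedding as the sentence embedding
pooling = models.Pooling(
    transformer.get_word_embedding_dimension(),  # 1024
    pooling_mode_cls_token=True,
    pooling_mode_mean_tokens=False,
)

model = SentenceTransformer(modules=[transformer, pooling])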

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dbourget/philai-embeddings-2.0")
# Run inference
sentences = [
    'scientific revolutions',
    'paradigm shifts',
    'scientific realism',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Triplet

Metric              Value
cosine_accuracy     0.8081
dot_accuracy        0.2811
manhattan_accuracy  0.8316
euclidean_accuracy  0.8249
max_accuracy        0.8316
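
These figures are the kind reported by a TripletEvaluator: for each (anchor, positive, negative) triplet, accuracy is the fraction of triplets where the anchor embedding is closer to the positive than to the negative under the given distance. A minimal sketch with placeholder triplets (the actual beatai-dev evaluation triplets are not reproduced here):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("dbourget/philai-embeddings-2.0")

# Placeholder triplets standing in for the beatai-dev evaluation set
evaluator = TripletEvaluator(
    anchors=["scientific revolutions"],
    positives=["paradigm shifts"],
    negatives=["scientific realism"],
    name="beatai-dev",
)
results = evaluator(model)
print(results)  # per-distance accuracies plus their maximum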

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 138
  • per_device_eval_batch_size: 138
  • learning_rate: 2e-06
  • num_train_epochs: 10
  • lr_scheduler_type: constant
  • bf16: True
  • dataloader_drop_last: True
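
For reference, the non-default values above map onto SentenceTransformerTrainingArguments roughly as follows (a sketch; output_dir is a placeholder and every other argument keeps its default):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output/pb-small-beatai",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=138,
    per_device_eval_batch_size=138,
    learning_rate=2e-6,
    num_train_epochs=10,
    lr_scheduler_type="constant",
    bf16=True,
    dataloader_drop_last=True,
)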

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 138
  • per_device_eval_batch_size: 138
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-06
  • weight_decay: 0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: constant
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: 2
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step    Training Loss    Validation Loss    beatai-dev_max_accuracy
0 0 - - 0.8072
0.1471 10 1.8573 - -
0.2941 20 1.8196 - -
0.4412 30 1.8594 - -
0.5882 40 1.8581 - -
0.7353 50 1.8766 2.3603 0.8047
0.8824 60 1.8596 - -
1.0294 70 1.6816 - -
1.1765 80 1.7564 - -
1.3235 90 1.7191 - -
1.4706 100 1.6521 2.3296 0.8064
1.6176 110 1.7054 - -
1.7647 120 1.6895 - -
1.9118 130 1.6724 - -
2.0588 140 1.6369 - -
2.2059 150 1.705 2.2941 0.8123
2.3529 160 1.8329 - -
2.5 170 1.6071 - -
2.6471 180 1.5157 - -
2.7941 190 1.624 - -
2.9412 200 1.6185 2.2668 0.8140
3.0882 210 1.6259 - -
3.2353 220 1.5749 - -
3.3824 230 1.5426 - -
3.5294 240 1.5522 - -
3.6765 250 1.5141 2.2498 0.8157
3.8235 260 1.5215 - -
3.9706 270 1.4983 - -
4.1176 280 1.4819 - -
4.2647 290 1.4552 - -
4.4118 300 1.5597 2.2226 0.8199
4.5588 310 1.3983 - -
4.7059 320 1.5386 - -
4.8529 330 1.4541 - -
5.0 340 1.4097 - -
5.1471 350 1.3741 2.2129 0.8207
5.2941 360 1.3909 - -
5.4412 370 1.4116 - -
5.5882 380 1.52 - -
5.7353 390 1.3644 - -
5.8824 400 1.3016 2.1699 0.8266
6.0294 410 1.4435 - -
6.1765 420 1.3112 - -
6.3235 430 1.4056 - -
6.4706 440 1.4541 - -
6.6176 450 1.3312 2.1486 0.8224
6.7647 460 1.2879 - -
6.9118 470 1.227 - -
7.0588 480 1.3834 - -
7.2059 490 1.3242 - -
7.3529 500 1.3756 2.1507 0.8274
7.5 510 1.2872 - -
7.6471 520 1.3288 - -
7.7941 530 1.2689 - -
7.9412 540 1.3102 - -
8.0882 550 1.2929 2.1355 0.8207
8.2353 560 1.2511 - -
8.3824 570 1.1849 - -
8.5294 580 1.2774 - -
8.6765 590 1.1923 - -
8.8235 600 1.1927 2.1111 0.8283
8.9706 610 1.2556 - -
9.1176 620 1.2767 - -
9.2647 630 1.1082 - -
9.4118 640 1.3077 - -
9.5588 650 1.1435 2.0922 0.8316
9.7059 660 1.1888 - -
9.8529 670 1.2123 - -
10.0 680 1.2554 - -

Framework Versions

  • Python: 3.8.18
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 1.13.1+cu117
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}