metadata
library_name: sentence-transformers
metrics:
  - cosine_accuracy
  - dot_accuracy
  - manhattan_accuracy
  - euclidean_accuracy
  - max_accuracy
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:9504
  - loss:TripletLoss
widget:
  - source_sentence: cap product
    sentences:
      - >-
        method of adjoining a chain of degree p with a co-chain of degree q,
        where q is less than or equal to p, to form a composite chain of degree
        p-q
      - 'Ontology '
      - hat commodity
  - source_sentence: cognitivism
    sentences:
      - supporting cognitive science
      - >-
        study of changes in organisms caused by modification of gene expression
        rather than alteration of the genetic code
      - 'the idea that mind works like an algorithmic symbol manipulation '
  - source_sentence: doxastic voluntarism
    sentences:
      - Land surrounded by water
      - belief one is free
      - the ability to will beliefs
  - source_sentence: conceptual role
    sentences:
      - concept
      - inferential role
      - 'Theory of knowledge '
  - source_sentence: scientific revolutions
    sentences:
      - scientific realism
      - Universal moral principles govern legal systems
      - paradigm shifts
model-index:
  - name: SentenceTransformer
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: beatai dev
          type: beatai-dev
        metrics:
          - type: cosine_accuracy
            value: 0.8080808080808081
            name: Cosine Accuracy
          - type: dot_accuracy
            value: 0.28114478114478114
            name: Dot Accuracy
          - type: manhattan_accuracy
            value: 0.8316498316498316
            name: Manhattan Accuracy
          - type: euclidean_accuracy
            value: 0.8249158249158249
            name: Euclidean Accuracy
          - type: max_accuracy
            value: 0.8316498316498316
            name: Max Accuracy

SentenceTransformer

This is a sentence-transformers model. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. It was trained in four stages:

  1. bert-base-uncased was pretrained on a large corpus of open-access philosophy text.
  2. This model was further trained with TSDAE on a subset of sentences from this corpus for 6 epochs (a sketch of this stage appears below).
  3. The resulting model was fine-tuned with a cosine similarity objective on the private "philsim" dataset.
  4. The resulting model was then fine-tuned with a cosine similarity objective on the beatai-philosophy dataset.

Model internal name: pb-small-10e-tsdae6e-philsim-cosine-6e-beatai-20e
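
The training scripts for these stages are not distributed with this card. Below is a minimal sketch of how the TSDAE stage (step 2) could look using the standard Sentence Transformers TSDAE recipe; the sentence list, decoder choice, batch size, and learning rate are illustrative assumptions, not the actual configuration.

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses
from sentence_transformers.datasets import DenoisingAutoEncoderDataset

# Assumption: the real run started from a philosophy-pretrained BERT checkpoint,
# not vanilla bert-base-uncased.
base = "bert-base-uncased"
word_embedding_model = models.Transformer(base, max_seq_length=512)
pooling = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode="cls")
model = SentenceTransformer(modules=[word_embedding_model, pooling])

# Unlabelled philosophy sentences; DenoisingAutoEncoderDataset corrupts each one on the fly
# (token deletion) and keeps the original sentence as the reconstruction target.
train_sentences = [
    "Knowledge is justified true belief.",
    "A paradigm shift changes the basic assumptions of a scientific discipline.",
]
train_dataloader = DataLoader(DenoisingAutoEncoderDataset(train_sentences), batch_size=8, shuffle=True)

# Tie encoder and decoder weights; the loss reconstructs the clean sentence from the noisy input.
train_loss = losses.DenoisingAutoEncoderLoss(model, decoder_name_or_path=base, tie_encoder_decoder=True)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=6,                       # matches the 6 TSDAE epochs stated above
    scheduler="constantlr",
    optimizer_params={"lr": 3e-5},  # assumption: the actual learning rate is not documented
    weight_decay=0,
    show_progress_bar=True,
)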

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
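
These settings can also be read off the loaded model directly; a small check like the one below should print the values listed above (512-token inputs, 1024-dimensional output, CLS-token pooling).

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("dbourget/philai-embeddings-2.0")
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 1024
print(model[1].get_pooling_mode_str())           # 'cls' (CLS-token pooling)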

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dbourget/philai-embeddings-2.0")
# Run inference
sentences = [
    'scientific revolutions',
    'paradigm shifts',
    'scientific realism',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
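
Beyond pairwise similarity, the embeddings can be used for retrieval. The snippet below is a small illustrative example using the library's semantic_search utility; the corpus and query are made up for this sketch, not taken from any dataset used in training.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("dbourget/philai-embeddings-2.0")

corpus = ["paradigm shifts", "scientific realism", "doxastic voluntarism", "epistemology"]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("scientific revolutions", convert_to_tensor=True)

# Rank the corpus entries by cosine similarity to the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))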

Evaluation

Metrics

Triplet

Metric              Value
cosine_accuracy     0.8081
dot_accuracy        0.2811
manhattan_accuracy  0.8316
euclidean_accuracy  0.8249
max_accuracy        0.8316
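
These accuracies come from a triplet evaluation on the beatai-dev split. That split is not public, so the sketch below only illustrates how such numbers could be computed with the library's TripletEvaluator; the anchor/positive/negative lists are placeholders.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("dbourget/philai-embeddings-2.0")

# Placeholder triplets: each anchor should embed closer to its positive than to its negative.
anchors   = ["scientific revolutions", "doxastic voluntarism"]
positives = ["paradigm shifts", "the ability to will beliefs"]
negatives = ["scientific realism", "Land surrounded by water"]

evaluator = TripletEvaluator(anchors, positives, negatives, name="beatai-dev")
print(evaluator(model))  # accuracies per distance metric (cosine, dot, Manhattan, Euclidean)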

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 138
  • per_device_eval_batch_size: 138
  • learning_rate: 2e-06
  • num_train_epochs: 10
  • lr_scheduler_type: constant
  • bf16: True
  • dataloader_drop_last: True
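
These settings map onto the Sentence Transformers 3.x trainer API. The sketch below shows one way they could be wired up; the dataset rows are placeholders (far too few for 138-example batches) since the ~9,500-triplet beatai training data is private, and the output directory, column names, and TripletLoss margin/distance are assumptions.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import TripletLoss

model = SentenceTransformer("dbourget/philai-embeddings-2.0")

# Placeholder (anchor, positive, negative) rows; the real run used ~9,500 private triplets.
triplets = Dataset.from_dict({
    "anchor":   ["conceptual role", "scientific revolutions"],
    "positive": ["inferential role", "paradigm shifts"],
    "negative": ["Theory of knowledge", "scientific realism"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="pb-beatai-triplet",      # hypothetical output directory
    eval_strategy="steps",
    per_device_train_batch_size=138,
    per_device_eval_batch_size=138,
    learning_rate=2e-6,
    num_train_epochs=10,
    lr_scheduler_type="constant",
    bf16=True,
    dataloader_drop_last=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=triplets,
    eval_dataset=triplets,               # placeholder; a real dev split (beatai-dev) was used
    loss=TripletLoss(model),             # default margin/distance; the actual values are not documented
)
trainer.train()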

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 138
  • per_device_eval_batch_size: 138
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-06
  • weight_decay: 0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: constant
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: 2
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch  Step  Training Loss  Validation Loss  beatai-dev max_accuracy
0 0 - - 0.8072
0.1471 10 1.8573 - -
0.2941 20 1.8196 - -
0.4412 30 1.8594 - -
0.5882 40 1.8581 - -
0.7353 50 1.8766 2.3603 0.8047
0.8824 60 1.8596 - -
1.0294 70 1.6816 - -
1.1765 80 1.7564 - -
1.3235 90 1.7191 - -
1.4706 100 1.6521 2.3296 0.8064
1.6176 110 1.7054 - -
1.7647 120 1.6895 - -
1.9118 130 1.6724 - -
2.0588 140 1.6369 - -
2.2059 150 1.705 2.2941 0.8123
2.3529 160 1.8329 - -
2.5 170 1.6071 - -
2.6471 180 1.5157 - -
2.7941 190 1.624 - -
2.9412 200 1.6185 2.2668 0.8140
3.0882 210 1.6259 - -
3.2353 220 1.5749 - -
3.3824 230 1.5426 - -
3.5294 240 1.5522 - -
3.6765 250 1.5141 2.2498 0.8157
3.8235 260 1.5215 - -
3.9706 270 1.4983 - -
4.1176 280 1.4819 - -
4.2647 290 1.4552 - -
4.4118 300 1.5597 2.2226 0.8199
4.5588 310 1.3983 - -
4.7059 320 1.5386 - -
4.8529 330 1.4541 - -
5.0 340 1.4097 - -
5.1471 350 1.3741 2.2129 0.8207
5.2941 360 1.3909 - -
5.4412 370 1.4116 - -
5.5882 380 1.52 - -
5.7353 390 1.3644 - -
5.8824 400 1.3016 2.1699 0.8266
6.0294 410 1.4435 - -
6.1765 420 1.3112 - -
6.3235 430 1.4056 - -
6.4706 440 1.4541 - -
6.6176 450 1.3312 2.1486 0.8224
6.7647 460 1.2879 - -
6.9118 470 1.227 - -
7.0588 480 1.3834 - -
7.2059 490 1.3242 - -
7.3529 500 1.3756 2.1507 0.8274
7.5 510 1.2872 - -
7.6471 520 1.3288 - -
7.7941 530 1.2689 - -
7.9412 540 1.3102 - -
8.0882 550 1.2929 2.1355 0.8207
8.2353 560 1.2511 - -
8.3824 570 1.1849 - -
8.5294 580 1.2774 - -
8.6765 590 1.1923 - -
8.8235 600 1.1927 2.1111 0.8283
8.9706 610 1.2556 - -
9.1176 620 1.2767 - -
9.2647 630 1.1082 - -
9.4118 640 1.3077 - -
9.5588 650 1.1435 2.0922 0.8316
9.7059 660 1.1888 - -
9.8529 670 1.2123 - -
10.0 680 1.2554 - -

Framework Versions

  • Python: 3.8.18
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 1.13.1+cu117
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}