
SentenceTransformer

This is a sentence-transformers model. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. It was produced in four stages:

  1. bert-base-uncased was pretrained on a large corpus of open-access philosophy text.
  2. That model was further trained with TSDAE for 6 epochs on a subset of sentences from this corpus (a training sketch follows this list).
  3. The resulting model was fine-tuned with a cosine-similarity objective on the private "philsim" dataset.
  4. That model was then fine-tuned with a cosine-similarity objective on the beatai-philosophy dataset.
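
The TSDAE stage (step 2) can be reproduced with the sentence-transformers training utilities. The snippet below is only a minimal sketch: the actual corpus subset and hyperparameters for this model are not published here, the starting checkpoint should be the domain-adapted BERT from step 1 (plain bert-base-uncased is used as a stand-in), and the example sentences are placeholders.

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses
from sentence_transformers.datasets import DenoisingAutoEncoderDataset

# Placeholder sentences standing in for the (unpublished) philosophy corpus subset
train_sentences = [
    "Scientific revolutions involve paradigm shifts.",
    "Knowledge is often analysed as justified true belief.",
]

# CLS pooling on top of the encoder, matching the architecture shown later in this card
word_embedding_model = models.Transformer("bert-base-uncased", max_seq_length=512)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# TSDAE: the dataset adds noise by deleting tokens (uses nltk internally),
# and the loss trains the encoder to let a tied decoder reconstruct the original sentence
train_dataset = DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True)
train_loss = losses.DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=6,  # 6 TSDAE epochs, per step 2 above
    weight_decay=0,
)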

Model internal name: pb-small-10e-tsdae6e-philsim-cosine-6e-beatai-20e

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity (a quick check of these properties is shown below)
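
Assuming the model is loaded under the repository name used later in this card, these properties can be checked directly:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("dbourget/philai-embeddings-2.0")
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 1024
print(model.similarity_fn_name)                  # "cosine"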

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
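
The same two-module stack (a BERT encoder followed by CLS-token pooling) can be assembled explicitly from sentence-transformers building blocks. This is a minimal sketch based only on the configuration printed above; in practice, loading the published model directly (as in the Usage section below) is equivalent and simpler.

from sentence_transformers import SentenceTransformer, models

# Encoder module: the underlying BertModel with a 512-token window
transformer = models.Transformer("dbourget/philai-embeddings-2.0", max_seq_length=512)

# Pooling module: use the [CLS] token embedding as the sentence embedding
pooling = models.Pooling(
    transformer.get_word_embedding_dimension(),  # 1024
    pooling_mode_cls_token=True,
    pooling_mode_mean_tokens=False,
)

model = SentenceTransformer(modules=[transformer, pooling])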

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dbourget/philai-embeddings-2.0")
# Run inference
sentences = [
    'scientific revolutions',
    'paradigm shifts',
    'scientific realism',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Triplet

Metric              Value
cosine_accuracy     0.8081
dot_accuracy        0.2811
manhattan_accuracy  0.8316
euclidean_accuracy  0.8249
max_accuracy        0.8316
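
These figures are the kind reported by a TripletEvaluator: for each (anchor, positive, negative) triplet, accuracy is the fraction of triplets where the anchor embedding is closer to the positive than to the negative under the given distance. A minimal sketch with placeholder triplets (the actual beatai-dev evaluation triplets are not reproduced here):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("dbourget/philai-embeddings-2.0")

# Placeholder triplets standing in for the beatai-dev evaluation set
evaluator = TripletEvaluator(
    anchors=["scientific revolutions"],
    positives=["paradigm shifts"],
    negatives=["scientific realism"],
    name="beatai-dev",
)
results = evaluator(model)
print(results)  # per-distance accuracies plus their maximum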

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 138
  • per_device_eval_batch_size: 138
  • learning_rate: 2e-06
  • num_train_epochs: 10
  • lr_scheduler_type: constant
  • bf16: True
  • dataloader_drop_last: True
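
For reference, the non-default values above map onto SentenceTransformerTrainingArguments roughly as follows (a sketch; output_dir is a placeholder and every other argument keeps its default):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output/pb-small-beatai",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=138,
    per_device_eval_batch_size=138,
    learning_rate=2e-6,
    num_train_epochs=10,
    lr_scheduler_type="constant",
    bf16=True,
    dataloader_drop_last=True,
)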

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 138
  • per_device_eval_batch_size: 138
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-06
  • weight_decay: 0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: constant
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: 2
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step    Training Loss    Validation Loss    beatai-dev_max_accuracy
0 0 - - 0.8072
0.1471 10 1.8573 - -
0.2941 20 1.8196 - -
0.4412 30 1.8594 - -
0.5882 40 1.8581 - -
0.7353 50 1.8766 2.3603 0.8047
0.8824 60 1.8596 - -
1.0294 70 1.6816 - -
1.1765 80 1.7564 - -
1.3235 90 1.7191 - -
1.4706 100 1.6521 2.3296 0.8064
1.6176 110 1.7054 - -
1.7647 120 1.6895 - -
1.9118 130 1.6724 - -
2.0588 140 1.6369 - -
2.2059 150 1.705 2.2941 0.8123
2.3529 160 1.8329 - -
2.5 170 1.6071 - -
2.6471 180 1.5157 - -
2.7941 190 1.624 - -
2.9412 200 1.6185 2.2668 0.8140
3.0882 210 1.6259 - -
3.2353 220 1.5749 - -
3.3824 230 1.5426 - -
3.5294 240 1.5522 - -
3.6765 250 1.5141 2.2498 0.8157
3.8235 260 1.5215 - -
3.9706 270 1.4983 - -
4.1176 280 1.4819 - -
4.2647 290 1.4552 - -
4.4118 300 1.5597 2.2226 0.8199
4.5588 310 1.3983 - -
4.7059 320 1.5386 - -
4.8529 330 1.4541 - -
5.0 340 1.4097 - -
5.1471 350 1.3741 2.2129 0.8207
5.2941 360 1.3909 - -
5.4412 370 1.4116 - -
5.5882 380 1.52 - -
5.7353 390 1.3644 - -
5.8824 400 1.3016 2.1699 0.8266
6.0294 410 1.4435 - -
6.1765 420 1.3112 - -
6.3235 430 1.4056 - -
6.4706 440 1.4541 - -
6.6176 450 1.3312 2.1486 0.8224
6.7647 460 1.2879 - -
6.9118 470 1.227 - -
7.0588 480 1.3834 - -
7.2059 490 1.3242 - -
7.3529 500 1.3756 2.1507 0.8274
7.5 510 1.2872 - -
7.6471 520 1.3288 - -
7.7941 530 1.2689 - -
7.9412 540 1.3102 - -
8.0882 550 1.2929 2.1355 0.8207
8.2353 560 1.2511 - -
8.3824 570 1.1849 - -
8.5294 580 1.2774 - -
8.6765 590 1.1923 - -
8.8235 600 1.1927 2.1111 0.8283
8.9706 610 1.2556 - -
9.1176 620 1.2767 - -
9.2647 630 1.1082 - -
9.4118 640 1.3077 - -
9.5588 650 1.1435 2.0922 0.8316
9.7059 660 1.1888 - -
9.8529 670 1.2123 - -
10.0 680 1.2554 - -

Framework Versions

  • Python: 3.8.18
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 1.13.1+cu117
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}