
SentenceTransformer based on cointegrated/LaBSE-en-ru

This is a sentence-transformers model finetuned from cointegrated/LaBSE-en-ru. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: cointegrated/LaBSE-en-ru
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Model Size: ~78.7M parameters (float32 weights)
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  (3): Normalize()
)
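
For reference, the same four-module stack can be assembled by hand with sentence_transformers.models. The sketch below mirrors the printout above; it is illustrative only (it builds the modules from the full base model rather than the actual distilled student, which you get by simply loading the published checkpoint).

from torch import nn
from sentence_transformers import SentenceTransformer, models

# Illustrative reconstruction of the module stack shown above.
# The real student was distilled from cointegrated/LaBSE-en-ru; the full base
# model is used here only to demonstrate the composition.
word_embedding = models.Transformer("cointegrated/LaBSE-en-ru", max_seq_length=512)
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),
    pooling_mode_cls_token=True,   # CLS pooling, as in the printout
    pooling_mode_mean_tokens=False,
)
dense = models.Dense(in_features=768, out_features=768, activation_function=nn.Tanh())
normalize = models.Normalize()

model = SentenceTransformer(modules=[word_embedding, pooling, dense, normalize])
print(model)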

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")
# Run inference
sentences = [
    'See Name section.',
    'Ms. Packard is the voice of the female blood elf in the video game World of Warcraft.',
    'Yeah, people who might not be hungry.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
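
The same embeddings also support simple semantic search. The minimal sketch below ranks a small corpus against a query; the query and corpus strings are illustrative.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")

# Illustrative query and corpus; the model handles both English and Russian.
query = "Кто озвучивает эльфийку крови в World of Warcraft?"
corpus = [
    "Ms. Packard is the voice of the female blood elf in the video game World of Warcraft.",
    "The Canadian Canoe Museum is a museum dedicated to canoes located in Peterborough, Ontario, Canada.",
]

query_embedding = model.encode([query])      # shape (1, 768)
corpus_embeddings = model.encode(corpus)     # shape (2, 768)

# Cosine similarity (this model's similarity function) between query and corpus
scores = model.similarity(query_embedding, corpus_embeddings)  # shape (1, 2)
best = int(scores[0].argmax())
print(corpus[best], float(scores[0, best]))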

Evaluation

Metrics

Semantic Similarity (sts-dev)

Metric Value
pearson_cosine 0.5305
spearman_cosine 0.6347
pearson_manhattan 0.5553
spearman_manhattan 0.6389
pearson_euclidean 0.5500
spearman_euclidean 0.6347
pearson_dot 0.5305
spearman_dot 0.6347
pearson_max 0.5553
spearman_max 0.6389
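
These correlations are the standard outputs of the library's EmbeddingSimilarityEvaluator. The sketch below shows how such numbers are produced, assuming an STS-style dev set; the sentence pairs and gold scores are placeholders, not the actual evaluation data.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")

# Placeholder STS-style pairs with gold similarity scores in [0, 1].
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["A man is playing a guitar.", "Кошка сидит на окне."],
    sentences2=["Someone is playing a guitar.", "Собака бежит по пляжу."],
    scores=[0.9, 0.1],
    name="sts-dev",
)

# Returns Pearson/Spearman correlations for cosine, Euclidean, Manhattan and dot-product similarities.
print(evaluator(model))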

Knowledge Distillation

Metric Value
negative_mse -0.0063
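
negative_mse is the negated mean squared error between the student's embeddings and the teacher's, as reported by the library's MSEEvaluator. A sketch, assuming the teacher is the original cointegrated/LaBSE-en-ru and using two sentences from this card as stand-in evaluation data:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import MSEEvaluator

student = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")
teacher = SentenceTransformer("cointegrated/LaBSE-en-ru")  # assumed teacher model

sentences = [
    "See Name section.",
    "И мне нравилось, что я одновременно зарабатываю и смотрю бои.",
]

# The teacher encodes source_sentences, the student encodes target_sentences;
# the evaluator reports the negated (scaled) MSE between the two, so higher is better.
evaluator = MSEEvaluator(
    source_sentences=sentences,
    target_sentences=sentences,
    teacher_model=teacher,
    name="distillation-dev",
)
print(evaluator(student))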

Semantic Similarity (sts-test)

Metric Value
pearson_cosine 0.5043
spearman_cosine 0.5986
pearson_manhattan 0.5227
spearman_manhattan 0.5984
pearson_euclidean 0.5227
spearman_euclidean 0.5986
pearson_dot 0.5043
spearman_dot 0.5986
pearson_max 0.5227
spearman_max 0.5986

Training Details

Training Dataset

Unnamed Dataset

  • Size: 10,975,066 training samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    • sentence: string; min: 6 tokens, mean: 26.93 tokens, max: 139 tokens
    • label: list of 768 floats
  • Samples:
    • sentence: It is based on the Java Persistence API (JPA), but it does not strictly follow the JSR 338 Specification, as it implements different design patterns and technologies.
      label: [-0.012331949546933174, -0.04570527374744415, -0.024963658303022385, -0.03620213270187378, 0.022556383162736893, ...]
    • sentence: Покупаем вторичное сырье в Каунасе (Переработка вторичного сырья) - Алфенас АНД КО, ЗАО на Bizorg.
      label: [-0.07498518377542496, -0.01913534104824066, -0.01797042042016983, 0.048263177275657654, -0.00016611881437711418, ...]
    • sentence: At the Equal Justice Conference ( EJC ) held in March 2001 in San Diego , LSC and the Project for the Future of Equal Justice held the second Case Management Software pre-conference .
      label: [0.03870972990989685, -0.0638347640633583, -0.01696585863828659, -0.043612319976091385, -0.048241738229990005, ...]
  • Loss: MSELoss (a minimal training sketch follows below)
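
Each sample pairs a sentence with a precomputed 768-dimensional teacher embedding stored in the label column, and MSELoss regresses the student's embedding onto that target. The sketch below illustrates the setup under that assumption; the teacher/student initialisation and the tiny in-memory dataset are placeholders, not the actual training code.

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

# Placeholder teacher and student; the actual student is a reduced (every-third-layer)
# copy of the teacher, which is not reconstructed here.
teacher = SentenceTransformer("cointegrated/LaBSE-en-ru")
student = SentenceTransformer("cointegrated/LaBSE-en-ru")

sentences = [
    "It is based on the Java Persistence API (JPA).",
    "Покупаем вторичное сырье в Каунасе.",
]

# Precompute teacher embeddings once; they become the regression targets ("label").
labels = teacher.encode(sentences)
train_dataset = Dataset.from_dict({
    "sentence": sentences,
    "label": [embedding.tolist() for embedding in labels],
})

# MSELoss: mean squared error between student embeddings and the stored teacher embeddings.
trainer = SentenceTransformerTrainer(
    model=student,
    train_dataset=train_dataset,
    loss=losses.MSELoss(model=student),
)
trainer.train()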

Evaluation Dataset

Unnamed Dataset

  • Size: 10,000 evaluation samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    • sentence: string; min: 5 tokens, mean: 24.18 tokens, max: 111 tokens
    • label: list of 768 floats
  • Samples:
    • sentence: The Canadian Canoe Museum is a museum dedicated to canoes located in Peterborough, Ontario, Canada.
      label: [-0.05444105342030525, -0.03650881350040436, -0.041163671761751175, -0.010616903193295002, -0.04094529151916504, ...]
    • sentence: И мне нравилось, что я одновременно зарабатываю и смотрю бои».
      label: [-0.03404555842280388, 0.028203096240758896, -0.056121889501810074, -0.0591997392475605, -0.05523117259144783, ...]
    • sentence: Ну, а на следующий день, разумеется, Президент Кеннеди объявил блокаду Кубы, и наши корабли остановили у кубинских берегов направлявшийся на Кубу российский корабль, и у него на борту нашли ракеты.
      label: [-0.008193841204047203, 0.00694894278421998, -0.03027420863509178, -0.03290146216750145, 0.01425305474549532, ...]
  • Loss: MSELoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 0.0001
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
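
These map directly onto SentenceTransformerTrainingArguments; a hedged sketch of the corresponding configuration is below (output_dir and the eval/save step cadence are assumptions, the 5000-step cadence being inferred from the training logs).

from sentence_transformers import SentenceTransformerTrainingArguments

# Non-default hyperparameters from the list above; output_dir, eval_steps and
# save_steps are assumptions, not taken from the card.
args = SentenceTransformerTrainingArguments(
    output_dir="labse-en-ru-distilled-each-third-layer",  # assumed
    eval_strategy="steps",
    eval_steps=5000,   # assumed from the evaluation cadence in the training logs
    save_steps=5000,   # assumed; kept aligned with eval_steps for load_best_model_at_end
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    learning_rate=1e-4,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
)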

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0001
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss negative_mse sts-dev_spearman_cosine sts-test_spearman_cosine
0 0 - - -0.2381 0.4206 -
0.0058 1000 0.0014 - - - -
0.0117 2000 0.0009 - - - -
0.0175 3000 0.0007 - - - -
0.0233 4000 0.0006 - - - -
0.0292 5000 0.0005 0.0004 -0.0363 0.6393 -
0.0350 6000 0.0004 - - - -
0.0408 7000 0.0004 - - - -
0.0467 8000 0.0003 - - - -
0.0525 9000 0.0003 - - - -
0.0583 10000 0.0003 0.0002 -0.0207 0.6350 -
0.0641 11000 0.0003 - - - -
0.0700 12000 0.0003 - - - -
0.0758 13000 0.0002 - - - -
0.0816 14000 0.0002 - - - -
0.0875 15000 0.0002 0.0002 -0.0157 0.6328 -
0.0933 16000 0.0002 - - - -
0.0991 17000 0.0002 - - - -
0.1050 18000 0.0002 - - - -
0.1108 19000 0.0002 - - - -
0.1166 20000 0.0002 0.0001 -0.0132 0.6317 -
0.1225 21000 0.0002 - - - -
0.1283 22000 0.0002 - - - -
0.1341 23000 0.0002 - - - -
0.1400 24000 0.0002 - - - -
0.1458 25000 0.0002 0.0001 -0.0118 0.6251 -
0.1516 26000 0.0002 - - - -
0.1574 27000 0.0002 - - - -
0.1633 28000 0.0002 - - - -
0.1691 29000 0.0002 - - - -
0.1749 30000 0.0002 0.0001 -0.0109 0.6304 -
0.1808 31000 0.0002 - - - -
0.1866 32000 0.0002 - - - -
0.1924 33000 0.0002 - - - -
0.1983 34000 0.0001 - - - -
0.2041 35000 0.0001 0.0001 -0.0102 0.6280 -
0.2099 36000 0.0001 - - - -
0.2158 37000 0.0001 - - - -
0.2216 38000 0.0001 - - - -
0.2274 39000 0.0001 - - - -
0.2333 40000 0.0001 0.0001 -0.0098 0.6272 -
0.2391 41000 0.0001 - - - -
0.2449 42000 0.0001 - - - -
0.2507 43000 0.0001 - - - -
0.2566 44000 0.0001 - - - -
0.2624 45000 0.0001 0.0001 -0.0093 0.6378 -
0.2682 46000 0.0001 - - - -
0.2741 47000 0.0001 - - - -
0.2799 48000 0.0001 - - - -
0.2857 49000 0.0001 - - - -
0.2916 50000 0.0001 0.0001 -0.0089 0.6325 -
0.2974 51000 0.0001 - - - -
0.3032 52000 0.0001 - - - -
0.3091 53000 0.0001 - - - -
0.3149 54000 0.0001 - - - -
0.3207 55000 0.0001 0.0001 -0.0087 0.6328 -
0.3266 56000 0.0001 - - - -
0.3324 57000 0.0001 - - - -
0.3382 58000 0.0001 - - - -
0.3441 59000 0.0001 - - - -
0.3499 60000 0.0001 0.0001 -0.0085 0.6357 -
0.3557 61000 0.0001 - - - -
0.3615 62000 0.0001 - - - -
0.3674 63000 0.0001 - - - -
0.3732 64000 0.0001 - - - -
0.3790 65000 0.0001 0.0001 -0.0083 0.6366 -
0.3849 66000 0.0001 - - - -
0.3907 67000 0.0001 - - - -
0.3965 68000 0.0001 - - - -
0.4024 69000 0.0001 - - - -
0.4082 70000 0.0001 0.0001 -0.0080 0.6325 -
0.4140 71000 0.0001 - - - -
0.4199 72000 0.0001 - - - -
0.4257 73000 0.0001 - - - -
0.4315 74000 0.0001 - - - -
0.4374 75000 0.0001 0.0001 -0.0078 0.6351 -
0.4432 76000 0.0001 - - - -
0.4490 77000 0.0001 - - - -
0.4548 78000 0.0001 - - - -
0.4607 79000 0.0001 - - - -
0.4665 80000 0.0001 0.0001 -0.0077 0.6323 -
0.4723 81000 0.0001 - - - -
0.4782 82000 0.0001 - - - -
0.4840 83000 0.0001 - - - -
0.4898 84000 0.0001 - - - -
0.4957 85000 0.0001 0.0001 -0.0076 0.6316 -
0.5015 86000 0.0001 - - - -
0.5073 87000 0.0001 - - - -
0.5132 88000 0.0001 - - - -
0.5190 89000 0.0001 - - - -
0.5248 90000 0.0001 0.0001 -0.0074 0.6306 -
0.5307 91000 0.0001 - - - -
0.5365 92000 0.0001 - - - -
0.5423 93000 0.0001 - - - -
0.5481 94000 0.0001 - - - -
0.5540 95000 0.0001 0.0001 -0.0073 0.6305 -
0.5598 96000 0.0001 - - - -
0.5656 97000 0.0001 - - - -
0.5715 98000 0.0001 - - - -
0.5773 99000 0.0001 - - - -
0.5831 100000 0.0001 0.0001 -0.0072 0.6333 -
0.5890 101000 0.0001 - - - -
0.5948 102000 0.0001 - - - -
0.6006 103000 0.0001 - - - -
0.6065 104000 0.0001 - - - -
0.6123 105000 0.0001 0.0001 -0.0071 0.6351 -
0.6181 106000 0.0001 - - - -
0.6240 107000 0.0001 - - - -
0.6298 108000 0.0001 - - - -
0.6356 109000 0.0001 - - - -
0.6415 110000 0.0001 0.0001 -0.0070 0.6330 -
0.6473 111000 0.0001 - - - -
0.6531 112000 0.0001 - - - -
0.6589 113000 0.0001 - - - -
0.6648 114000 0.0001 - - - -
0.6706 115000 0.0001 0.0001 -0.0070 0.6336 -
0.6764 116000 0.0001 - - - -
0.6823 117000 0.0001 - - - -
0.6881 118000 0.0001 - - - -
0.6939 119000 0.0001 - - - -
0.6998 120000 0.0001 0.0001 -0.0069 0.6305 -
0.7056 121000 0.0001 - - - -
0.7114 122000 0.0001 - - - -
0.7173 123000 0.0001 - - - -
0.7231 124000 0.0001 - - - -
0.7289 125000 0.0001 0.0001 -0.0068 0.6362 -
0.7348 126000 0.0001 - - - -
0.7406 127000 0.0001 - - - -
0.7464 128000 0.0001 - - - -
0.7522 129000 0.0001 - - - -
0.7581 130000 0.0001 0.0001 -0.0067 0.6340 -
0.7639 131000 0.0001 - - - -
0.7697 132000 0.0001 - - - -
0.7756 133000 0.0001 - - - -
0.7814 134000 0.0001 - - - -
0.7872 135000 0.0001 0.0001 -0.0067 0.6365 -
0.7931 136000 0.0001 - - - -
0.7989 137000 0.0001 - - - -
0.8047 138000 0.0001 - - - -
0.8106 139000 0.0001 - - - -
0.8164 140000 0.0001 0.0001 -0.0066 0.6339 -
0.8222 141000 0.0001 - - - -
0.8281 142000 0.0001 - - - -
0.8339 143000 0.0001 - - - -
0.8397 144000 0.0001 - - - -
0.8456 145000 0.0001 0.0001 -0.0066 0.6352 -
0.8514 146000 0.0001 - - - -
0.8572 147000 0.0001 - - - -
0.8630 148000 0.0001 - - - -
0.8689 149000 0.0001 - - - -
0.8747 150000 0.0001 0.0001 -0.0065 0.6357 -
0.8805 151000 0.0001 - - - -
0.8864 152000 0.0001 - - - -
0.8922 153000 0.0001 - - - -
0.8980 154000 0.0001 - - - -
0.9039 155000 0.0001 0.0001 -0.0065 0.6336 -
0.9097 156000 0.0001 - - - -
0.9155 157000 0.0001 - - - -
0.9214 158000 0.0001 - - - -
0.9272 159000 0.0001 - - - -
0.9330 160000 0.0001 0.0001 -0.0064 0.6334 -
0.9389 161000 0.0001 - - - -
0.9447 162000 0.0001 - - - -
0.9505 163000 0.0001 - - - -
0.9563 164000 0.0001 - - - -
0.9622 165000 0.0001 0.0001 -0.0064 0.6337 -
0.9680 166000 0.0001 - - - -
0.9738 167000 0.0001 - - - -
0.9797 168000 0.0001 - - - -
0.9855 169000 0.0001 - - - -
0.9913 170000 0.0001 0.0001 -0.0063 0.6347 -
0.9972 171000 0.0001 - - - -
1.0 171486 - - - - 0.5986
  • The bold row in the original table denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.0
  • PyTorch: 2.4.0
  • Accelerate: 0.33.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1
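
To reproduce this environment, the versions above can be pinned at install time (package names are the standard PyPI ones):

pip install sentence-transformers==3.0.1 transformers==4.44.0 torch==2.4.0 accelerate==0.33.0 datasets==2.20.0 tokenizers==0.19.1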

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MSELoss

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}