SentenceTransformer based on UBC-NLP/serengeti-E250

This is a sentence-transformers model finetuned from UBC-NLP/serengeti-E250 on the Mollel/swahili-n_li-triplet-swh-eng dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: UBC-NLP/serengeti-E250
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- Mollel/swahili-n_li-triplet-swh-eng

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: ElectraModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
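The Pooling module above has only pooling_mode_mean_tokens enabled, so a sentence embedding is the average of its token embeddings. The following is a minimal sketch of that idea in plain Python, assuming padding positions are excluded through an attention mask. The function name mean_pool and the toy values are illustrative and are not part of the library.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1; ignore padding."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                total[i] += value
            count += 1
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]


# Three toy 2-dimensional token vectors; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # → [2.0, 3.0]
```

In the real model the same averaging is applied to 768-dimensional token vectors produced by the ElectraModel layer.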

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Mollel/MultiLinguSwahili-serengeti-E250-nli-matryoshka")
# Run inference
sentences = [
    'Mwanamume na mwanamke wachanga waliovaa mikoba wanaweka au kuondoa kitu kutoka kwenye mti mweupe wa zamani, huku watu wengine wamesimama au wameketi nyuma.',
    'mwanamume na mwanamke wenye mikoba',
    'tai huruka',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
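Because the model was trained with MatryoshkaLoss at 768, 512, 256, 128, and 64 dimensions, an embedding can be shortened by keeping only its leading components. The sketch below illustrates this with toy 8-dimensional vectors standing in for real model output. The helper names truncate_and_normalize and cosine_similarity are hypothetical, and re-normalizing after truncation is an assumption about how the shortened vectors would be used.

```python
import math


def truncate_and_normalize(vector, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors; real embeddings from this model have 768 components.
emb_a = [0.9, 0.1, 0.3, 0.2, 0.05, 0.0, 0.1, 0.02]
emb_b = [0.8, 0.2, 0.25, 0.1, 0.0, 0.1, 0.05, 0.03]

for dim in (8, 4, 2):
    a = truncate_and_normalize(emb_a, dim)
    b = truncate_and_normalize(emb_b, dim)
    print(dim, round(cosine_similarity(a, b), 4))
```

Recent Sentence Transformers releases also accept a truncate_dim argument when constructing SentenceTransformer, which should make manual slicing unnecessary. Check the documentation of the installed version before relying on it.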

Evaluation

Metrics

Semantic Similarity

Dataset: sts-test-768
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.7084
spearman_cosine	0.7081
pearson_manhattan	0.7164
spearman_manhattan	0.7066
pearson_euclidean	0.7162
spearman_euclidean	0.7064
pearson_dot	0.3846
spearman_dot	0.3567
pearson_max	0.7164
spearman_max	0.7081

Semantic Similarity

Dataset: sts-test-512
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.706
spearman_cosine	0.7047
pearson_manhattan	0.7142
spearman_manhattan	0.7049
pearson_euclidean	0.715
spearman_euclidean	0.7055
pearson_dot	0.3855
spearman_dot	0.3586
pearson_max	0.715
spearman_max	0.7055

Semantic Similarity

Dataset: sts-test-256
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.7069
spearman_cosine	0.7072
pearson_manhattan	0.7152
spearman_manhattan	0.7051
pearson_euclidean	0.7155
spearman_euclidean	0.7049
pearson_dot	0.3729
spearman_dot	0.3481
pearson_max	0.7155
spearman_max	0.7072

Semantic Similarity

Dataset: sts-test-128
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.7023
spearman_cosine	0.7062
pearson_manhattan	0.7116
spearman_manhattan	0.7013
pearson_euclidean	0.7125
spearman_euclidean	0.7011
pearson_dot	0.3439
spearman_dot	0.3169
pearson_max	0.7125
spearman_max	0.7062

Semantic Similarity

Dataset: sts-test-64
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.695
spearman_cosine	0.6994
pearson_manhattan	0.706
spearman_manhattan	0.6939
pearson_euclidean	0.7066
spearman_euclidean	0.6949
pearson_dot	0.3098
spearman_dot	0.2855
pearson_max	0.7066
spearman_max	0.6994
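The tables above report Pearson and Spearman correlations between the model's similarity scores and gold similarity labels. As a rough illustration of what those two numbers measure, the sketch below computes both in plain Python on made-up values. The gold and predicted lists are invented for the example and are not taken from the evaluation set.

```python
import math


def pearson(x, y):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented gold labels and model similarity scores for five sentence pairs.
gold = [0.0, 1.5, 3.0, 4.2, 5.0]
predicted = [0.10, 0.35, 0.30, 0.80, 0.95]
print(round(pearson(gold, predicted), 4), round(spearman(gold, predicted), 4))
```

Spearman depends only on the ordering of the scores, which is why it is often the headline metric for embedding models. In the tables, pearson_max and spearman_max are the highest values across the cosine, Manhattan, Euclidean, and dot-product rows.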

Training Details

Training Dataset

Mollel/swahili-n_li-triplet-swh-eng

Dataset: Mollel/swahili-n_li-triplet-swh-eng
Size: 1,115,700 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 6 tokens, mean: 11.27 tokens, max: 48 tokens	min: 5 tokens, mean: 13.0 tokens, max: 29 tokens	min: 4 tokens, mean: 12.56 tokens, max: 29 tokens

Samples:

anchor	positive	negative
`A person on a horse jumps over a broken down airplane.`	`A person is outdoors, on a horse.`	`A person is at a diner, ordering an omelette.`
`Mtu aliyepanda farasi anaruka juu ya ndege iliyovunjika.`	`Mtu yuko nje, juu ya farasi.`	`Mtu yuko kwenye mkahawa, akiagiza omelette.`
`Children smiling and waving at camera`	`There are children present`	`The kids are frowning`

Loss: MatryoshkaLoss with these parameters:

{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}

Evaluation Dataset

Mollel/swahili-n_li-triplet-swh-eng

Dataset: Mollel/swahili-n_li-triplet-swh-eng
Size: 13,168 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 5 tokens, mean: 18.07 tokens, max: 53 tokens	min: 4 tokens, mean: 9.45 tokens, max: 33 tokens	min: 4 tokens, mean: 10.27 tokens, max: 29 tokens

Samples:

anchor	positive	negative
`Two women are embracing while holding to go packages.`	`Two woman are holding packages.`	`The men are fighting outside a deli.`
`Wanawake wawili wanakumbatiana huku wakishikilia vifurushi vya kwenda.`	`Wanawake wawili wanashikilia vifurushi.`	`Wanaume hao wanapigana nje ya duka la vyakula vitamu.`
`Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.`	`Two kids in numbered jerseys wash their hands.`	`Two kids in jackets walk to school.`

Loss: MatryoshkaLoss with these parameters:

{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 32
per_device_eval_batch_size: 32
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
bf16: True
batch_sampler: no_duplicates
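With learning_rate 2e-05, warmup_ratio 0.1, and the linear scheduler listed under All Hyperparameters, the learning rate rises during the first tenth of training and then decays to zero. The sketch below reproduces that shape in plain Python, assuming the usual linear-warmup, linear-decay rule and a warmup length rounded up to a whole step. The function name linear_warmup_lr is hypothetical, and the total of 17433 steps is taken from the last row of the training log.

```python
import math


def linear_warmup_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 17433  # final step reported in the training log

for step in (0, 872, 1744, 9000, 17433):
    print(step, linear_warmup_lr(step, TOTAL_STEPS))
```

Under these assumptions the peak of 2e-05 is reached at step 1744 and the rate returns to zero at the final step.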

All Hyperparameters

overwrite_output_dir: False
do_predict: False
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	sts-test-128_spearman_cosine	sts-test-256_spearman_cosine	sts-test-512_spearman_cosine	sts-test-64_spearman_cosine	sts-test-768_spearman_cosine
0.0057	100	26.7003	-	-	-	-	-
0.0115	200	20.7097	-	-	-	-	-
0.0172	300	17.2266	-	-	-	-	-
0.0229	400	15.7511	-	-	-	-	-
0.0287	500	14.5329	-	-	-	-	-
0.0344	600	12.6534	-	-	-	-	-
0.0402	700	10.6758	-	-	-	-	-
0.0459	800	9.421	-	-	-	-	-
0.0516	900	9.5664	-	-	-	-	-
0.0574	1000	8.5166	-	-	-	-	-
0.0631	1100	8.657	-	-	-	-	-
0.0688	1200	8.5473	-	-	-	-	-
0.0746	1300	8.3018	-	-	-	-	-
0.0803	1400	8.4488	-	-	-	-	-
0.0860	1500	7.1796	-	-	-	-	-
0.0918	1600	6.6136	-	-	-	-	-
0.0975	1700	6.2638	-	-	-	-	-
0.1033	1800	6.6955	-	-	-	-	-
0.1090	1900	7.3585	-	-	-	-	-
0.1147	2000	6.9043	-	-	-	-	-
0.1205	2100	6.677	-	-	-	-	-
0.1262	2200	6.3914	-	-	-	-	-
0.1319	2300	6.0045	-	-	-	-	-
0.1377	2400	5.8048	-	-	-	-	-
0.1434	2500	5.6898	-	-	-	-	-
0.1491	2600	5.229	-	-	-	-	-
0.1549	2700	5.2407	-	-	-	-	-
0.1606	2800	5.7074	-	-	-	-	-
0.1664	2900	6.2917	-	-	-	-	-
0.1721	3000	6.5651	-	-	-	-	-
0.1778	3100	6.7751	-	-	-	-	-
0.1836	3200	6.195	-	-	-	-	-
0.1893	3300	5.4697	-	-	-	-	-
0.1950	3400	5.1362	-	-	-	-	-
0.2008	3500	5.581	-	-	-	-	-
0.2065	3600	5.4309	-	-	-	-	-
0.2122	3700	5.6688	-	-	-	-	-
0.2180	3800	5.6923	-	-	-	-	-
0.2237	3900	5.8598	-	-	-	-	-
0.2294	4000	5.3498	-	-	-	-	-
0.2352	4100	5.3797	-	-	-	-	-
0.2409	4200	5.0389	-	-	-	-	-
0.2467	4300	5.6622	-	-	-	-	-
0.2524	4400	5.6249	-	-	-	-	-
0.2581	4500	5.6927	-	-	-	-	-
0.2639	4600	5.3612	-	-	-	-	-
0.2696	4700	5.2751	-	-	-	-	-
0.2753	4800	5.4224	-	-	-	-	-
0.2811	4900	5.0338	-	-	-	-	-
0.2868	5000	4.9813	-	-	-	-	-
0.2925	5100	4.8533	-	-	-	-	-
0.2983	5200	5.4137	-	-	-	-	-
0.3040	5300	5.4063	-	-	-	-	-
0.3098	5400	5.3107	-	-	-	-	-
0.3155	5500	5.0907	-	-	-	-	-
0.3212	5600	4.8644	-	-	-	-	-
0.3270	5700	4.7926	-	-	-	-	-
0.3327	5800	5.0268	-	-	-	-	-
0.3384	5900	5.3029	-	-	-	-	-
0.3442	6000	5.1246	-	-	-	-	-
0.3499	6100	5.1152	-	-	-	-	-
0.3556	6200	5.4265	-	-	-	-	-
0.3614	6300	4.7079	-	-	-	-	-
0.3671	6400	4.6368	-	-	-	-	-
0.3729	6500	4.662	-	-	-	-	-
0.3786	6600	5.3695	-	-	-	-	-
0.3843	6700	4.6974	-	-	-	-	-
0.3901	6800	4.6584	-	-	-	-	-
0.3958	6900	4.7413	-	-	-	-	-
0.4015	7000	4.6604	-	-	-	-	-
0.4073	7100	5.2476	-	-	-	-	-
0.4130	7200	4.9966	-	-	-	-	-
0.4187	7300	4.656	-	-	-	-	-
0.4245	7400	4.5711	-	-	-	-	-
0.4302	7500	5.0256	-	-	-	-	-
0.4360	7600	4.3856	-	-	-	-	-
0.4417	7700	4.2548	-	-	-	-	-
0.4474	7800	4.8584	-	-	-	-	-
0.4532	7900	4.8563	-	-	-	-	-
0.4589	8000	4.5101	-	-	-	-	-
0.4646	8100	4.4688	-	-	-	-	-
0.4704	8200	4.7076	-	-	-	-	-
0.4761	8300	4.3268	-	-	-	-	-
0.4818	8400	4.6622	-	-	-	-	-
0.4876	8500	4.4808	-	-	-	-	-
0.4933	8600	4.676	-	-	-	-	-
0.4991	8700	5.0348	-	-	-	-	-
0.5048	8800	4.5497	-	-	-	-	-
0.5105	8900	4.7428	-	-	-	-	-
0.5163	9000	4.4418	-	-	-	-	-
0.5220	9100	4.4946	-	-	-	-	-
0.5277	9200	4.5249	-	-	-	-	-
0.5335	9300	4.2413	-	-	-	-	-
0.5392	9400	4.4799	-	-	-	-	-
0.5449	9500	4.6807	-	-	-	-	-
0.5507	9600	4.5901	-	-	-	-	-
0.5564	9700	4.7266	-	-	-	-	-
0.5622	9800	4.692	-	-	-	-	-
0.5679	9900	4.8651	-	-	-	-	-
0.5736	10000	4.7746	-	-	-	-	-
0.5794	10100	4.68	-	-	-	-	-
0.5851	10200	4.7697	-	-	-	-	-
0.5908	10300	4.8848	-	-	-	-	-
0.5966	10400	4.4004	-	-	-	-	-
0.6023	10500	4.2979	-	-	-	-	-
0.6080	10600	4.7266	-	-	-	-	-
0.6138	10700	4.8605	-	-	-	-	-
0.6195	10800	4.7436	-	-	-	-	-
0.6253	10900	4.6239	-	-	-	-	-
0.6310	11000	4.394	-	-	-	-	-
0.6367	11100	4.8081	-	-	-	-	-
0.6425	11200	4.2329	-	-	-	-	-
0.6482	11300	4.873	-	-	-	-	-
0.6539	11400	4.5557	-	-	-	-	-
0.6597	11500	4.7918	-	-	-	-	-
0.6654	11600	4.1607	-	-	-	-	-
0.6711	11700	4.8744	-	-	-	-	-
0.6769	11800	5.0072	-	-	-	-	-
0.6826	11900	4.3532	-	-	-	-	-
0.6883	12000	4.3319	-	-	-	-	-
0.6941	12100	4.6885	-	-	-	-	-
0.6998	12200	4.6682	-	-	-	-	-
0.7056	12300	4.4258	-	-	-	-	-
0.7113	12400	4.6136	-	-	-	-	-
0.7170	12500	4.3594	-	-	-	-	-
0.7228	12600	4.0627	-	-	-	-	-
0.7285	12700	4.5244	-	-	-	-	-
0.7342	12800	4.504	-	-	-	-	-
0.7400	12900	4.4694	-	-	-	-	-
0.7457	13000	4.4804	-	-	-	-	-
0.7514	13100	4.0588	-	-	-	-	-
0.7572	13200	4.8016	-	-	-	-	-
0.7629	13300	4.2971	-	-	-	-	-
0.7687	13400	4.1326	-	-	-	-	-
0.7744	13500	3.9763	-	-	-	-	-
0.7801	13600	3.7716	-	-	-	-	-
0.7859	13700	3.8448	-	-	-	-	-
0.7916	13800	3.6779	-	-	-	-	-
0.7973	13900	3.5938	-	-	-	-	-
0.8031	14000	3.3981	-	-	-	-	-
0.8088	14100	3.4151	-	-	-	-	-
0.8145	14200	3.2498	-	-	-	-	-
0.8203	14300	3.4909	-	-	-	-	-
0.8260	14400	3.4098	-	-	-	-	-
0.8318	14500	3.4448	-	-	-	-	-
0.8375	14600	3.2868	-	-	-	-	-
0.8432	14700	3.2196	-	-	-	-	-
0.8490	14800	3.0852	-	-	-	-	-
0.8547	14900	3.2341	-	-	-	-	-
0.8604	15000	3.164	-	-	-	-	-
0.8662	15100	3.0919	-	-	-	-	-
0.8719	15200	3.176	-	-	-	-	-
0.8776	15300	3.1361	-	-	-	-	-
0.8834	15400	3.0683	-	-	-	-	-
0.8891	15500	3.0275	-	-	-	-	-
0.8949	15600	3.0763	-	-	-	-	-
0.9006	15700	3.1828	-	-	-	-	-
0.9063	15800	3.0053	-	-	-	-	-
0.9121	15900	2.9696	-	-	-	-	-
0.9178	16000	2.8919	-	-	-	-	-
0.9235	16100	2.9922	-	-	-	-	-
0.9293	16200	2.9063	-	-	-	-	-
0.9350	16300	3.0633	-	-	-	-	-
0.9407	16400	3.1782	-	-	-	-	-
0.9465	16500	2.9206	-	-	-	-	-
0.9522	16600	2.8785	-	-	-	-	-
0.9580	16700	2.9934	-	-	-	-	-
0.9637	16800	3.0125	-	-	-	-	-
0.9694	16900	2.9338	-	-	-	-	-
0.9752	17000	2.9931	-	-	-	-	-
0.9809	17100	2.956	-	-	-	-	-
0.9866	17200	2.8415	-	-	-	-	-
0.9924	17300	3.0072	-	-	-	-	-
0.9981	17400	2.9046	-	-	-	-	-
1.0	17433	-	0.7062	0.7072	0.7047	0.6994	0.7081

Framework Versions

Python: 3.11.9
Sentence Transformers: 3.0.1
Transformers: 4.40.1
PyTorch: 2.3.0+cu121
Accelerate: 0.29.3
Datasets: 2.19.0
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
