tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:21988
- loss:MultipleNegativesRankingLoss
base_model: Lajavaness/bilingual-embedding-large
widget:
- source_sentence: >-
BEAS INTESTINES 2901 718935 wwwIsrael under heavy attack from Gaza There
were more than 600 rockets launched against Israel. There are some
civilians wounded and dead
sentences:
- Photo shows cloud of smoke after attack in Israel
- Claudia López with a book thanking the FARC
- Wife of Chinese official shot in US
- source_sentence: >-
People's Network people.cn People's Daily: Scientifically grasp the law of
population development Balanced Population Development in the New Era -
January 2022 From the 1st, the one-child policy will be completely
abolished. Newlyweds must have at least two children Wang Peian April 1,
2021 06:18 Source: People's Daily Online, People's Daily Executive
summary: ■After the founding of New China, the implementation of family
planning was based on the basic national conditions of my country's large
population and relatively insufficient resources A major strategic
decision, which makes the population's pressure on resources and the
environment get a preliminary understanding: it creates a longer
demographic dividend period, It has effectively promoted economic
development, social progress and the improvement of people's living
standards, and the country's capacity for sustainable development has been
greatly enhanced. ■Since the beginning of the new century, my country's
population situation has undergone major changes. Strive to achieve the
level of active fertility, vigorously improve the quality and skills of
workers, and implement the comprehensive two-child policy, which is the
key to population development. Three issues that must be addressed in the
field. ■ Attention should be paid to the research on population
development strategies, comprehensively and profoundly understand and
grasp the laws of population, and promote the coordination between
population and economy and society. development, and promote the long-term
balanced development of the population. choice of history my country has
been a country with the largest population in the world since ancient
times. In traditional society, if there is an entrance, there will be a
license and tax, and the country will be strengthened. If there is a
population, there will be soldiers. The rulers of successive dynasties
have vigorously encouraged population reproduction. Once the society is
stable and production develops, the total population will decrease. The
threshold will increase greatly; when the dynasty is changed, the army
will be in chaos, famine and flag epidemics will be intertwined, and the
population will be sharp or small. Look, before the 17th century, my
country's population grew slowly in a cyclical ups and downs. The
introduction of high-yielding food crops such as corn, sweet potato and
potato in the late Ming Dynasty, especially the century-long Kanggan in
the early Qing Dynasty. The prosperous age made my country's population
grow rapidly, breaking through the 200 million, 300 million mark
successively, and the 400 million mark in the Daoguang years, which led
to Legal Migrant Workers People's Network people.cn People's Daily:
Scientifically grasp the law of population development Balanced Population
Development in the New Era - January 2022 From the 1st, the one-child
policy will be completely abolished. Newlyweds must have at least two
children Wang Peian April 1, 2021 06:18 Source: People's Daily Online,
People's Daily Executive summary: ■After the founding of New China, the
implementation of family planning was based on the basic national
conditions of my country's large population and relatively insufficient
resources A major strategic decision, which makes the population's
pressure on resources and the environment get a preliminary understanding:
it creates a longer demographic dividend period, It has effectively
promoted economic development, social progress and the improvement of
people's living standards, and the country's capacity for sustainable
development has been greatly enhanced. ■Since the beginning of the new
century, my country's population situation has undergone major changes.
Strive to achieve the level of active fertility, vigorously improve the
quality and skills of workers, and implement the comprehensive two-child
policy, which is the key to population development. Three issues that must
be addressed in the field. ■ Attention should be paid to the research on
population development strategies, comprehensively and profoundly
understand and grasp the laws of population, and promote the coordination
between population and economy and society. development, and promote the
long-term balanced development of the population. choice of history my
country has been a country with the largest population in the world since
ancient times. In traditional society, if there is an entrance, there will
be a license and tax, and the country will be strengthened. If there is a
population, there will be soldiers. The rulers of successive dynasties
have vigorously encouraged population reproduction. Once the society is
stable and production develops, the total population will decrease. The
threshold will increase greatly; when the dynasty is changed, the army
will be in chaos, famine and flag epidemics will be intertwined, and the
population will be sharp or small. Look, before the 17th century, my
country's population grew slowly in a cyclical ups and downs. The
introduction of high-yielding food crops such as corn, sweet potato and
potato in the late Ming Dynasty, especially the century-long Kanggan in
the early Qing Dynasty. The prosperous age made my country's population
grow rapidly, breaking through the 200 million, 300 million mark
successively, and the 400 million mark in the Daoguang years, which led
to Legal Migrant WorkersA warning to those prosperous forces who often
talk about human rights: China has human rights, and we have approved that
Chinese people must get married, and they must have two children after
they get married!
sentences:
- >-
Hamad bin Jassim told the BBC In a new interview, we paid the defected
Syrian officer $30,000 and the regular soldier $15,000.
- >-
State-run newspaper announces Chinese couples ‘must have two children’
starting January 2022
- >-
This is the draw for judges for the case of former Ecuadorian President
Rafael Correa
- source_sentence: >-
Part 1 Resignation sir jokowi JOKOWI REGISTERED COMPASS DKI DPRD HOLDS
Plenary MEETING CARIS JAKARTA KOMPASTV Tik TokIs it true that the
President of Indonesia, Joko Widodo, has resigned from his position?
sentences:
- BBC reports on release of 'Unabomber' Ted Kaczynski
- Thai children flash three fingered salute to Thai PM Prayut
- President Joko Widodo, alias Jokowi, resigns from his post
- source_sentence: >-
The organization 'Vegan Society' calls for a ban on animal-shaped
children's cookies. They consider that these cookies "incite children to
see animals as something inferior and at our disposal." This is the ,
which is dangerous even for anti-bullfighting. It's not that they don't
want bullfighting. It is that they want to impose even the shape of the
cookies that your children eat. And it's not the first time. Barnum
cookies have already "freed" the animals in their boxes to have a better
brand image. They may seem like funny news. But they are not. They hide a
prohibitionist ideology full of censorship. 𝗘𝗹 𝗮𝗻𝗶𝗺𝗮𝗹𝗶𝘀𝗺𝗼 𝗲𝘀
𝗽𝗲𝗹𝗶𝗴𝗿𝗼 𝗽𝗮𝗿𝗮 𝗻𝘂𝗲𝘀𝘁𝗿𝗮 𝘀𝗼𝗰𝗶𝗲𝗱𝗮𝗱
sentences:
- >-
Vegan NGO Vegan Society wants to ban the sale of animal-shaped cookies
in France
- Cans of food containing pork with a "halal" stamp
- >-
Pfizer announces Covid-19 vaccine update with Microsoft chip for symptom
reduction
- source_sentence: >-
a . . . . . (177. FO Accident st THE LEADER IN ACCIDENT REPORTING Reckless
driving by a minor Kuliapitiya Kanadulla after a defender collided with a
motorcycle An accident occurred in front of Maha Vidyalaya today (01)
afternoon A young man on a motorcycle and about 4 years old A young child
(father and son) unfortunately Lost his life. Behaved provocatively with
the accident Villagers set fire to the defender car that caused the
accident had May that innocent father and little son rest in peace! 94
site
sentences:
- The image of a Syrian child who sleeps next to the graves of his parents
- Accident kills four-year-old in northwestern Sri Lanka
- Masks are ineffective because some packaging says they don't protect
pipeline_tag: sentence-similarity
library_name: sentence-transformers
SentenceTransformer based on Lajavaness/bilingual-embedding-large
This is a sentence-transformers model finetuned from Lajavaness/bilingual-embedding-large. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Lajavaness/bilingual-embedding-large
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BilingualModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
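The last two modules above (mean pooling followed by normalization) can be illustrated with a minimal pure-Python sketch. The helper names `mean_pool` and `l2_normalize` and the toy 2-dimensional token vectors are hypothetical, chosen only for illustration; the real model operates on 1024-dimensional token embeddings inside the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

# Two real tokens and one padding token (mask 0), all toy values
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
sentence_vec = l2_normalize(pooled)
print(pooled, sentence_vec)
```

Because the final vector has unit length, the cosine similarity between two embeddings reduces to their dot product.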
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'a . . . . . (177. FO Accident st THE LEADER IN ACCIDENT REPORTING Reckless driving by a minor Kuliapitiya Kanadulla after a defender collided with a motorcycle An accident occurred in front of Maha Vidyalaya today (01) afternoon A young man on a motorcycle and about 4 years old A young child (father and son) unfortunately Lost his life. Behaved provocatively with the accident Villagers set fire to the defender car that caused the accident had May that innocent father and little son rest in peace! 94 site',
'Accident kills four-year-old in northwestern Sri Lanka',
'The image of a Syrian child who sleeps next to the graves of his parents',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
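For retrieval-style use, such as matching a post against a list of candidate claims, the similarity scores can be sorted to produce a ranking. The following is a minimal sketch, not part of the library: `rank_by_similarity` is a hypothetical helper, and the toy 2-dimensional unit vectors stand in for the output of `model.encode(...)`. It assumes the embeddings are already unit-length, which the `Normalize()` module in the architecture suggests.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rank_by_similarity(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, dot(query_vec, c)) for i, c in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy unit vectors standing in for real embeddings (assumption)
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [0.6, 0.8], [0.8, 0.6]]
ranking = rank_by_similarity(query, candidates)
print(ranking)
```

With real embeddings, the index at the top of `ranking` would point to the candidate sentence closest to the query.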
Training Details
Training Dataset
Unnamed Dataset
- Size: 21,988 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | min: 2 tokens; mean: 119.9 tokens; max: 512 tokens | min: 7 tokens; mean: 19.25 tokens; max: 128 tokens |
- Samples:

sentence_0 | sentence_1 |
---|---|
ANK DBS DBS IT department at ChangiThis is actually happening as confirmed by my brother who does contract work with DBS at Changi Business Park. Wonder if PAP knows this or turning a blind eye and pretending not to know. | Photo shows foreign staff of the IT department at DBS Bank in Singapore |
29th 30th 31st 32nd 33rd 34th 35th 36th 37th 38th 39th 40th 41st 42nd 43rd 44th 45th 46th 47th 48th 49th 50th 51st 52nd 53rd 54th 55th Urban Planning Foreign Languages Animal Science Law Economics Political Science Education Advertising Journalism Finance Hospitality Criminology Accounting Anthropology Psychology History Geography Information Technology Sociology Sports Science Social Sciences Real Estate Liberal Arts Communications and Mass Media Business Marketing Public Relations 1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th 14th 15th 16th 17th 18th 19th 20th 21st 22nd 23rd 24th 25th 26th 27th 28th Architecture Chemical Engineering Chemistry Electrical Engineering Physics Mechanical Engineering Civil Engineering Biochemistry Medicine Pharmacy Engineering Nursing Math Biology Philosophy Mathematics Statistics Music Microbiology Psychology Accounting Finance Environmental Science Creative Writing Hospitality International Relations Art History Ecology55 most difficult course... | Harvard list of its 50 most difficult courses |
The 30,000 sheep donated by Mongolia to China entered through the Erenhot port, which is very spectacular. [Qiang] Yesterday there were people who were worried about how to transport so many sheep. It turned out that they came by themselves, and they didn't even need transport tools. | These videos show 30,000 sheep donated to China by Mongolia during the novel coronavirus epidemic |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
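To make the loss parameters concrete, here is a rough pure-Python sketch of how a multiple-negatives ranking objective is commonly computed: each anchor is scored against every positive in the batch with scaled cosine similarity, and cross-entropy rewards the matching pair. The function `mnr_loss` and the toy vectors are hypothetical; this is an illustration under those assumptions, not the library's implementation.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.1], [0.1, 1.0]])  # pairs match
swapped = mnr_loss(anchors, [[0.1, 1.0], [1.0, 0.1]])  # pairs mismatched
print(aligned, swapped)
```

The aligned batch yields a much lower loss than the swapped one. With the train batch size of 2 listed under the hyperparameters, each anchor would see only one in-batch negative in this formulation.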
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
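The combination of learning_rate 5e-05, a linear scheduler, and zero warmup steps implies a learning rate that decays in a straight line from its initial value to zero over training. A minimal sketch of that schedule follows; `linear_lr` is a hypothetical helper, and the total of 10,994 steps is an assumption derived from 21,988 samples at batch size 2 on a single device for one epoch.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Assumption: 21,988 samples / batch size 2, one device, one epoch
TOTAL_STEPS = 10994

for step in (0, 5497, 10994):
    print(step, linear_lr(step, TOTAL_STEPS))
```

Under these assumptions the rate is halved at the midpoint of training and reaches zero at the final step.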
Training Logs
Epoch | Step | Training Loss |
---|---|---|
0.0455 | 500 | 0.0505 |
0.0910 | 1000 | 0.0637 |
0.1364 | 1500 | 0.039 |
0.1819 | 2000 | 0.0269 |
0.2274 | 2500 | 0.0527 |
0.2729 | 3000 | 0.0576 |
0.3184 | 3500 | 0.0278 |
0.3638 | 4000 | 0.0471 |
0.4093 | 4500 | 0.0486 |
0.4548 | 5000 | 0.025 |
0.5003 | 5500 | 0.0324 |
0.5458 | 6000 | 0.0169 |
0.5912 | 6500 | 0.0218 |
0.6367 | 7000 | 0.0476 |
0.6822 | 7500 | 0.0124 |
0.7277 | 8000 | 0.0247 |
0.7731 | 8500 | 0.0231 |
0.8186 | 9000 | 0.01 |
0.8641 | 9500 | 0.0145 |
0.9096 | 10000 | 0.0267 |
0.9551 | 10500 | 0.0111 |
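The Epoch column can be reproduced from the Step column with simple arithmetic. The sketch below assumes a single device and no gradient accumulation, so that one optimizer step consumes one batch of 2 samples out of 21,988; `epoch_at_step` is a hypothetical helper.

```python
import math

def epoch_at_step(step, num_samples=21988, batch_size=2):
    """Fraction of an epoch completed after `step` optimizer steps."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return step / steps_per_epoch

for step in (500, 1000, 10500):
    print(step, round(epoch_at_step(step), 4))
```

Under these assumptions one epoch is 10,994 steps, which agrees with the logged values of 0.0455 at step 500 and 0.9551 at step 10500.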
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.1
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}