---
base_model: BAAI/bge-large-en
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:626
- loss:CosineSimilarityLoss
widget:
- source_sentence: What determines the completion of performance of the contract?
  sentences:
  - >-
    In a tender/contract, in case of any difference, contradiction,
    discrepancy, with regard to conditions of tender/contract,
    specifications, drawings, bill of quantities etc.
  - >-
    The Contractor shall at all times during the progress and continuance of
    the works and also for the period of maintenance specified in the Tender
    Form
  - What determines the completion of performance of the contract?
- source_sentence: Early completion bonus
  sentences:
  - In case of ambiguity, order of precedence shall be referred.
  - >-
    Contractor shall be entitled for a bonus of 1% for each 30 days early
    completion of work.
  - "The Railway shall have the right to let other contracts in connection with the works. The Contractor shall afford other Contractors reasonable opportunity for the storage of their materials and the execution of their works and shall properly connect and coordinate his work with theirs. If any part of the Contractor's work depends upon proper execution or result upon the work of another Contractor(s), the Contractor shall inspect and promptly report to the Engineer any defects in such works that render it unsuitable for such proper execution and results. The Contractor's failure so-to inspect and report shall constitute an acceptance of the other Contractor's work as fit and proper for the reception of his work, except as to defects which may develop in the other Contractor's work after the execution of his work."
- source_sentence: Out of scope works
  sentences:
  - >-
    as to execution or quality of any work or material, or as to the
    measurements of the works the decision of the Engineer thereon shall be
    final subject to the appeal (within 7 days of such decision being
    intimated to the Contractor) to the Chief Engineer
  - >-
    Should works over and above those included in the contract require to be
    executed at the site, the Contractor shall have no right to be entrusted
    with the execution of such works which may be carried out by another
    Contractor or Contractors or by other means at the option of the
    Railway.
  - >-
    What is the order of precedence in the case of ambiguity between
    drawings and technical specifications?
- source_sentence: Deadline
  sentences:
  - >-
    shall be read in conjunction with the Standard General Conditions of
    Contract which are referred to herein and shall be subject to
    modifications additions or suppression by Special Conditions of Contract
    and/or Special Specifications, if any, annexed to the Tender Forms.
  - the sand, stone, clay ballast, earth, trees, rock
  - not later than 30 days after the date of receipt
- source_sentence: >-
    Can the stones/rocks/bounders obtained during excavation be used for
    construction if found technically satisfactory?
  sentences:
  - >-
    use the same for the purpose of the works either free of cost or pay the
    cost
  - Any material found during excavation should be reported to the engineer.
  - >-
    No certificate other than Maintenance Certificate, if applicable,
    referred to in Clause 50 of the Conditions shall be deemed to constitute
    approval
---
SentenceTransformer based on BAAI/bge-large-en
This is a sentence-transformers model fine-tuned from BAAI/bge-large-en on 626 sentence pairs of contract text using CosineSimilarityLoss. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-large-en
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
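The three modules run in sequence. BertModel produces one 1024-dimensional vector per token. Pooling with `pooling_mode_cls_token: True` keeps only the first ([CLS]) token's vector, and Normalize rescales that vector to unit L2 length. The minimal NumPy sketch below mimics steps (1) and (2). It runs on a random array that stands in for the transformer output. The helper name `cls_pool_and_normalize` and the toy shapes are illustrative assumptions, not part of the library, whose modules operate on torch tensors.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical helper mimicking Pooling(CLS) + Normalize on (batch, seq_len, dim)."""
    # Pooling with pooling_mode_cls_token=True: keep the vector of the first token.
    cls_vectors = token_embeddings[:, 0, :]
    # Normalize(): divide each row by its L2 norm so every embedding has length 1.
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / norms

# Random stand-in for BertModel output: 2 sentences, 7 tokens each, 1024 dims.
rng = np.random.default_rng(0)
fake_hidden_states = rng.normal(size=(2, 7, 1024))
embeddings = cls_pool_and_normalize(fake_hidden_states)
print(embeddings.shape)  # (2, 1024)
print(np.linalg.norm(embeddings, axis=1))  # each norm is 1 up to floating-point rounding
```

Because every output vector has unit length, the cosine similarity between two embeddings reduces to their dot product.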
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Ananthu357/Ananthus-BAAI-for-contracts10.0")
# Run inference
sentences = [
    'Can the stones/rocks/bounders obtained during excavation be used for construction if found technically satisfactory?',
    'use the same for the purpose of the works either free of cost or pay the cost',
    'No certificate other than Maintenance Certificate, if applicable, referred to in Clause 50 of the Conditions shall be deemed to constitute approval',
]
embeddings = model.encode(sentences)  # NumPy array by default
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)  # torch tensor of cosine scores
print(similarities.shape)
# torch.Size([3, 3])
```
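For semantic search over contract clauses, the query embedding is compared with every clause embedding, and the clauses are sorted by score. The minimal NumPy sketch below illustrates that ranking step on small made-up vectors. In practice, `query_emb` and `clause_embs` would come from `model.encode(...)`. The helper name `rank_clauses` and the toy 3-dimensional vectors are hypothetical.

```python
import numpy as np

def rank_clauses(query_emb: np.ndarray, clause_embs: np.ndarray, top_k: int = 2):
    """Hypothetical helper: return (index, score) pairs for the top_k clauses by cosine similarity."""
    # Re-normalize defensively; this is a no-op for this model's unit-length outputs.
    q = query_emb / np.linalg.norm(query_emb)
    c = clause_embs / np.linalg.norm(clause_embs, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity of each clause with the query
    order = np.argsort(-scores)[:top_k]  # highest scores first
    return [(int(i), float(scores[i])) for i in order]

# Toy 3-dimensional stand-ins for 1024-dimensional embeddings.
query = np.array([1.0, 0.0, 0.0])
clauses = np.array([
    [0.0, 1.0, 0.0],  # unrelated clause
    [0.9, 0.1, 0.0],  # close paraphrase
    [0.6, 0.0, 0.8],  # partially related
])
print(rank_clauses(query, clauses))
```

For this model, the scores are the same cosine values that `model.similarity` returns, so either route yields the same ordering.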
Training Details
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 15
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 15
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
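These hyperparameters determine the length of training. Assume all 626 pairs from the dataset_size tag were in the training split. With a batch size of 16, dataloader_drop_last False, and gradient_accumulation_steps 1, that gives ceil(626 / 16) = 40 optimizer steps per epoch. Over 15 epochs this makes 600 steps, which is consistent with the step and epoch columns of the training logs. Because warmup_steps is 0, warmup_ratio 0.1 applies, giving 60 warmup steps. With lr_scheduler_type linear, the learning rate rises from 0 to 5e-05 and then decays linearly back to 0. The sketch below reproduces this arithmetic. The function names are hypothetical, and the schedule formula reflects our understanding of the linear warmup-then-decay schedule in Hugging Face Transformers.

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int, drop_last: bool = False) -> int:
    # Number of batches the dataloader yields per epoch (hypothetical helper).
    if drop_last:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)

def linear_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    # Linear warmup from 0 to base_lr, then linear decay back to 0 at total_steps.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

per_epoch = steps_per_epoch(626, 16)  # assumes all 626 pairs are training data
total = per_epoch * 15                # num_train_epochs = 15
warmup = math.ceil(total * 0.1)       # warmup_ratio = 0.1, since warmup_steps = 0
print(per_epoch, total, warmup)       # 40 600 60
for s in (0, 30, 60, 330, 600):
    print(s, linear_lr(s, 5e-05, warmup, total))
```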
Training Logs
| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 2.5 | 100 | 0.0568 | 0.1144 |
| 5.0 | 200 | 0.0099 | 0.0947 |
| 7.5 | 300 | 0.0039 | 0.1039 |
| 10.0 | 400 | 0.0021 | 0.1027 |
| 12.5 | 500 | 0.0014 | 0.1017 |
| 15.0 | 600 | 0.0012 | 0.1019 |
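The model was trained with CosineSimilarityLoss, as the loss tag in the metadata indicates. In Sentence Transformers, this loss computes the cosine similarity between the two embeddings of each pair. It then takes the mean squared error against the pair's gold similarity label. Both loss columns in the table are values of this objective: one on training batches, the other on the evaluation data. The NumPy sketch below illustrates the computation on made-up vectors and labels. It is a simplified illustration with a hypothetical function name, not the library's torch implementation.

```python
import numpy as np

def cosine_similarity_loss(emb_a: np.ndarray, emb_b: np.ndarray, labels: np.ndarray) -> float:
    """Hypothetical sketch: MSE between pairwise cosine similarities and gold scores."""
    cos = np.sum(emb_a * emb_b, axis=1) / (
        np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1)
    )
    return float(np.mean((cos - labels) ** 2))

# Two made-up pairs: one parallel (cosine 1.0), one orthogonal (cosine 0.0).
a = np.array([[1.0, 0.0], [1.0, 0.0]])
b = np.array([[2.0, 0.0], [0.0, 3.0]])
gold = np.array([1.0, 0.0])
print(cosine_similarity_loss(a, b, gold))  # 0.0, since both cosines match their labels
```

If the first pair's label were 0.5 instead, its squared error would be 0.25, and the mean over the two pairs would be 0.125.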
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.42.4
- PyTorch: 2.3.1+cu121
- Accelerate: 0.32.1
- Datasets: 2.21.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```