metadata
base_model: sentence-transformers/all-MiniLM-L6-v2
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:2860
  - loss:CosineSimilarityLoss
widget:
  - source_sentence: >-
      No, it is not true. The sex chromosomes of the father determine the sex of
      an unborn baby, not the mother.
    sentences:
      - >-
        The wall of the uterus expands outward like a balloon during ovum
        maturation.
      - >-
        The mother's emotional state during pregnancy can influence the sex of
        the baby, making her solely responsible for determining it.
      - Six
  - source_sentence: Answer not found in response.
    sentences:
      - nan
      - >-
        In living organisms, cells are likened to bricks in a building due to
        their role as structural components.
      - >-
        Plant cells exclusively house chloroplasts as they play a crucial role
        in converting sunlight into energy for plants through the process of
        photosynthesis. These specialized organelles possess chlorophyll, a
        green pigment essential for absorbing light energy.
  - source_sentence: >-
      The organelles found in the cytoplasm of a cell include mitochondria,
      golgi bodies, ribosomes, and other components.
    sentences:
      - >-
        Examples of diseases that vaccines offer protection from are cholera,
        tuberculosis, smallpox, and hepatitis.
      - >-
        Having a balanced diet helps regulate the levels of fairy dust in the
        body, which indirectly impacts reproductive health.
      - >-
        Mitochondria, golgi bodies, ribosomes, and various other structures are
        present in the cytoplasm of a cell.
  - source_sentence: >-
      The basic practices of crop production include preparation of soil,
      sowing, adding manure and fertilizers, irrigation, protecting from weeds,
      harvesting, and storage.
    sentences:
      - You can see miniature plants growing inside the water droplet.
      - >-
        Changes in their natural surroundings, such as deforestation and
        desertification, cause migratory birds to fly to distant areas,
        impacting their access to food, places for breeding, and the overall
        ecosystem.
      - >-
        Essential tasks involved in crop cultivation consist of priming the
        soil, planting seeds, applying fertilizers and manure, providing water,
        preventing weed growth, collecting the crops, and storing them.
  - source_sentence: >-
      The embryo gets embedded in the wall of the uterus for further development
      after fertilisation.
    sentences:
      - >-
        By recycling paper, the need for harvesting trees for paper production
        can be significantly reduced, leading to conservation of trees, energy,
        and water, as well as minimizing the use of harmful chemicals in the
        paper-making process.
      - >-
        In the rainy season, if you examine moist bread, you may see greyish
        white spots that are adorned with minuscule, black circular shapes,
        believed to be microorganisms that have thrived on the bread.
      - >-
        Following fertilization, the embryo attaches to the uterine wall to
        progress in its development.

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
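
As a rough illustration of the similarity function listed above, the following standard-library sketch computes cosine similarity by hand. The toy 3-dimensional vectors are assumptions chosen for readability; real embeddings from this model have 384 dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for 384-dimensional embeddings (illustrative only).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(round(same_direction, 6), round(orthogonal, 6))  # 1.0 0.0
```

Scores close to 1.0 indicate vectors pointing in nearly the same direction, and scores near 0.0 indicate unrelated directions.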

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
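
The last two stages of the architecture above (mean pooling, then normalization) can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the token vectors and attention mask are made-up values, and the helper names `mean_pool` and `l2_normalize` are hypothetical.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Three token vectors of dimension 2; the last one is padding (mask 0).
tokens = [[1.0, 5.0], [5.0, 3.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)     # [3.0, 4.0]
embedding = l2_normalize(pooled)     # [0.6, 0.8]
print(pooled, embedding)
```

Because the final stage produces unit-length vectors, the dot product of two embeddings equals their cosine similarity.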

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("msamg/sts_qna_model")
# Run inference
sentences = [
    'The embryo gets embedded in the wall of the uterus for further development after fertilisation.',
    'Following fertilization, the embryo attaches to the uterine wall to progress in its development.',
    'By recycling paper, the need for harvesting trees for paper production can be significantly reduced, leading to conservation of trees, energy, and water, as well as minimizing the use of harmful chemicals in the paper-making process.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
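
One way to use these calls is to rank candidate sentences against a source sentence, as in the widget examples in the metadata. The helper below is a sketch under assumptions: `rank_candidates` is a hypothetical name, and it relies only on the `encode` and `similarity` methods shown above, so any object exposing those two methods can be passed in.

```python
def rank_candidates(model, source, candidates):
    """Return (candidate, score) pairs sorted from most to least similar."""
    source_emb = model.encode([source])        # one row per input sentence
    candidate_embs = model.encode(candidates)
    # First (and only) row of the similarity matrix: source vs. each candidate.
    scores = model.similarity(source_emb, candidate_embs)[0]
    pairs = [(text, float(score)) for text, score in zip(candidates, scores)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Intended usage (requires downloading the model, so it is left commented out):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("msamg/sts_qna_model")
# ranked = rank_candidates(model, "a reference answer", ["answer A", "answer B"])
```

The first pair in the returned list is the candidate the model scores as closest to the source sentence.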

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,860 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    |         | sentence_0 | sentence_1 | label |
    |---------|------------|------------|-------|
    | type    | string     | string     | float |
    | details | min: 8 tokens, mean: 40.09 tokens, max: 225 tokens | min: 3 tokens, mean: 26.95 tokens, max: 112 tokens | min: 0.0, mean: 0.41, max: 1.0 |
  • Samples:
    | sentence_0 | sentence_1 | label |
    |------------|------------|-------|
    | To identify the cell membrane, cytoplasm, and nucleus under a microscope when observing cheek cells, you can look for the cell membrane as the outer boundary of the cell, the cytoplasm which is the jelly-like substance between the cell membrane and the nucleus, and the nucleus which is usually darker and located in the center of the cell. Additionally, remember that animal cells do not have a cell wall. | When examining cheek cells under a microscope, you should be able to distinguish the cell membrane, which forms the outer layer, the cytoplasm, which is a gel-like material surrounding the nucleus, and the nucleus, located centrally and typically darker in appearance. It's important to note that animal cells lack a cell wall. | 1.0 |
    | The development of the embryo in oviparous animals takes place inside the egg shell. | The development of the embryo in oviparous animals takes place in the mother's pouch. | 0.0 |
    | Answer not found in response. | nan | 1.0 |
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
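
To make the training objective concrete, here is a minimal standard-library sketch of what a cosine similarity loss with an MSE loss function computes: the mean squared difference between each pair's cosine similarity and its label. The 2-dimensional vectors are illustrative assumptions, and `cosine_similarity_mse` is a hypothetical helper name, not the library's API.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def cosine_similarity_mse(pairs, labels):
    """Mean squared error between each pair's cosine similarity and its label."""
    errors = [(cosine_similarity(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)

pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # same direction, cosine 1.0
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal, cosine 0.0
]
print(cosine_similarity_mse(pairs, [1.0, 0.0]))  # 0.0: similarities match the labels
print(cosine_similarity_mse(pairs, [0.0, 1.0]))  # 1.0: each similarity is off by 1
```

During training, minimizing this quantity pulls pairs labeled near 1.0 together and pushes pairs labeled near 0.0 toward orthogonal embeddings.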
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
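
The listed values imply a short training run. The sketch below estimates the number of optimizer steps and the learning rate under a linear schedule. It assumes a single training process (not stated in the card), and the helper names `steps_per_epoch` and `linear_lr` are hypothetical. The sample count, batch size, epoch count, learning rate, warmup, and `dataloader_drop_last` values are taken from the card.

```python
import math

def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches in one pass over the data."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

def linear_lr(step, total_steps, base_lr, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))

# 2,860 samples, batch size 16, 1 epoch, gradient_accumulation_steps 1.
total = steps_per_epoch(2860, 16) * 1
print(total)                           # 179
print(linear_lr(0, total, 5e-05))      # 5e-05
print(linear_lr(total, total, 5e-05))  # 0.0
```

With no warmup, the learning rate starts at its full value and decays linearly to zero over the 179 steps.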

Framework Versions

  • Python: 3.11.3
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1+cpu
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}