SentenceTransformer based on dunzhang/stella_en_400M_v5

This is a sentence-transformers model finetuned from dunzhang/stella_en_400M_v5. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: dunzhang/stella_en_400M_v5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 1024, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
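
The embedding is produced by mean pooling over the token embeddings (module 1), followed by a 1024-to-1024 Dense projection with identity activation (module 2). Below is a minimal sketch of what the mean-pooling step computes; the function name and shapes are illustrative, not the library's internal API.

import torch

# Illustrative sketch of the mean-pooling step (module 1 above): average
# the token embeddings, using the attention mask to ignore padding.
def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 1024); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).float()    # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)  # (batch, 1024)
    counts = mask.sum(dim=1).clamp(min=1e-9)       # (batch, 1), avoid divide-by-zero
    return summed / counts                         # (batch, 1024)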

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ebinan92/test")
# Run inference
sentences = [
    'Question: y=(x-3)(x-5)  Where does this curve intercept the x axis?\nCorrect Answer: (3,0) and (5,0)\nAnswer: (3,5)',
    'Believes both the x and y co-ordinates of the x-intercept of a quadratic are derived from the constants in the factorised form.',
    'Thinks that the square root of an expression square roots each term in the expression, rather than square rooting the whole expression',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
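
For retrieval-style use (for example, matching a student's wrong answer to the most likely misconception), the embeddings can be ranked with util.semantic_search. A minimal sketch; the corpus strings are toy examples:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ebinan92/test")

# Toy corpus of candidate misconception descriptions.
misconceptions = [
    "Believes both the x and y co-ordinates of the x-intercept of a quadratic are derived from the constants in the factorised form.",
    "Does not know the properties of a rectangle",
]
query = "Question: y=(x-3)(x-5)  Where does this curve intercept the x axis?\nAnswer: (3,5)"

corpus_embeddings = model.encode(misconceptions)
query_embedding = model.encode(query)

# Rank the corpus by cosine similarity to the query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {misconceptions[hit['corpus_id']]}")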

Evaluation

Metrics

Information Retrieval (training split)

Metric               Value
cosine_accuracy@25   0.9983
cosine_precision@25  0.0399
cosine_recall@25     0.9983
cosine_ndcg@25       0.7903
cosine_mrr@25        0.7221
cosine_map@25        0.7221
dot_accuracy@25      0.9983
dot_precision@25     0.0399
dot_recall@25        0.9983
dot_ndcg@25          0.7748
dot_mrr@25           0.7018
dot_map@25           0.7018

Information Retrieval (validation split)

Metric               Value
cosine_accuracy@25   0.7818
cosine_precision@25  0.0314
cosine_recall@25     0.7818
cosine_ndcg@25       0.4855
cosine_mrr@25        0.3988
cosine_map@25        0.3983
dot_accuracy@25      0.763
dot_precision@25     0.0307
dot_recall@25        0.763
dot_ndcg@25          0.4794
dot_mrr@25           0.3961
dot_map@25           0.3956
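
These are the metrics reported by sentence-transformers' InformationRetrievalEvaluator at k=25. A minimal sketch of that setup follows; the queries, corpus, and relevance judgments are placeholders, since the card does not publish the evaluation data:

from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder evaluation data: query id -> text, doc id -> text,
# query id -> set of relevant doc ids.
queries = {"q1": "Question text with a wrong answer"}
corpus = {"d1": "Misconception description"}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    accuracy_at_k=[25],
    precision_recall_at_k=[25],
    mrr_at_k=[25],
    ndcg_at_k=[25],
    map_at_k=[25],
)
metrics = evaluator(model)  # dict mapping metric name -> value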

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,110,116 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min: 23 tokens, mean: 80.04 tokens, max: 392 tokens
    • sentence2: string; min: 6 tokens, mean: 14.85 tokens, max: 39 tokens
    • label: int; 1: 100.00%
  • Samples:
    1. sentence1: Question: Simplify the following, if possible: ( \frac{m^{2}+2 m-3}{m-3} ) Correct Answer: Does not simplify Answer: ( m+1 )
       sentence2: Does not know that to factorise a quadratic expression, to find two numbers that add to give the coefficient of the x term, and multiply to give the non variable term
       label: 1
    2. sentence1: Question: Tom and Katie are discussing the ( 5 ) plants with these heights: ( 24 \mathrm{cm}, 17 \mathrm{cm}, 42 \mathrm{cm}, 26 \mathrm{cm}, 13 \mathrm{cm} ) Tom says if all the plants were cut in half, the range wouldn't change. Katie says if all the plants grew by ( 3 \mathrm{cm} ) each, the range wouldn't change. Who do you agree with? Correct Answer: Only Katie Answer: Only Tom
       sentence2: Believes if you changed all values by the same proportion the range would not change
       label: 1
    3. sentence1: Question: The angles highlighted on this rectangle with different length sides can never be... A rectangle with the diagonals drawn in. The angle on the right hand side at the centre is highlighted in red and the angle at the bottom at the centre is highlighted in yellow. Correct Answer: ( 90^{\circ} ) Answer: acute
       sentence2: Does not know the properties of a rectangle
       label: 1
  • Loss: ContrastiveLoss with these parameters:
    {
        "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
        "margin": 0.5,
        "size_average": true
    }
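
A minimal sketch of assembling a dataset with these columns and the loss as configured above. The example row is abbreviated, and loading the base model may require trust_remote_code=True for its custom architecture:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.losses import SiameseDistanceMetric

model = SentenceTransformer("dunzhang/stella_en_400M_v5", trust_remote_code=True)

# Toy training pair in the card's (sentence1, sentence2, label) format.
train_dataset = Dataset.from_dict({
    "sentence1": ["Question: ... Correct Answer: ... Answer: ..."],
    "sentence2": ["Misconception description"],
    "label": [1],
})

# ContrastiveLoss as configured above: label-1 pairs are pulled together,
# label-0 pairs are pushed apart until their cosine distance exceeds 0.5.
loss = losses.ContrastiveLoss(
    model=model,
    distance_metric=SiameseDistanceMetric.COSINE_DISTANCE,
    margin=0.5,
    size_average=True,
)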
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • gradient_accumulation_steps: 6
  • learning_rate: 2e-05
  • num_train_epochs: 2.5
  • warmup_ratio: 0.01
  • bf16: True
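
With a per-device batch size of 64 and 6 gradient-accumulation steps, the effective batch size is 384 per device. A minimal sketch of these settings with the sentence-transformers 3.x trainer API; output_dir is a placeholder, and model, train_dataset, and loss come from the dataset and loss sketch above:

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=6,
    learning_rate=2e-5,
    num_train_epochs=2.5,
    warmup_ratio=0.01,
    bf16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,                  # from the loss sketch above
    args=args,
    train_dataset=train_dataset,  # from the dataset sketch above
    loss=loss,
)
trainer.train()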

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 6
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2.5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.01
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss train_cosine_map@25 val_cosine_map@25
0.0182 100 0.0002 - -
0.0364 200 0.0002 - -
0.0546 300 0.0003 - -
0.0728 400 0.0002 - -
0.0910 500 0.0002 0.2690 0.2318
0.1092 600 0.0002 - -
0.1274 700 0.0002 - -
0.1456 800 0.0002 - -
0.1638 900 0.0002 - -
0.1820 1000 0.0002 0.3119 0.2113
0.2002 1100 0.0002 - -
0.2184 1200 0.0002 - -
0.2366 1300 0.0002 - -
0.2548 1400 0.0002 - -
0.2730 1500 0.0002 0.3635 0.2468
0.2912 1600 0.0002 - -
0.3094 1700 0.0002 - -
0.3276 1800 0.0002 - -
0.3458 1900 0.0002 - -
0.3640 2000 0.0002 0.3915 0.2509
0.3822 2100 0.0002 - -
0.4004 2200 0.0002 - -
0.4185 2300 0.0002 - -
0.4367 2400 0.0002 - -
0.4549 2500 0.0002 0.4207 0.2804
0.4731 2600 0.0001 - -
0.4913 2700 0.0002 - -
0.5095 2800 0.0002 - -
0.5277 2900 0.0001 - -
0.5459 3000 0.0002 0.4695 0.3135
0.5641 3100 0.0002 - -
0.5823 3200 0.0002 - -
0.6005 3300 0.0001 - -
0.6187 3400 0.0002 - -
0.6369 3500 0.0002 0.4772 0.3196
0.6551 3600 0.0002 - -
0.6733 3700 0.0002 - -
0.6915 3800 0.0002 - -
0.7097 3900 0.0002 - -
0.7279 4000 0.0002 0.4719 0.3097
0.7461 4100 0.0002 - -
0.7643 4200 0.0002 - -
0.7825 4300 0.0002 - -
0.8007 4400 0.0002 - -
0.8189 4500 0.0001 0.5050 0.3323
0.8371 4600 0.0002 - -
0.8553 4700 0.0002 - -
0.8735 4800 0.0002 - -
0.8917 4900 0.0001 - -
0.9099 5000 0.0002 0.5020 0.3413
0.9281 5100 0.0002 - -
0.9463 5200 0.0002 - -
0.9645 5300 0.0001 - -
0.9827 5400 0.0002 - -
1.0009 5500 0.0002 0.5469 0.3522
1.0191 5600 0.0001 - -
1.0373 5700 0.0001 - -
1.0555 5800 0.0001 - -
1.0737 5900 0.0001 - -
1.0919 6000 0.0001 0.5728 0.3365
1.1101 6100 0.0001 - -
1.1283 6200 0.0001 - -
1.1465 6300 0.0001 - -
1.1647 6400 0.0001 - -
1.1829 6500 0.0001 0.5816 0.3497
1.2011 6600 0.0001 - -
1.2193 6700 0.0001 - -
1.2375 6800 0.0001 - -
1.2556 6900 0.0001 - -
1.2738 7000 0.0001 0.5986 0.3584
1.2920 7100 0.0002 - -
1.3102 7200 0.0001 - -
1.3284 7300 0.0001 - -
1.3466 7400 0.0002 - -
1.3648 7500 0.0001 0.6142 0.3709
1.3830 7600 0.0001 - -
1.4012 7700 0.0001 - -
1.4194 7800 0.0001 - -
1.4376 7900 0.0001 - -
1.4558 8000 0.0001 0.6252 0.3770
1.4740 8100 0.0001 - -
1.4922 8200 0.0001 - -
1.5104 8300 0.0001 - -
1.5286 8400 0.0001 - -
1.5468 8500 0.0001 0.6352 0.3717
1.5650 8600 0.0001 - -
1.5832 8700 0.0002 - -
1.6014 8800 0.0001 - -
1.6196 8900 0.0001 - -
1.6378 9000 0.0001 0.6471 0.3720
1.6560 9100 0.0001 - -
1.6742 9200 0.0002 - -
1.6924 9300 0.0001 - -
1.7106 9400 0.0002 - -
1.7288 9500 0.0002 0.6678 0.3820
1.7470 9600 0.0001 - -
1.7652 9700 0.0001 - -
1.7834 9800 0.0001 - -
1.8016 9900 0.0001 - -
1.8198 10000 0.0001 0.6816 0.3810
1.8380 10100 0.0001 - -
1.8562 10200 0.0001 - -
1.8744 10300 0.0001 - -
1.8926 10400 0.0001 - -
1.9108 10500 0.0001 0.6890 0.3892
1.9290 10600 0.0001 - -
1.9472 10700 0.0001 - -
1.9654 10800 0.0001 - -
1.9836 10900 0.0001 - -
2.0018 11000 0.0001 0.7008 0.3977
2.0200 11100 0.0001 - -
2.0382 11200 0.0001 - -
2.0564 11300 0.0001 - -
2.0746 11400 0.0001 - -
2.0927 11500 0.0001 0.7050 0.3902
2.1109 11600 0.0001 - -
2.1291 11700 0.0001 - -
2.1473 11800 0.0001 - -
2.1655 11900 0.0001 - -
2.1837 12000 0.0001 0.7096 0.3956
2.2019 12100 0.0001 - -
2.2201 12200 0.0001 - -
2.2383 12300 0.0001 - -
2.2565 12400 0.0001 - -
2.2747 12500 0.0001 0.7168 0.3966
2.2929 12600 0.0001 - -
2.3111 12700 0.0001 - -
2.3293 12800 0.0001 - -
2.3475 12900 0.0001 - -
2.3657 13000 0.0001 0.7191 0.3958
2.3839 13100 0.0001 - -
2.4021 13200 0.0001 - -
2.4203 13300 0.0001 - -
2.4385 13400 0.0001 - -
2.4567 13500 0.0001 0.7221 0.3983
2.4749 13600 0.0001 - -
2.4931 13700 0.0001 - -

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.1.1
  • Transformers: 4.43.1
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.18.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ContrastiveLoss

@inproceedings{hadsell2006dimensionality,
    author={Hadsell, R. and Chopra, S. and LeCun, Y.},
    booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
    title={Dimensionality Reduction by Learning an Invariant Mapping},
    year={2006},
    volume={2},
    number={},
    pages={1735-1742},
    doi={10.1109/CVPR.2006.100}
}