interstellar-ice-crystal-xs

This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-xs. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. This is a proof-of-method model: it was created to show that certain techniques can be applied to a particular dataset. It is not, however, really an improvement on the base model, and I advise against using it in production.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Snowflake/snowflake-arctic-embed-xs
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: Scraped astronomy papers from the NLP for Space Science workshop
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
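
As a rough illustration of what modules (1) and (2) above do, the sketch below takes the first ([CLS]) token vector as the sentence embedding and scales it to unit L2 norm. It uses plain Python lists and a toy 4-dimensional input rather than the real 384-dimensional model output; the function name and the toy values are assumptions made for the example, not part of the library.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Take the first ([CLS]) token vector and scale it to unit L2 norm."""
    cls_vector = token_embeddings[0]  # mirrors pooling_mode_cls_token=True
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        # Avoid dividing by zero for an all-zero vector.
        return list(cls_vector)
    return [x / norm for x in cls_vector]

# Toy stand-in for one sentence: 3 tokens, 4 dimensions (the real model uses 384).
toy_tokens = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 2.0, 0.0],
]
sentence_embedding = cls_pool_and_normalize(toy_tokens)
print(sentence_embedding)
```

Because the output has unit norm, the dot product of two such embeddings equals their cosine similarity.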

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("SimoneAstarita/interstellar-ice-crystal-xs")
# Run inference
sentences = [
    'New higher resolution images and our parametric modelling confirmed this finding.',
    'New higher resolution images and our parametric modelling confirmed this finding.',
    'Pan & Schlichting, 2012) and thus could slightly affect the surface density slope.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
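
To make the similarity scores more concrete, here is a minimal sketch of a cosine similarity matrix and a simple ranking step, written in plain Python so it runs without downloading the model. The toy 2-dimensional vectors and both helper functions are hypothetical stand-ins; with the real model you would pass the output of model.encode instead.

```python
def cosine_similarity_matrix(embeddings):
    """Pairwise dot products; equals cosine similarity for unit-norm rows."""
    return [
        [sum(a * b for a, b in zip(u, v)) for v in embeddings]
        for u in embeddings
    ]

def rank_by_similarity(query_index, similarities):
    """Indices of the other rows, most similar to the query first."""
    scores = similarities[query_index]
    others = [i for i in range(len(scores)) if i != query_index]
    return sorted(others, key=lambda i: scores[i], reverse=True)

# Toy unit-norm vectors standing in for three encoded sentences.
# Rows 0 and 1 are identical, like the duplicated sentence in the usage code.
toy_embeddings = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
]
toy_similarities = cosine_similarity_matrix(toy_embeddings)
print(toy_similarities)
print(rank_by_similarity(0, toy_similarities))
```

In this toy case the duplicate of the query ranks first, which is the behaviour one would hope for from the two identical sentences in the usage code.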

Training Details

Training Dataset

The dataset consists of scraped astronomy papers, of which we keep the abstract, introduction, and conclusions. The text is split into sentences using NLTK. Each sentence is then duplicated, so the same sentence serves as both anchor and positive during training. This follows SimCSE.
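
The pair construction described above could look roughly like the following sketch. It is not the script used to build the dataset: the regex splitter is a naive stand-in for NLTK's sentence tokenizer, and the function names are hypothetical. The idea in unsupervised SimCSE is that the two copies of a sentence are encoded with different dropout masks during training, so they yield slightly different embeddings.

```python
import re

def split_into_sentences(text):
    """Naive regex splitter, used here only as a stand-in for NLTK."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def build_simcse_pairs(sections):
    """Return one row per sentence, with identical anchor and positive."""
    pairs = []
    for section in sections:
        for sentence in split_into_sentences(section):
            pairs.append({"anchor": sentence, "positive": sentence})
    return pairs

# Example input: one paper section as a single string.
sections = [
    "The source exhibits strong optical variability. It is highly polarized.",
]
pairs = build_simcse_pairs(sections)
for row in pairs:
    print(row)
```

A regex splitter like this one breaks on abbreviations such as "et al.", which is one reason to prefer a proper sentence tokenizer for scientific text.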

Unnamed Dataset

  • Size: 416,298 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
              anchor          positive
    type      string          string
    min       4 tokens        4 tokens
    mean      42.81 tokens    42.81 tokens
    max       512 tokens      512 tokens
  • Samples:
    anchor:   Resolving the inner parsec of the blazar J1924–2914 with the Event Horizon Telescope
    positive: Resolving the inner parsec of the blazar J1924–2914 with the Event Horizon Telescope
    anchor:   The radio source J1924–2914 (PKS 1921–293, OV–236) is a radio-loud quasar at a redshift z=0.353 (Wills & Wills, 1981; Jones et al., 2009).
    positive: The radio source J1924–2914 (PKS 1921–293, OV–236) is a radio-loud quasar at a redshift z=0.353 (Wills & Wills, 1981; Jones et al., 2009).
    anchor:   The source exhibits strong optical variability and is highly polarized (Wills & Wills, 1981; Pica et al., 1988; Worrall & Wilkes, 1990).
    positive: The source exhibits strong optical variability and is highly polarized (Wills & Wills, 1981; Pica et al., 1988; Worrall & Wilkes, 1990).
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
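
For intuition about the loss and its two parameters, the sketch below computes a MultipleNegativesRankingLoss-style objective in plain Python. For each anchor, every positive in the batch is scored by cosine similarity times the scale (20.0 here), and cross-entropy is taken with the anchor's own positive as the correct class, so the other positives act as in-batch negatives. This is a simplified re-implementation for illustration, not the library code, and the toy vectors are made up.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where row i's correct class is positive i."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(x - max_logit) for x in logits)
        )
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two pairs; each positive is close to its own anchor.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.2, 0.8]]
loss = multiple_negatives_ranking_loss(anchors, positives)
print(loss)
```

When each positive is nearly identical to its anchor, as in the training data here, the loss is very small, which may help explain the low values in the training logs.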
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
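
As a back-of-the-envelope check on these settings, the sketch below derives the number of optimizer steps and the learning rate under a linear schedule with warmup. It assumes a single device, no gradient accumulation, and that the batch sampler does not drop any samples; the learning rate (5e-05), epoch count (3), and scheduler type (linear) are the values listed under All Hyperparameters. The helper function is hypothetical and only mirrors the usual warmup-then-linear-decay shape.

```python
import math

NUM_SAMPLES = 416_298  # size of the training dataset
BATCH_SIZE = 32        # per_device_train_batch_size, assuming one device
NUM_EPOCHS = 3         # num_train_epochs
WARMUP_RATIO = 0.1     # warmup_ratio
PEAK_LR = 5e-05        # learning_rate

steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def linear_schedule_lr(step):
    """Linear warmup to PEAK_LR, then linear decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - warmup_steps)

print(steps_per_epoch, total_steps)
for step in (0, 100, warmup_steps, total_steps):
    print(step, linear_schedule_lr(step))
```

Under these assumptions one epoch is 13,010 steps, which agrees with the training logs, where step 100 is reported at epoch 0.0077.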

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0077 100 0.0025
0.0154 200 0.0032
0.0231 300 0.0026
0.0307 400 0.0026
0.0384 500 0.0041
0.0461 600 0.0014
0.0538 700 0.0019
0.0615 800 0.0015
0.0692 900 0.001
0.0769 1000 0.0005
0.0846 1100 0.0004
0.0922 1200 0.0013
0.0999 1300 0.0013
0.1076 1400 0.0027
0.1153 1500 0.0018
0.1230 1600 0.001
0.1307 1700 0.0014
0.1384 1800 0.0012
0.1460 1900 0.0041
0.1537 2000 0.0009
0.1614 2100 0.0005
0.1691 2200 0.0011
0.1768 2300 0.001
0.1845 2400 0.0004
0.1922 2500 0.0011
0.1998 2600 0.0044
0.2075 2700 0.0004
0.2152 2800 0.0022
0.2229 2900 0.0007
0.2306 3000 0.0006
0.2383 3100 0.0002
0.2460 3200 0.0006
0.2537 3300 0.0004
0.2613 3400 0.0013
0.2690 3500 0.0006
0.2767 3600 0.0005
0.2844 3700 0.0018
0.2921 3800 0.0023
0.2998 3900 0.0011
0.3075 4000 0.0007
0.3151 4100 0.0008
0.3228 4200 0.0013
0.3305 4300 0.0012
0.3382 4400 0.001
0.3459 4500 0.0016
0.3536 4600 0.0025
0.3613 4700 0.0015
0.3689 4800 0.0018
0.3766 4900 0.0019
0.3843 5000 0.0021
0.3920 5100 0.0018
0.3997 5200 0.0004
0.4074 5300 0.0006
0.4151 5400 0.0007
0.4228 5500 0.0009
0.4304 5600 0.0004
0.4381 5700 0.0003
0.4458 5800 0.0007
0.4535 5900 0.0013
0.4612 6000 0.0007
0.4689 6100 0.0005
0.4766 6200 0.001
0.4842 6300 0.0027
0.4919 6400 0.0018
0.4996 6500 0.0006
0.5073 6600 0.0008
0.5150 6700 0.0006
0.5227 6800 0.0007
0.5304 6900 0.001
0.5380 7000 0.0007
0.5457 7100 0.0005
0.5534 7200 0.0012
0.5611 7300 0.0012
0.5688 7400 0.0011
0.5765 7500 0.0005
0.5842 7600 0.0013
0.5919 7700 0.0012
0.5995 7800 0.0007
0.6072 7900 0.0012
0.6149 8000 0.0012
0.6226 8100 0.0003
0.6303 8200 0.0003
0.6380 8300 0.0003
0.6457 8400 0.002
0.6533 8500 0.0003
0.6610 8600 0.0016
0.6687 8700 0.0003
0.6764 8800 0.0002
0.6841 8900 0.0006
0.6918 9000 0.0005
0.6995 9100 0.0017
0.7071 9200 0.0037
0.7148 9300 0.0005
0.7225 9400 0.0006
0.7302 9500 0.0004
0.7379 9600 0.0002
0.7456 9700 0.0008
0.7533 9800 0.0005
0.7610 9900 0.0006
0.7686 10000 0.0004
0.7763 10100 0.0004
0.7840 10200 0.0006
0.7917 10300 0.0019
0.7994 10400 0.0007
0.8071 10500 0.0003
0.8148 10600 0.0003
0.8224 10700 0.0005
0.8301 10800 0.0009
0.8378 10900 0.0006
0.8455 11000 0.002
0.8532 11100 0.0018
0.8609 11200 0.0009
0.8686 11300 0.0004
0.8762 11400 0.0005
0.8839 11500 0.0008
0.8916 11600 0.0003
0.8993 11700 0.0002
0.9070 11800 0.0004
0.9147 11900 0.0007
0.9224 12000 0.0009
0.9301 12100 0.0007
0.9377 12200 0.0007
0.9454 12300 0.0009
0.9531 12400 0.0007
0.9608 12500 0.0009
0.9685 12600 0.0004
0.9762 12700 0.0002
0.9839 12800 0.0003
0.9915 12900 0.0002
0.9992 13000 0.0002
1.0069 13100 0.0006
1.0146 13200 0.0007
1.0223 13300 0.0007
1.0300 13400 0.0005
1.0377 13500 0.0008
1.0453 13600 0.0016
1.0530 13700 0.0007
1.0607 13800 0.0013
1.0684 13900 0.0005
1.0761 14000 0.0002
1.0838 14100 0.0001
1.0915 14200 0.0003
1.0992 14300 0.0003
1.1068 14400 0.0006
1.1145 14500 0.0002
1.1222 14600 0.0003
1.1299 14700 0.0002
1.1376 14800 0.0006
1.1453 14900 0.0011
1.1530 15000 0.0004
1.1606 15100 0.0001
1.1683 15200 0.0003
1.1760 15300 0.0001
1.1837 15400 0.0002
1.1914 15500 0.0001
1.1991 15600 0.003
1.2068 15700 0.0001
1.2145 15800 0.0002
1.2221 15900 0.0005
1.2298 16000 0.0004
1.2375 16100 0.0001
1.2452 16200 0.0003
1.2529 16300 0.0003
1.2606 16400 0.0008
1.2683 16500 0.0004
1.2759 16600 0.0001
1.2836 16700 0.0002
1.2913 16800 0.0011
1.2990 16900 0.0001
1.3067 17000 0.0001
1.3144 17100 0.0002
1.3221 17200 0.0005
1.3297 17300 0.0012
1.3374 17400 0.0003
1.3451 17500 0.0002
1.3528 17600 0.0009
1.3605 17700 0.0003
1.3682 17800 0.0005
1.3759 17900 0.0008
1.3836 18000 0.0005
1.3912 18100 0.0007
1.3989 18200 0.0002
1.4066 18300 0.0003
1.4143 18400 0.0002
1.4220 18500 0.0001
1.4297 18600 0.0001
1.4374 18700 0.0001
1.4450 18800 0.0005
1.4527 18900 0.0002
1.4604 19000 0.0001
1.4681 19100 0.0002
1.4758 19200 0.0006
1.4835 19300 0.0015
1.4912 19400 0.0012
1.4988 19500 0.0003
1.5065 19600 0.0005
1.5142 19700 0.0001
1.5219 19800 0.0002
1.5296 19900 0.0009
1.5373 20000 0.0002
1.5450 20100 0.0001
1.5527 20200 0.0003
1.5603 20300 0.0006
1.5680 20400 0.0002
1.5757 20500 0.0004
1.5834 20600 0.0006
1.5911 20700 0.0004
1.5988 20800 0.0002
1.6065 20900 0.0006
1.6141 21000 0.0006
1.6218 21100 0.0001
1.6295 21200 0.0001
1.6372 21300 0.0001
1.6449 21400 0.0008
1.6526 21500 0.0001
1.6603 21600 0.0005
1.6679 21700 0.0001
1.6756 21800 0.0001
1.6833 21900 0.0001
1.6910 22000 0.0001
1.6987 22100 0.0008
1.7064 22200 0.0014
1.7141 22300 0.0002
1.7218 22400 0.0007
1.7294 22500 0.0001
1.7371 22600 0.0001
1.7448 22700 0.0001
1.7525 22800 0.0002
1.7602 22900 0.0002
1.7679 23000 0.0001
1.7756 23100 0.0001
1.7832 23200 0.0005
1.7909 23300 0.0004
1.7986 23400 0.0002
1.8063 23500 0.0001
1.8140 23600 0.0001
1.8217 23700 0.0001
1.8294 23800 0.0004
1.8370 23900 0.0002
1.8447 24000 0.0002
1.8524 24100 0.0013
1.8601 24200 0.0004
1.8678 24300 0.0002
1.8755 24400 0.0002
1.8832 24500 0.0001
1.8909 24600 0.0001
1.8985 24700 0.0001
1.9062 24800 0.0002
1.9139 24900 0.0005
1.9216 25000 0.0001
1.9293 25100 0.0001
1.9370 25200 0.0002
1.9447 25300 0.0002
1.9523 25400 0.0006
1.9600 25500 0.0004
1.9677 25600 0.0002
1.9754 25700 0.0001
1.9831 25800 0.0001
1.9908 25900 0.0001
1.9985 26000 0.0001
2.0061 26100 0.0002
2.0138 26200 0.0007
2.0215 26300 0.0003
2.0292 26400 0.0001
2.0369 26500 0.0011
2.0446 26600 0.0002
2.0523 26700 0.0001
2.0600 26800 0.0002
2.0676 26900 0.0004
2.0753 27000 0.0001
2.0830 27100 0.0001
2.0907 27200 0.0001
2.0984 27300 0.0002
2.1061 27400 0.0001
2.1138 27500 0.0001
2.1214 27600 0.0001
2.1291 27700 0.0001
2.1368 27800 0.0003
2.1445 27900 0.0012
2.1522 28000 0.0001
2.1599 28100 0.0001
2.1676 28200 0.0001
2.1752 28300 0.0001
2.1829 28400 0.0001
2.1906 28500 0.0001
2.1983 28600 0.0014
2.2060 28700 0.0001
2.2137 28800 0.0001
2.2214 28900 0.0002
2.2291 29000 0.0
2.2367 29100 0.0001
2.2444 29200 0.0001
2.2521 29300 0.0001
2.2598 29400 0.0001
2.2675 29500 0.0001
2.2752 29600 0.0001
2.2829 29700 0.0001
2.2905 29800 0.0001
2.2982 29900 0.0001
2.3059 30000 0.0001
2.3136 30100 0.0001
2.3213 30200 0.0002
2.3290 30300 0.0011
2.3367 30400 0.0001
2.3444 30500 0.0001
2.3520 30600 0.0005
2.3597 30700 0.0001
2.3674 30800 0.0001
2.3751 30900 0.0006
2.3828 31000 0.0001
2.3905 31100 0.0001
2.3982 31200 0.0002
2.4058 31300 0.0001
2.4135 31400 0.0001
2.4212 31500 0.0001
2.4289 31600 0.0001
2.4366 31700 0.0001
2.4443 31800 0.0004
2.4520 31900 0.0001
2.4596 32000 0.0001
2.4673 32100 0.0002
2.4750 32200 0.0002
2.4827 32300 0.0004
2.4904 32400 0.0008
2.4981 32500 0.0001
2.5058 32600 0.0001
2.5135 32700 0.0001
2.5211 32800 0.0001
2.5288 32900 0.0006
2.5365 33000 0.0001
2.5442 33100 0.0001
2.5519 33200 0.0002
2.5596 33300 0.0001
2.5673 33400 0.0002
2.5749 33500 0.0001
2.5826 33600 0.0001
2.5903 33700 0.0001
2.5980 33800 0.0001
2.6057 33900 0.0001
2.6134 34000 0.0007
2.6211 34100 0.0
2.6287 34200 0.0001
2.6364 34300 0.0001
2.6441 34400 0.0006
2.6518 34500 0.0001
2.6595 34600 0.0001
2.6672 34700 0.0001
2.6749 34800 0.0
2.6826 34900 0.0001
2.6902 35000 0.0001
2.6979 35100 0.0005
2.7056 35200 0.0006
2.7133 35300 0.0001
2.7210 35400 0.0005
2.7287 35500 0.0001
2.7364 35600 0.0001
2.7440 35700 0.0001
2.7517 35800 0.0001
2.7594 35900 0.0001
2.7671 36000 0.0001
2.7748 36100 0.0001
2.7825 36200 0.0005
2.7902 36300 0.0001
2.7978 36400 0.0001
2.8055 36500 0.0001
2.8132 36600 0.0001
2.8209 36700 0.0001
2.8286 36800 0.0001
2.8363 36900 0.0001
2.8440 37000 0.0001
2.8517 37100 0.0013
2.8593 37200 0.0001
2.8670 37300 0.0001
2.8747 37400 0.0001
2.8824 37500 0.0001
2.8901 37600 0.0001
2.8978 37700 0.0001
2.9055 37800 0.0001
2.9131 37900 0.0002
2.9208 38000 0.0001
2.9285 38100 0.0001
2.9362 38200 0.0001
2.9439 38300 0.0001
2.9516 38400 0.0004
2.9593 38500 0.0001
2.9669 38600 0.0001
2.9746 38700 0.0001
2.9823 38800 0.0001
2.9900 38900 0.0001
2.9977 39000 0.0001

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.0
  • Transformers: 4.44.2
  • PyTorch: 2.4.0+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

# Add SimCSE reference

Safetensors

  • Model size: 22.6M params
  • Tensor type: F32