SentenceTransformer based on Qwen/Qwen2.5-0.5B-Instruct

This is a sentence-transformers model finetuned from Qwen/Qwen2.5-0.5B-Instruct. It maps sentences & paragraphs to an 896-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Qwen/Qwen2.5-0.5B-Instruct
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 896 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 494M parameters (F32 safetensors)

Model Sources

  • Documentation: https://sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: Qwen2Model 
  (1): Pooling({'word_embedding_dimension': 896, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
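
The Pooling module mean-pools the Qwen2 token embeddings (pooling_mode_mean_tokens) into a single 896-dimensional vector per input. For reference, a minimal sketch of the equivalent computation with plain transformers, assuming the checkpoint in this repository also loads directly with the Auto classes:

import torch
from transformers import AutoModel, AutoTokenizer

# Assumption: the sentence-transformers checkpoint stores a plain Qwen2Model
# and its tokenizer at the repository root, loadable via AutoModel/AutoTokenizer.
tokenizer = AutoTokenizer.from_pretrained("AlexWortega/qwen1k")
model = AutoModel.from_pretrained("AlexWortega/qwen1k")

batch = tokenizer(
    ["When did the July Monarchy end?"],
    padding=True, truncation=True, max_length=1024, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # [batch, seq_len, 896]

# Mean pooling: average the token embeddings, excluding padding positions
mask = batch["attention_mask"].unsqueeze(-1).to(token_embeddings.dtype)
sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embeddings.shape)  # torch.Size([1, 896])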

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("AlexWortega/qwen1k")
# Run inference
sentences = [
    'When did the July Monarchy end?',
    'July Monarchy\nThe July Monarchy (French: Monarchie de Juillet) was a liberal constitutional monarchy in France under Louis Philippe I, starting with the July Revolution of 1830 and ending with the Revolution of 1848. It marks the end of the Bourbon Restoration (1814–1830). It began with the overthrow of the conservative government of Charles X, the last king of the House of Bourbon.',
    'July Monarchy\nDespite the return of the House of Bourbon to power, France was much changed from the era of the ancien régime. The egalitarianism and liberalism of the revolutionaries remained an important force and the autocracy and hierarchy of the earlier era could not be fully restored. Economic changes, which had been underway long before the revolution, had progressed further during the years of turmoil and were firmly entrenched by 1815. These changes had seen power shift from the noble landowners to the urban merchants. The administrative reforms of Napoleon, such as the Napoleonic Code and efficient bureaucracy, also remained in place. These changes produced a unified central government that was fiscally sound and had much control over all areas of French life, a sharp difference from the complicated mix of feudal and absolutist traditions and institutions of pre-Revolutionary Bourbons.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 896]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
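
Because the model was trained with MatryoshkaLoss at 896 and 768 dimensions (see Training Details), embeddings can also be truncated to 768 dimensions with only a small quality drop (see Evaluation). A sketch using the truncate_dim argument of SentenceTransformer:

from sentence_transformers import SentenceTransformer

# Load the model so embeddings are truncated to the smaller Matryoshka dimension
model = SentenceTransformer("AlexWortega/qwen1k", truncate_dim=768)
embeddings = model.encode([
    "When did the July Monarchy end?",
    "The July Monarchy ended with the Revolution of 1848.",
])
print(embeddings.shape)
# (2, 768)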

Evaluation

Metrics

Semantic Similarity

Metric           sts-dev-896  sts-dev-768
pearson_cosine   0.4573       0.4455
spearman_cosine  0.4965       0.4897
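
These are the metrics reported by sentence-transformers' EmbeddingSimilarityEvaluator at the two Matryoshka dimensions. A sketch of recomputing the 896-dimensional column, assuming the standard STS-Benchmark validation split (the exact dev split is not stated on this card); for sts-dev-768, load the model with truncate_dim=768 as shown above:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SimilarityFunction
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("AlexWortega/qwen1k")

# Assumption: sts-dev refers to the STS-Benchmark validation split
stsb = load_dataset("sentence-transformers/stsb", split="validation")
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=stsb["sentence1"],
    sentences2=stsb["sentence2"],
    scores=stsb["score"],
    main_similarity=SimilarityFunction.COSINE,
    name="sts-dev-896",
)
print(evaluator(model))  # includes pearson_cosine and spearman_cosine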

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,859,594 training samples
  • Columns: query, response, and negative
  • Approximate statistics based on the first 1000 samples:

                 query   response  negative
    type         string  string    string
    min tokens   4       23        4
    mean tokens  8.76    141.88    134.02
    max tokens   26      532       472
  • Samples:

    Sample 1
      query:    Was there a year 0?
      response: Year zero
                Year zero does not exist in the anno Domini system usually used to number years in the Gregorian calendar and in its predecessor, the Julian calendar. In this system, the year 1 BC is followed by AD 1. However, there is a year zero in astronomical year numbering (where it coincides with the Julian year 1 BC) and in ISO 8601:2004 (where it coincides with the Gregorian year 1 BC) as well as in all Buddhist and Hindu calendars.
      negative: 504
                Year 504 (DIV) was a leap year starting on Thursday (link will display the full calendar) of the Julian calendar. At the time, it was known as the Year of the Consulship of Nicomachus without colleague (or, less frequently, year 1257 "Ab urbe condita"). The denomination 504 for this year has been used since the early medieval period, when the Anno Domini calendar era became the prevalent method in Europe for naming years.

    Sample 2
      query:    When is the dialectical method used?
      response: Dialectic
                Dialectic or dialectics (Greek: διαλεκτική, dialektikḗ; related to dialogue), also known as the dialectical method, is at base a discourse between two or more people holding different points of view about a subject but wishing to establish the truth through reasoned arguments. Dialectic resembles debate, but the concept excludes subjective elements such as emotional appeal and the modern pejorative sense of rhetoric.[1][2] Dialectic may be contrasted with the didactic method, wherein one side of the conversation teaches the other. Dialectic is alternatively known as minor logic, as opposed to major logic or critique.
      negative: Derek Bentley case
                Another factor in the posthumous defence was that a "confession" recorded by Bentley, which was claimed by the prosecution to be a "verbatim record of dictated monologue", was shown by forensic linguistics methods to have been largely edited by policemen. Linguist Malcolm Coulthard showed that certain patterns, such as the frequency of the word "then" and the grammatical use of "then" after the grammatical subject ("I then" rather than "then I"), were not consistent with Bentley's use of language (his idiolect), as evidenced in court testimony. These patterns fit better the recorded testimony of the policemen involved. This is one of the earliest uses of forensic linguistics on record.

    Sample 3
      query:    What do Grasshoppers eat?
      response: Grasshopper
                Grasshoppers are plant-eaters, with a few species at times becoming serious pests of cereals, vegetables and pasture, especially when they swarm in their millions as locusts and destroy crops over wide areas. They protect themselves from predators by camouflage; when detected, many species attempt to startle the predator with a brilliantly-coloured wing-flash while jumping and (if adult) launching themselves into the air, usually flying for only a short distance. Other species such as the rainbow grasshopper have warning coloration which deters predators. Grasshoppers are affected by parasites and various diseases, and many predatory creatures feed on both nymphs and adults. The eggs are the subject of attack by parasitoids and predators.
      negative: Groundhog
                Very often the dens of groundhogs provide homes for other animals including skunks, red foxes, and cottontail rabbits. The fox and skunk feed upon field mice, grasshoppers, beetles and other creatures that destroy farm crops. In aiding these animals, the groundhog indirectly helps the farmer. In addition to providing homes for itself and other animals, the groundhog aids in soil improvement by bringing subsoil to the surface. The groundhog is also a valuable game animal and is considered a difficult sport when hunted in a fair manner. In some parts of Appalachia, they are eaten.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            896,
            768
        ],
        "matryoshka_weights": [
            1,
            1
        ],
        "n_dims_per_step": -1
    }
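
The samples above follow the (query, response, negative) triplet schema that MultipleNegativesRankingLoss expects, and MatryoshkaLoss applies that ranking objective to both the full 896-dimensional embeddings and their first 768 dimensions, with equal weight. A minimal sketch of assembling data in this format and constructing the loss (the actual training data is not published with this card):

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# Hypothetical rows in the (query, response, negative) format shown above
train_dataset = Dataset.from_dict({
    "query": ["Was there a year 0?"],
    "response": ["Year zero does not exist in the anno Domini system ..."],
    "negative": ["Year 504 (DIV) was a leap year starting on Thursday ..."],
})

# Base model before finetuning; sentence-transformers adds a mean-pooling
# module automatically when given a plain transformer checkpoint
model = SentenceTransformer("Qwen/Qwen2.5-0.5B-Instruct")
loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[896, 768],
    matryoshka_weights=[1, 1],
    n_dims_per_step=-1,  # train on every listed dimension at each step
)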
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 12
  • per_device_eval_batch_size: 12
  • gradient_accumulation_steps: 4
  • num_train_epochs: 1
  • warmup_ratio: 0.3
  • bf16: True
  • batch_sampler: no_duplicates
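
These map directly onto SentenceTransformerTrainingArguments (sentence-transformers v3+); a minimal sketch with the listed values, where output_dir is hypothetical:

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="qwen1k",  # hypothetical output directory
    eval_strategy="steps",
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    gradient_accumulation_steps=4,  # effective batch size of 48 per device
    num_train_epochs=1,
    warmup_ratio=0.3,
    bf16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no duplicate samples within a batch
)

Together with the model, loss, and dataset sketched above, these arguments would be passed to a SentenceTransformerTrainer.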

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 12
  • per_device_eval_batch_size: 12
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.3
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss sts-dev-896_spearman_cosine sts-dev-768_spearman_cosine
0.0002 10 4.4351 - -
0.0003 20 4.6508 - -
0.0005 30 4.7455 - -
0.0007 40 4.5427 - -
0.0008 50 4.3982 - -
0.0010 60 4.3755 - -
0.0012 70 4.4105 - -
0.0013 80 5.2227 - -
0.0015 90 5.8062 - -
0.0017 100 5.7645 - -
0.0018 110 5.9261 - -
0.0020 120 5.8301 - -
0.0022 130 5.7602 - -
0.0023 140 5.9392 - -
0.0025 150 5.7523 - -
0.0027 160 5.8585 - -
0.0029 170 5.7916 - -
0.0030 180 5.8157 - -
0.0032 190 5.7102 - -
0.0034 200 5.5844 - -
0.0035 210 5.5463 - -
0.0037 220 5.5823 - -
0.0039 230 5.5514 - -
0.0040 240 5.5646 - -
0.0042 250 5.5783 - -
0.0044 260 5.5344 - -
0.0045 270 5.523 - -
0.0047 280 5.4969 - -
0.0049 290 5.5407 - -
0.0050 300 5.6171 - -
0.0052 310 5.5581 - -
0.0054 320 5.8903 - -
0.0055 330 5.8675 - -
0.0057 340 5.745 - -
0.0059 350 5.6041 - -
0.0060 360 5.5476 - -
0.0062 370 5.3964 - -
0.0064 380 5.3564 - -
0.0065 390 5.3054 - -
0.0067 400 5.2779 - -
0.0069 410 5.206 - -
0.0070 420 5.2168 - -
0.0072 430 5.1645 - -
0.0074 440 5.1797 - -
0.0076 450 5.2526 - -
0.0077 460 5.1768 - -
0.0079 470 5.3519 - -
0.0081 480 5.2982 - -
0.0082 490 5.3229 - -
0.0084 500 5.3758 - -
0.0086 510 5.2478 - -
0.0087 520 5.1799 - -
0.0089 530 5.1088 - -
0.0091 540 4.977 - -
0.0092 550 4.9108 - -
0.0094 560 4.811 - -
0.0096 570 4.7203 - -
0.0097 580 4.6499 - -
0.0099 590 4.4548 - -
0.0101 600 4.2891 - -
0.0102 610 4.1881 - -
0.0104 620 4.6 - -
0.0106 630 4.5365 - -
0.0107 640 4.3086 - -
0.0109 650 4.0452 - -
0.0111 660 3.9041 - -
0.0112 670 4.3938 - -
0.0114 680 4.3198 - -
0.0116 690 4.1294 - -
0.0117 700 4.077 - -
0.0119 710 3.9174 - -
0.0121 720 4.1629 - -
0.0123 730 3.9611 - -
0.0124 740 3.7768 - -
0.0126 750 3.5842 - -
0.0128 760 3.1196 - -
0.0129 770 3.6288 - -
0.0131 780 3.273 - -
0.0133 790 2.7889 - -
0.0134 800 2.5096 - -
0.0136 810 1.8878 - -
0.0138 820 2.3423 - -
0.0139 830 1.7687 - -
0.0141 840 2.0781 - -
0.0143 850 2.4598 - -
0.0144 860 1.7667 - -
0.0146 870 2.6247 - -
0.0148 880 1.916 - -
0.0149 890 2.0817 - -
0.0151 900 2.3679 - -
0.0153 910 1.418 - -
0.0154 920 2.7353 - -
0.0156 930 1.992 - -
0.0158 940 1.4564 - -
0.0159 950 1.4154 - -
0.0161 960 0.9499 - -
0.0163 970 1.6304 - -
0.0164 980 0.9264 - -
0.0166 990 1.3278 - -
0.0168 1000 1.686 0.4965 0.4897

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.0
  • Transformers: 4.46.2
  • PyTorch: 2.1.0+cu118
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}