tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:1077240
  - loss:MultipleNegativesRankingLoss
base_model: Qwen/Qwen2.5-0.5B-Instruct
widget:
  - source_sentence: Who is the father of philosophy?
    sentences:
      - >-
        Charles Sanders Peirce

        Charles Sanders Peirce (/pɜːrs/[9] "purse"; 10 September 1839 – 19 April
        1914) was an American philosopher, logician, mathematician, and
        scientist who is sometimes known as "the father of pragmatism". He was
        educated as a chemist and employed as a scientist for 30 years. Today he
        is appreciated largely for his contributions to logic, mathematics,
        philosophy, scientific methodology, and semiotics, and for his founding
        of pragmatism.
      - >-
        Georg Wilhelm Friedrich Hegel

        According to Hegel, "Heraclitus is the one who first declared the nature
        of the infinite and first grasped nature as in itself infinite, that is,
        its essence as process. The origin of philosophy is to be dated from
        Heraclitus. His is the persistent Idea that is the same in all
        philosophers up to the present day, as it was the Idea of Plato and
        Aristotle". For Hegel, Heraclitus's great achievements were to have
        understood the nature of the infinite, which for Hegel includes
        understanding the inherent contradictoriness and negativity of reality;
        and to have grasped that reality is becoming or process and that "being"
        and "nothingness" are mere empty abstractions. According to Hegel,
        Heraclitus's "obscurity" comes from his being a true (in Hegel's terms
        "speculative") philosopher who grasped the ultimate philosophical truth
        and therefore expressed himself in a way that goes beyond the abstract
        and limited nature of common sense and is difficult to grasp by those
        who operate within common sense. Hegel asserted that in Heraclitus he
        had an antecedent for his logic: "[...] there is no proposition of
        Heraclitus which I have not adopted in my logic".
      - >-
        History of nuclear weapons

        The notion of using a fission weapon to ignite a process of nuclear
        fusion can be dated back to 1942. At the first major theoretical
        conference on the development of an atomic bomb hosted by J. Robert
        Oppenheimer at the University of California, Berkeley, participant
        Edward Teller directed the majority of the discussion towards Enrico
        Fermi's idea of a "Super" bomb that would use the same reactions that
        powered the Sun itself.
  - source_sentence: When was Father's Day first celebrated in America?
    sentences:
      - >-
        Father's Day (United States)

        Father's Day was founded in Spokane, Washington at the YMCA in 1910 by
        Sonora Smart Dodd, who was born in Arkansas.[4] Its first celebration
        was in the Spokane YMCA on June 19, 1910.[4][5] Her father, the Civil
        War veteran William Jackson Smart, was a single parent who raised his
        six children there.[4] After hearing a sermon about Jarvis' Mother's Day
        at Central Methodist Episcopal Church in 1909, she told her pastor that
        fathers should have a similar holiday honoring them.[4][6] Although she
        initially suggested June 5, her father's birthday, the pastors did not
        have enough time to prepare their sermons, and the celebration was
        deferred to the third Sunday of June.[7][8]
      - >-
        Father's Day

        In [[Peru]], Father's Day is celebrated on the third Sunday of June and
        is not a public holiday. People usually give a present to their fathers
        and spend time with him mostly during a family meal.
      - >-
        Sacramento River

        The Sacramento and its wide natural floodplain were once abundant in
        fish and other aquatic creatures, notably one of the southernmost large
        runs of chinook salmon in North America. For about 12,000 years, humans
        have depended on the vast natural resources of the watershed, which had
        one of the densest Native American populations in California. The river
        has provided a route for trade and travel since ancient times. Hundreds
        of tribes sharing regional customs and traditions inhabited the
        Sacramento Valley, first coming into contact with European explorers in
        the late 1700s. The Spanish explorer Gabriel Moraga named the river Rio
        de los Sacramentos in 1808, later shortened and anglicized into
        Sacramento.
  - source_sentence: What is the population of Austria in 2018?
    sentences:
      - >-
        Utah State Capitol

        The Utah State Capitol is the house of government for the U.S. state of
        Utah. The building houses the chambers and offices of the Utah State
        Legislature, the offices of the Governor, Lieutenant Governor, Attorney
        General, the State Auditor and their staffs. The capitol is the main
        building of the Utah State Capitol Complex, which is located on Capitol
        Hill, overlooking downtown Salt Lake City.
      - >-
        Same-sex marriage in Austria

        A September 2018 poll for "Österreich" found that 74% of Austrians
        supported same-sex marriage and 26% were against.
      - >-
        Demographics of Austria

        Population 8,793,370 (July 2018 est.) country comparison to the world:
        96th
  - source_sentence: What language family is Malay?
    sentences:
      - >-
        Malay language

        Malay is a member of the Austronesian family of languages, which
        includes languages from Southeast Asia and the Pacific Ocean, with a
        smaller number in continental Asia. Malagasy, a geographic outlier
        spoken in Madagascar in the Indian Ocean, is also a member of this
        language family. Although each language of the family is mutually
        unintelligible, their similarities are rather striking. Many roots have
        come virtually unchanged from their common ancestor, Proto-Austronesian
        language. There are many cognates found in the languages' words for
        kinship, health, body parts and common animals. Numbers, especially,
        show remarkable similarities.
      - >-
        Filipinos of Malay descent

        In the Philippines, there is misconception and often mixing between the
        two definitions. Filipinos consider Malays as being the natives of the
        Philippines, Indonesia, Malaysia and Brunei. Consequently, Filipinos
        consider themselves Malay when in reality, they are referring to the
        Malay Race. Filipinos in Singapore also prefer to be considered Malay,
        but their desire to be labeled as part of the ethnic group was rejected
        by the Singaporean government. Paradoxically, a minor percentage of
        Filipinos prefer the Spanish influence and may associate themselves with
        being Hispanic, and have made no realistic attempts to promote and/or
        revive the Malay language in the Philippines.
      - >-
        Preferred provider organization

        In health insurance in the United States, a preferred provider
        organization (PPO), sometimes referred to as a participating provider
        organization or preferred provider option, is a managed care
        organization of medical doctors, hospitals, and other health care
        providers who have agreed with an insurer or a third-party administrator
        to provide health care at reduced rates to the insurer's or
        administrator's clients.
  - source_sentence: When was ABC formed?
    sentences:
      - >-
        American Broadcasting Company

        ABC launched as a radio network on October 12, 1943, serving as the
        successor to the NBC Blue Network, which had been purchased by Edward J.
        Noble. It extended its operations to television in 1948, following in
        the footsteps of established broadcast networks CBS and NBC. In the
        mid-1950s, ABC merged with United Paramount Theatres, a chain of movie
        theaters that formerly operated as a subsidiary of Paramount Pictures.
        Leonard Goldenson, who had been the head of UPT, made the new television
        network profitable by helping develop and greenlight many successful
        series. In the 1980s, after purchasing an 80% interest in cable sports
        channel ESPN, the network's corporate parent, American Broadcasting
        Companies, Inc., merged with Capital Cities Communications, owner of
        several print publications, and television and radio stations. In 1996,
        most of Capital Cities/ABC's assets were purchased by The Walt Disney
        Company.
      - >-
        Roman concrete

        Roman concrete, also called opus caementicium, was a material used in
        construction during the late Roman Republic until the fading of the
        Roman Empire. Roman concrete was based on a hydraulic-setting cement.
        Recently, it has been found that it materially differs in several ways
        from modern concrete which is based on Portland cement. Roman concrete
        is durable due to its incorporation of volcanic ash, which prevents
        cracks from spreading. By the middle of the 1st century, the material
        was used frequently, often brick-faced, although variations in aggregate
        allowed different arrangements of materials. Further innovative
        developments in the material, called the Concrete Revolution,
        contributed to structurally complicated forms, such as the Pantheon
        dome, the world's largest and oldest unreinforced concrete dome.[1]
      - >-
        Americans Battling Communism

        Americans Battling Communism, Inc. (ABC) was an anti-communist
        organization created following an October 1947 speech by Pennsylvania
        Judge Blair Gunther that called for an "ABC movement" to educate America
        about communism. Chartered in November 1947 by Harry Alan Sherman, a
        local lawyer active in various anti-communist organizations, the group
        took part in such activities as blacklisting by disclosing the names of
        people suspected of being communists. Its members included local judges
        and lawyers active in the McCarthy-era prosecution of communists.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
model-index:
  - name: SentenceTransformer based on Qwen/Qwen2.5-0.5B-Instruct
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev 896
          type: sts-dev-896
        metrics:
          - type: pearson_cosine
            value: 0.7512795462804751
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7602862030369626
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev 768
          type: sts-dev-768
        metrics:
          - type: pearson_cosine
            value: 0.7504358517848402
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7590404004512833
            name: Spearman Cosine

SentenceTransformer based on Qwen/Qwen2.5-0.5B-Instruct

This is a sentence-transformers model fine-tuned from Qwen/Qwen2.5-0.5B-Instruct. It maps sentences and paragraphs to an 896-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Qwen/Qwen2.5-0.5B-Instruct
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 896 dimensions
  • Similarity Function: Cosine Similarity
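
The similarity function listed above is cosine similarity. As a minimal sketch in plain Python (the helper name `cosine_similarity` and the toy vectors are illustrative and not part of the library), it can be computed as follows:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; real embeddings from this model have 896 dimensions.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal
```

Because the score depends only on direction, rescaling either vector leaves it unchanged.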

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: Qwen2Model 
  (1): Pooling({'word_embedding_dimension': 896, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
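
The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence vector is the average of the token vectors. The sketch below illustrates that idea in plain Python, assuming padding tokens are excluded through an attention mask; `mean_pool` and the toy inputs are hypothetical names, not the library's implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1; padding (0) is ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [s / count for s in summed]


# Three token vectors of width 2; the last one is padding and must not count.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```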

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("AlexWortega/qwen3k")
# Run inference
sentences = [
    'When was ABC formed?',
    "American Broadcasting Company\nABC launched as a radio network on October 12, 1943, serving as the successor to the NBC Blue Network, which had been purchased by Edward J. Noble. It extended its operations to television in 1948, following in the footsteps of established broadcast networks CBS and NBC. In the mid-1950s, ABC merged with United Paramount Theatres, a chain of movie theaters that formerly operated as a subsidiary of Paramount Pictures. Leonard Goldenson, who had been the head of UPT, made the new television network profitable by helping develop and greenlight many successful series. In the 1980s, after purchasing an 80% interest in cable sports channel ESPN, the network's corporate parent, American Broadcasting Companies, Inc., merged with Capital Cities Communications, owner of several print publications, and television and radio stations. In 1996, most of Capital Cities/ABC's assets were purchased by The Walt Disney Company.",
    'Americans Battling Communism\nAmericans Battling Communism, Inc. (ABC) was an anti-communist organization created following an October 1947 speech by Pennsylvania Judge Blair Gunther that called for an "ABC movement" to educate America about communism. Chartered in November 1947 by Harry Alan Sherman, a local lawyer active in various anti-communist organizations, the group took part in such activities as blacklisting by disclosing the names of people suspected of being communists. Its members included local judges and lawyers active in the McCarthy-era prosecution of communists.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 896)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
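
For semantic search, the similarity scores between a query and its candidate passages are typically sorted to get a ranking. The helper below is a small sketch of that step; `rank_passages` and the scores are hypothetical and are not outputs of this model.

```python
def rank_passages(scores):
    """Return passage indices sorted from most to least similar to the query."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical similarity scores between one query and three passages.
# With the usage example above, a comparable list could be obtained with
# similarities[0, 1:].tolist(), where row 0 holds the query.
example_scores = [0.62, 0.31, 0.48]
print(rank_passages(example_scores))  # [0, 2, 1]
```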

Evaluation

Metrics

Semantic Similarity

| Metric          | sts-dev-896 | sts-dev-768 |
|-----------------|-------------|-------------|
| pearson_cosine  | 0.7513      | 0.7504      |
| spearman_cosine | 0.7603      | 0.7590      |
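
The two metrics correlate the model's cosine scores with gold similarity labels: Pearson on the raw values, Spearman on their ranks. The following is a minimal plain-Python sketch of both; the function names and the toy data are illustrative assumptions, not the evaluator used for the numbers above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy data: hypothetical cosine scores and hypothetical gold labels.
predicted = [0.9, 0.2, 0.6, 0.4]
gold = [5.0, 1.0, 3.0, 4.0]
print(round(pearson(predicted, gold), 4))
print(round(spearman(predicted, gold), 4))  # 0.8
```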

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,077,240 training samples
  • Columns: query, response, and negative
  • Approximate statistics based on the first 1000 samples:
    |         | query | response | negative |
    |---------|-------|----------|----------|
    | type    | string | string | string |
    | details | min: 4 tokens, mean: 8.76 tokens, max: 26 tokens | min: 23 tokens, mean: 141.88 tokens, max: 532 tokens | min: 4 tokens, mean: 134.02 tokens, max: 472 tokens |
  • Samples:
    Sample 1
      query: Was there a year 0?
      response: Year zero
      Year zero does not exist in the anno Domini system usually used to number years in the Gregorian calendar and in its predecessor, the Julian calendar. In this system, the year 1 BC is followed by AD 1. However, there is a year zero in astronomical year numbering (where it coincides with the Julian year 1 BC) and in ISO 8601:2004 (where it coincides with the Gregorian year 1 BC) as well as in all Buddhist and Hindu calendars.
      negative: 504
      Year 504 (DIV) was a leap year starting on Thursday (link will display the full calendar) of the Julian calendar. At the time, it was known as the Year of the Consulship of Nicomachus without colleague (or, less frequently, year 1257 "Ab urbe condita"). The denomination 504 for this year has been used since the early medieval period, when the Anno Domini calendar era became the prevalent method in Europe for naming years.
    Sample 2
      query: When is the dialectical method used?
      response: Dialectic
      Dialectic or dialectics (Greek: διαλεκτική, dialektikḗ; related to dialogue), also known as the dialectical method, is at base a discourse between two or more people holding different points of view about a subject but wishing to establish the truth through reasoned arguments. Dialectic resembles debate, but the concept excludes subjective elements such as emotional appeal and the modern pejorative sense of rhetoric.[1][2] Dialectic may be contrasted with the didactic method, wherein one side of the conversation teaches the other. Dialectic is alternatively known as minor logic, as opposed to major logic or critique.
      negative: Derek Bentley case
      Another factor in the posthumous defence was that a "confession" recorded by Bentley, which was claimed by the prosecution to be a "verbatim record of dictated monologue", was shown by forensic linguistics methods to have been largely edited by policemen. Linguist Malcolm Coulthard showed that certain patterns, such as the frequency of the word "then" and the grammatical use of "then" after the grammatical subject ("I then" rather than "then I"), were not consistent with Bentley's use of language (his idiolect), as evidenced in court testimony. These patterns fit better the recorded testimony of the policemen involved. This is one of the earliest uses of forensic linguistics on record.
    Sample 3
      query: What do Grasshoppers eat?
      response: Grasshopper
      Grasshoppers are plant-eaters, with a few species at times becoming serious pests of cereals, vegetables and pasture, especially when they swarm in their millions as locusts and destroy crops over wide areas. They protect themselves from predators by camouflage; when detected, many species attempt to startle the predator with a brilliantly-coloured wing-flash while jumping and (if adult) launching themselves into the air, usually flying for only a short distance. Other species such as the rainbow grasshopper have warning coloration which deters predators. Grasshoppers are affected by parasites and various diseases, and many predatory creatures feed on both nymphs and adults. The eggs are the subject of attack by parasitoids and predators.
      negative: Groundhog
      Very often the dens of groundhogs provide homes for other animals including skunks, red foxes, and cottontail rabbits. The fox and skunk feed upon field mice, grasshoppers, beetles and other creatures that destroy farm crops. In aiding these animals, the groundhog indirectly helps the farmer. In addition to providing homes for itself and other animals, the groundhog aids in soil improvement by bringing subsoil to the surface. The groundhog is also a valuable game animal and is considered a difficult sport when hunted in a fair manner. In some parts of Appalachia, they are eaten.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
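
To make the two loss parameters concrete: under the usual formulation of this loss, every query is scored against all responses and negatives in the batch with cosine similarity, the scores are multiplied by `scale`, and cross-entropy pushes each query toward its own response. The plain-Python sketch below follows that formulation under stated assumptions; `mnr_loss` and the toy vectors are illustrative, not the training code.

```python
import math


def cos_sim(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(queries, positives, negatives, scale=20.0):
    """Mean cross-entropy where query i must pick positives[i] among all candidates."""
    # Other queries' positives act as in-batch negatives next to the explicit ones.
    candidates = positives + negatives
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cos_sim(query, c) for c in candidates]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)


# Toy 2-dimensional batch of two (query, response, negative) triplets.
queries = [[1.0, 0.0], [0.0, 1.0]]
responses = [[1.0, 0.0], [0.0, 1.0]]
hard_negatives = [[-1.0, 0.0], [0.0, -1.0]]
print(mnr_loss(queries, responses, hard_negatives))        # close to 0: well separated
print(mnr_loss(queries, responses[::-1], hard_negatives))  # large: responses swapped
```

A larger `scale` sharpens the softmax, so small differences in cosine similarity produce larger differences in the loss.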
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 12
  • per_device_eval_batch_size: 12
  • gradient_accumulation_steps: 4
  • num_train_epochs: 1
  • warmup_ratio: 0.3
  • bf16: True
  • batch_sampler: no_duplicates
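
These settings determine the effective batch size and the number of optimizer steps in the single epoch. The arithmetic below is a rough sketch that assumes a single training device (the card does not state the device count) and that no batches are dropped; under those assumptions the result is consistent with the epoch fractions reported in the training logs.

```python
import math

train_samples = 1_077_240   # size of the training dataset
per_device_batch = 12       # per_device_train_batch_size
grad_accum = 4              # gradient_accumulation_steps
num_devices = 1             # assumption: not stated in this card
warmup_ratio = 0.3

effective_batch = per_device_batch * grad_accum * num_devices
batches_per_epoch = math.ceil(train_samples / (per_device_batch * num_devices))
optimizer_steps = batches_per_epoch // grad_accum
warmup_steps = math.ceil(optimizer_steps * warmup_ratio)

print(effective_batch)                    # 48
print(optimizer_steps)                    # 22442
print(warmup_steps)                       # 6733
print(round(3000 / optimizer_steps, 4))   # 0.1337, the epoch fraction at step 3000
```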

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 12
  • per_device_eval_batch_size: 12
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.3
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
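
With `lr_scheduler_type: linear` and a non-zero `warmup_ratio`, the learning rate typically rises linearly from 0 to `learning_rate` during warmup and then decays linearly back to 0. The function below is a sketch of that rule; the name `linear_warmup_lr` and the step counts in the example are made up for illustration.

```python
def linear_warmup_lr(step, total_steps, warmup_steps, peak_lr=5e-05):
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Illustrative schedule: 1000 steps in total, 30% of them warmup.
for step in (0, 150, 300, 650, 1000):
    print(step, linear_warmup_lr(step, total_steps=1000, warmup_steps=300))
```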

Training Logs

Epoch Step Training Loss sts-dev-896_spearman_cosine sts-dev-768_spearman_cosine
0.0004 10 2.2049 - -
0.0009 20 2.3168 - -
0.0013 30 2.3544 - -
0.0018 40 2.2519 - -
0.0022 50 2.1809 - -
0.0027 60 2.1572 - -
0.0031 70 2.1855 - -
0.0036 80 2.5887 - -
0.0040 90 2.883 - -
0.0045 100 2.8557 - -
0.0049 110 2.9356 - -
0.0053 120 2.8833 - -
0.0058 130 2.8394 - -
0.0062 140 2.923 - -
0.0067 150 2.8191 - -
0.0071 160 2.8658 - -
0.0076 170 2.8252 - -
0.0080 180 2.8312 - -
0.0085 190 2.7761 - -
0.0089 200 2.7193 - -
0.0094 210 2.724 - -
0.0098 220 2.7484 - -
0.0102 230 2.7262 - -
0.0107 240 2.6964 - -
0.0111 250 2.6676 - -
0.0116 260 2.6715 - -
0.0120 270 2.6145 - -
0.0125 280 2.6191 - -
0.0129 290 1.9812 - -
0.0134 300 1.6413 - -
0.0138 310 1.6126 - -
0.0143 320 1.3599 - -
0.0147 330 1.2996 - -
0.0151 340 1.2654 - -
0.0156 350 1.9409 - -
0.0160 360 2.1287 - -
0.0165 370 1.8442 - -
0.0169 380 1.6837 - -
0.0174 390 1.5489 - -
0.0178 400 1.4382 - -
0.0183 410 1.4848 - -
0.0187 420 1.3481 - -
0.0192 430 1.3467 - -
0.0196 440 1.3977 - -
0.0201 450 1.26 - -
0.0205 460 1.2412 - -
0.0209 470 1.316 - -
0.0214 480 1.3501 - -
0.0218 490 1.2246 - -
0.0223 500 1.2271 - -
0.0227 510 1.1871 - -
0.0232 520 1.1685 - -
0.0236 530 1.1624 - -
0.0241 540 1.1911 - -
0.0245 550 1.1978 - -
0.0250 560 1.1228 - -
0.0254 570 1.1091 - -
0.0258 580 1.1433 - -
0.0263 590 1.0638 - -
0.0267 600 1.0515 - -
0.0272 610 1.175 - -
0.0276 620 1.0943 - -
0.0281 630 1.1226 - -
0.0285 640 0.9871 - -
0.0290 650 1.0171 - -
0.0294 660 1.0169 - -
0.0299 670 0.9643 - -
0.0303 680 0.9563 - -
0.0307 690 0.9841 - -
0.0312 700 1.0349 - -
0.0316 710 0.8958 - -
0.0321 720 0.9225 - -
0.0325 730 0.842 - -
0.0330 740 0.9104 - -
0.0334 750 0.8927 - -
0.0339 760 0.8508 - -
0.0343 770 0.8835 - -
0.0348 780 0.9531 - -
0.0352 790 0.926 - -
0.0356 800 0.8718 - -
0.0361 810 0.8261 - -
0.0365 820 0.8169 - -
0.0370 830 0.8525 - -
0.0374 840 0.8504 - -
0.0379 850 0.7625 - -
0.0383 860 0.8259 - -
0.0388 870 0.7558 - -
0.0392 880 0.7898 - -
0.0397 890 0.7694 - -
0.0401 900 0.7429 - -
0.0405 910 0.6666 - -
0.0410 920 0.7407 - -
0.0414 930 0.6665 - -
0.0419 940 0.7597 - -
0.0423 950 0.7035 - -
0.0428 960 0.7166 - -
0.0432 970 0.6889 - -
0.0437 980 0.7541 - -
0.0441 990 0.7175 - -
0.0446 1000 0.7389 0.6420 0.6403
0.0450 1010 0.7142 - -
0.0454 1020 0.7301 - -
0.0459 1030 0.7299 - -
0.0463 1040 0.6759 - -
0.0468 1050 0.7036 - -
0.0472 1060 0.6286 - -
0.0477 1070 0.595 - -
0.0481 1080 0.6099 - -
0.0486 1090 0.6377 - -
0.0490 1100 0.6309 - -
0.0495 1110 0.6306 - -
0.0499 1120 0.557 - -
0.0504 1130 0.5898 - -
0.0508 1140 0.5896 - -
0.0512 1150 0.6399 - -
0.0517 1160 0.5923 - -
0.0521 1170 0.5787 - -
0.0526 1180 0.591 - -
0.0530 1190 0.5714 - -
0.0535 1200 0.6047 - -
0.0539 1210 0.5904 - -
0.0544 1220 0.543 - -
0.0548 1230 0.6033 - -
0.0553 1240 0.5445 - -
0.0557 1250 0.5217 - -
0.0561 1260 0.5835 - -
0.0566 1270 0.5353 - -
0.0570 1280 0.5887 - -
0.0575 1290 0.5967 - -
0.0579 1300 0.5036 - -
0.0584 1310 0.5915 - -
0.0588 1320 0.5719 - -
0.0593 1330 0.5238 - -
0.0597 1340 0.5647 - -
0.0602 1350 0.538 - -
0.0606 1360 0.5457 - -
0.0610 1370 0.5169 - -
0.0615 1380 0.4967 - -
0.0619 1390 0.4864 - -
0.0624 1400 0.5133 - -
0.0628 1410 0.5587 - -
0.0633 1420 0.4691 - -
0.0637 1430 0.5186 - -
0.0642 1440 0.4907 - -
0.0646 1450 0.5281 - -
0.0651 1460 0.4741 - -
0.0655 1470 0.4452 - -
0.0659 1480 0.4771 - -
0.0664 1490 0.4289 - -
0.0668 1500 0.4551 - -
0.0673 1510 0.4558 - -
0.0677 1520 0.5159 - -
0.0682 1530 0.4296 - -
0.0686 1540 0.4548 - -
0.0691 1550 0.4439 - -
0.0695 1560 0.4295 - -
0.0700 1570 0.4466 - -
0.0704 1580 0.4717 - -
0.0708 1590 0.492 - -
0.0713 1600 0.4566 - -
0.0717 1610 0.4451 - -
0.0722 1620 0.4715 - -
0.0726 1630 0.4573 - -
0.0731 1640 0.3972 - -
0.0735 1650 0.5212 - -
0.0740 1660 0.4381 - -
0.0744 1670 0.4552 - -
0.0749 1680 0.4767 - -
0.0753 1690 0.4398 - -
0.0757 1700 0.4801 - -
0.0762 1710 0.3751 - -
0.0766 1720 0.4407 - -
0.0771 1730 0.4305 - -
0.0775 1740 0.3938 - -
0.0780 1750 0.4748 - -
0.0784 1760 0.428 - -
0.0789 1770 0.404 - -
0.0793 1780 0.4261 - -
0.0798 1790 0.359 - -
0.0802 1800 0.4422 - -
0.0807 1810 0.4748 - -
0.0811 1820 0.4352 - -
0.0815 1830 0.4032 - -
0.0820 1840 0.4124 - -
0.0824 1850 0.4486 - -
0.0829 1860 0.429 - -
0.0833 1870 0.4189 - -
0.0838 1880 0.3658 - -
0.0842 1890 0.4297 - -
0.0847 1900 0.4215 - -
0.0851 1910 0.3726 - -
0.0856 1920 0.3736 - -
0.0860 1930 0.4287 - -
0.0864 1940 0.4402 - -
0.0869 1950 0.4353 - -
0.0873 1960 0.3622 - -
0.0878 1970 0.3557 - -
0.0882 1980 0.4107 - -
0.0887 1990 0.3982 - -
0.0891 2000 0.453 0.7292 0.7261
0.0896 2010 0.3971 - -
0.0900 2020 0.4374 - -
0.0905 2030 0.4322 - -
0.0909 2040 0.3945 - -
0.0913 2050 0.356 - -
0.0918 2060 0.4182 - -
0.0922 2070 0.3694 - -
0.0927 2080 0.3989 - -
0.0931 2090 0.4237 - -
0.0936 2100 0.3961 - -
0.0940 2110 0.4264 - -
0.0945 2120 0.3609 - -
0.0949 2130 0.4154 - -
0.0954 2140 0.3661 - -
0.0958 2150 0.3328 - -
0.0962 2160 0.3456 - -
0.0967 2170 0.3478 - -
0.0971 2180 0.3339 - -
0.0976 2190 0.3833 - -
0.0980 2200 0.3238 - -
0.0985 2210 0.3871 - -
0.0989 2220 0.4009 - -
0.0994 2230 0.4115 - -
0.0998 2240 0.4024 - -
0.1003 2250 0.35 - -
0.1007 2260 0.3649 - -
0.1011 2270 0.3615 - -
0.1016 2280 0.3898 - -
0.1020 2290 0.3866 - -
0.1025 2300 0.3904 - -
0.1029 2310 0.3321 - -
0.1034 2320 0.3803 - -
0.1038 2330 0.3831 - -
0.1043 2340 0.403 - -
0.1047 2350 0.3803 - -
0.1052 2360 0.3463 - -
0.1056 2370 0.3987 - -
0.1060 2380 0.3731 - -
0.1065 2390 0.353 - -
0.1069 2400 0.3166 - -
0.1074 2410 0.3895 - -
0.1078 2420 0.4025 - -
0.1083 2430 0.3798 - -
0.1087 2440 0.2991 - -
0.1092 2450 0.3094 - -
0.1096 2460 0.3669 - -
0.1101 2470 0.3412 - -
0.1105 2480 0.3697 - -
0.1110 2490 0.369 - -
0.1114 2500 0.3393 - -
0.1118 2510 0.4232 - -
0.1123 2520 0.3445 - -
0.1127 2530 0.4165 - -
0.1132 2540 0.3721 - -
0.1136 2550 0.3476 - -
0.1141 2560 0.2847 - -
0.1145 2570 0.3609 - -
0.1150 2580 0.3017 - -
0.1154 2590 0.374 - -
0.1159 2600 0.3365 - -
0.1163 2610 0.393 - -
0.1167 2620 0.3623 - -
0.1172 2630 0.3538 - -
0.1176 2640 0.3206 - -
0.1181 2650 0.3962 - -
0.1185 2660 0.3087 - -
0.1190 2670 0.3482 - -
0.1194 2680 0.3616 - -
0.1199 2690 0.3955 - -
0.1203 2700 0.3915 - -
0.1208 2710 0.3782 - -
0.1212 2720 0.3576 - -
0.1216 2730 0.3544 - -
0.1221 2740 0.3572 - -
0.1225 2750 0.3107 - -
0.1230 2760 0.3579 - -
0.1234 2770 0.3571 - -
0.1239 2780 0.3694 - -
0.1243 2790 0.3674 - -
0.1248 2800 0.3373 - -
0.1252 2810 0.3362 - -
0.1257 2820 0.3225 - -
0.1261 2830 0.3609 - -
0.1265 2840 0.3681 - -
0.1270 2850 0.4059 - -
0.1274 2860 0.3047 - -
0.1279 2870 0.3446 - -
0.1283 2880 0.3507 - -
0.1288 2890 0.3124 - -
0.1292 2900 0.3712 - -
0.1297 2910 0.3394 - -
0.1301 2920 0.3869 - -
0.1306 2930 0.3449 - -
0.1310 2940 0.3752 - -
0.1314 2950 0.3341 - -
0.1319 2960 0.3329 - -
0.1323 2970 0.36 - -
0.1328 2980 0.3788 - -
0.1332 2990 0.3834 - -
0.1337 3000 0.3426 0.7603 0.7590

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.0
  • Transformers: 4.46.2
  • PyTorch: 2.1.0+cu118
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}