SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model finetuned from microsoft/deberta-v3-small on the nli-pairs, sts-label, vitaminc-pairs, qnli-contrastive, scitail-pairs-qa, scitail-pairs-pos, xsum-pairs and compression-pairs datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
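
The Pooling module above has only pooling_mode_mean_tokens enabled, so a sentence embedding is the average of the token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the helper name mean_pool and the 2-dimensional toy vectors are illustrative assumptions (the real model uses 768 dimensions).

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of one sequence, skipping padded positions."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:  # mask value 1 marks a real token, 0 marks padding
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding sequence
    return [s / count for s in summed]


# Toy sequence: three real tokens followed by one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)
```

The padding vector is ignored, so only the three real tokens contribute to the average.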

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTaV3-small-GeneralSentenceTransformer-v2")
# Run inference
sentences = [
    'All the members of one particular species in a give area are called a population.',
    'All the members of a species that live in the same area form a population.',
    'A(n) anaerobic organism does not need oxygen for growth and dies in its presence.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
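
To make the 3 x 3 similarity output concrete, here is a small stdlib-only sketch of a pairwise cosine similarity matrix. It assumes the model's similarity function is cosine similarity (the Sentence Transformers default); the helper names and the toy 2-dimensional vectors are hypothetical and stand in for real 768-dimensional embeddings.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings):
    """Score every embedding against every other embedding."""
    return [[cos_sim(a, b) for b in embeddings] for a in embeddings]


# Toy stand-ins for the three sentence embeddings.
toy_embeddings = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(score, 3) for score in row])
```

The matrix is symmetric with ones on the diagonal, which is why comparing a list of embeddings against itself yields a square result.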

Training Details

Training Datasets

nli-pairs

  • Dataset: nli-pairs at d482672
  • Size: 7,500 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 5 tokens, mean 16.62 tokens, max 62 tokens
    • sentence2 (string): min 4 tokens, mean 9.46 tokens, max 29 tokens
  • Samples:
    • sentence1: A person on a horse jumps over a broken down airplane.
      sentence2: A person is outdoors, on a horse.
    • sentence1: Children smiling and waving at camera
      sentence2: There are children present
    • sentence1: A boy is jumping on skateboard in the middle of a red bridge.
      sentence2: The boy does a skateboarding trick.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
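
MultipleNegativesRankingLoss with the parameters above can be read as a cross-entropy over in-batch candidates: for each sentence1, the paired sentence2 is the correct answer and every other sentence2 in the batch acts as a negative, with cosine similarities multiplied by the scale of 20.0. The following is a rough pure-Python sketch under those assumptions, not the library code; the function names and toy vectors are illustrative.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, candidate) for candidate in positives]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = multiple_negatives_ranking_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = multiple_negatives_ranking_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

When each anchor is closest to its own pair the loss is near zero; when the pairs are swapped the loss approaches the scale value, which is why larger batches of distinct pairs make the task harder.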
    

sts-label

  • Dataset: sts-label at ab7a5ac
  • Size: 5,749 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 6 tokens, mean 9.81 tokens, max 27 tokens
    • sentence2 (string): min 5 tokens, mean 9.74 tokens, max 25 tokens
    • score (float): min 0.0, mean 0.54, max 1.0
  • Samples:
    • sentence1: A plane is taking off.
      sentence2: An air plane is taking off.
      score: 1.0
    • sentence1: A man is playing a large flute.
      sentence2: A man is playing a flute.
      score: 0.76
    • sentence1: A man is spreading shreded cheese on a pizza.
      sentence2: A man is spreading shredded cheese on an uncooked pizza.
      score: 0.76
  • Loss: AnglELoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_angle_sim"
    }
    

vitaminc-pairs

  • Dataset: vitaminc-pairs at be6febb
  • Size: 3,695 training samples
  • Columns: label, sentence1, and sentence2
  • Approximate statistics based on the first 1000 samples:
    • label (int): 1: 100.00%
    • sentence1 (string): min 6 tokens, mean 16.02 tokens, max 56 tokens
    • sentence2 (string): min 8 tokens, mean 38.57 tokens, max 502 tokens
  • Samples:
    • label: 1
      sentence1: The movie Yevadu grossed more than 390 million globally .
      sentence2: It also took the second spot in the list of the top 10 films with highest first week shares from AP.The film collected 390.5 million in 9 days , and more than 60 million from other areas , including Karnataka , the rest of India , and overseas territories , enabling it to cross the 400 million mark at the worldwide Box office , becoming Ram Charan 's fourth film to cross that mark .
    • label: 1
      sentence1: The film 's score is based on 33 critics .
      sentence2: Metacritic gave the film a score of 44 out of 100 , based on 33 critics , indicating '' mixed or average reviews '' '' . ''
    • label: 1
      sentence1: Back to Black ( album ) sold less than 15 million copies .
      sentence2: Worldwide , the album has sold over 12 million copies .
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
    

qnli-contrastive

  • Dataset: qnli-contrastive at bcdcba7
  • Size: 7,500 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 6 tokens, mean 13.92 tokens, max 40 tokens
    • sentence2 (string): min 6 tokens, mean 35.87 tokens, max 499 tokens
    • label (int): 0: 100.00%
  • Samples:
    • sentence1: Who was the biggest artist that CBS had?
      sentence2: CBS Inc., now CBS Corporation, retained the rights to the CBS name for music recordings but granted Sony a temporary license to use the CBS name.
      label: 0
    • sentence1: What does a video-conference use that allows communication in live situations?
      sentence2: This is often accomplished by the use of a multipoint control unit (a centralized distribution and call management system) or by a similar non-centralized multipoint capability embedded in each videoconferencing unit.
      label: 0
    • sentence1: What is the population of Saint Helena?
      sentence2: It is part of the British Overseas Territory of Saint Helena, Ascension and Tristan da Cunha.
      label: 0
  • Loss: OnlineContrastiveLoss

scitail-pairs-qa

  • Dataset: scitail-pairs-qa at 0cc4353
  • Size: 14,987 training samples
  • Columns: sentence2 and sentence1
  • Approximate statistics based on the first 1000 samples:
    • sentence2 (string): min 7 tokens, mean 15.86 tokens, max 41 tokens
    • sentence1 (string): min 7 tokens, mean 15.1 tokens, max 41 tokens
  • Samples:
    • sentence2: The largest known proteins are titins.
      sentence1: What are the largest known proteins?
    • sentence2: Remote-control vehicles are able to go to the deepest ocean floor.
      sentence1: What type of vehicles is able to go to the deepest ocean floor?
    • sentence2: Vaccine is a preventative measure that is often delivered by injection into the arm.
      sentence1: What preventative measure is often delivered by injection into the arm?
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

scitail-pairs-pos

  • Dataset: scitail-pairs-pos at 0cc4353
  • Size: 8,600 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 7 tokens, mean 23.75 tokens, max 67 tokens
    • sentence2 (string): min 7 tokens, mean 15.47 tokens, max 41 tokens
  • Samples:
    • sentence1: The movement of molecules from a location where they are in a high concentration to an area where they are in a lower concentration is called diffusion .
      sentence2: You call the movement of a substance from an area of a higher amount toward an area of lower amount diffusion.
    • sentence1: Climate is the average weather of an area over a long period of time.
      sentence2: Climate is the long-term average of weather in a particular spot.
    • sentence1: Sunlight is captured by green plants during the process of photosynthesis to produce glucose, a carbohydrate from water and carbon dioxide.
      sentence2: Photosynthesis converts carbon dioxide and water into glucose.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

xsum-pairs

  • Dataset: xsum-pairs at 788ddaf
  • Size: 3,750 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 28 tokens, mean 355.39 tokens, max 512 tokens
    • sentence2 (string): min 8 tokens, mean 27.3 tokens, max 61 tokens
  • Samples:
    • sentence1: Prices rose in all council areas and across all property types, but there were wide variations.
    In Derry City and Strabane prices were up by 11% but by less than 2% in Fermanagh and Omagh.
    The figures are from the NI Residential Property Price Index, which analyses almost all sales, including cash deals.
    The average standardised price, across all property types, is now £125,480.
    That compares to £97,428 at the bottom of the market in 2012, but is still far below the bubble-era peak of £224,670.
    Over the year the largest rise was in the apartment sector with prices up by 11%.
    For all other property types, the increase was about 5%.
    The council area with the highest average price is Lisburn and Castlereagh (£149,600) and the lowest is Derry City and Strabane (£108,464).
    The number of properties sold in 2016 was 21,669, down slightly on the 2015 figure.
    Northern Ireland experienced a huge house price bubble in the years leading up to 2007 before the market crashed.
    Prices more than halved between 2007 and early 2013 but have been increasing gradually since then.
      sentence2: House prices in Northern Ireland rose by almost 6% in 2016, according to official figures.
    • sentence1: English and French clubs intend to break away from the Heineken Cup and create their own tournament.
    "It could well be the end of professional rugby in Scotland if the competition wasn't to go ahead," Nicol told BBC Scotland.
    "I don't think you can fill a hole of that amount with anything else."
    Let's get qualification sorted out and based on a meritocracy and then the distribution of revenues is for the boardrooms
    The Scottish Rugby Union currently receives about £5m per year for Glasgow Warriors and Edinburgh's participation in the Heineken Cup.
    European Rugby Cup (ERC), which has run the Heineken Cup since it began in 1995, wants to re-open negotiations about the tournament's future but English Premiership and French Top 14 clubs insist they will not attend talks planned by the organising body next month.
    They will quit the competition at the end of the season, citing factors such as their view that the Heineken Cup structure favours teams from the Pro12, which is made up of sides from Wales, Scotland, Ireland and Italy, and distribution of revenue.
    Nicol, who won the Heineken Cup with Bath in 1998, insists that arguments over the tournament format is a repetitive issue and he hopes "common sense" will prevail for the good of the game in Scotland.
    "It happens every few years," he told BBC Scotland. "The English and the French flex their collective muscles when the contract is coming to an end.
    "But this year, it's very different, because they've got a television deal on the table and it's a real clear and present danger.
    "I think there's an acceptance that the current format of the Heineken Cup will cease and there will be a new competition.
    Media playback is not supported on this device
    "Then we just need to ensure and hope that Scotland are heavily involved in it."
    Nicol conceded that the main stumbling block for advancing discussions was the perception that Celtic nations are favoured in the qualification process.
    At present, Ireland and Wales each have three sides guaranteed a place, while Scotland and Italy have two apiece.
    Nicol believes the English and French unions want to put a stop to automatic qualification, which could bring about the end of lucrative revenue for Glasgow and Edinburgh, although ending guaranteed entry may be necessary to ensure the future of a pan-European competition.
    The former Scotland captain said if the tournament comes to an end it would be "a sporting disaster" adding that "the Heineken Cup has been a fantastic competition".
    He added: "Where it's flawed is in the qualification. I don't think the two Scottish sides and the Italian sides or the Irish sides should qualify automatically.
    "So let's get qualification sorted out and based on a meritocracy and then the distribution of revenues is for the boardrooms.
    "There's a bit of posturing from both sides, but I just hope it's a bit of brinksmanship and they get around the table and sort something out - and we get a competition.
    "It might not be the Heineken Cup as we call it now, but hopefully we'll get something like it."
      sentence2: Professional rugby union in Scotland could end if there is no European competition next season, fears former national captain Andy Nicol.
    • sentence1: The German was 0.203 seconds quicker than Hamilton, with Ferrari's Kimi Raikkonen third, a second off the pace.
    Mercedes set their times on the super-soft tyre, while Ferrari used the soft, which would account for about half the gap between the two cars.
    Ferrari's Sebastian Vettel was fourth, ahead of Force India's Sergio Perez.
    Hamilton enters the race nine points ahead of Rosberg in the championship after recovering from 21st on the grid to finish third at the Belgian Grand Prix last weekend, as Rosberg won.
    Ferrari have used the last of their remaining engine development 'tokens' ahead of their home race in an attempt to boost their competitiveness after a slump in form that has seen them lose second place in the constructors' championship to Red Bull.
    The fastest Red Bull was Max Verstappen in eighth, behind Haas driver Romain Grosjean and Williams' Valtteri Bottas, whose team-mate Felipe Massa announced on Thursday that he would retire at the end of the year.
    Verstappen remains the focus of attention following his controversial battle with Raikkonen in Belgium.
    Raikkonen has criticised Verstappen for being too dangerous, while the Dutchman said he would not change his driving because others were not happy.
    The stewards took no action against Verstappen in Spa, but BBC Sport has learned that Charlie Whiting, the F1 director of governing body the FIA, felt that Verstappen's late move in defence at 200mph as Raikkonen attacked was on the edge of acceptability.
    Whiting told the teams in a meeting on Thursday that he felt Verstappen could have received a black-and-white warning flag for his driving.
    The black-and-white flag is an indication of unsportsmanlike behaviour and is only shown once. If the driver commits the same offence again he can be disqualified from the race.
    Whiting's intervention raised the stakes in the debate ahead of the drivers' briefing after practice on Friday afternoon, where the incident is expected to be discussed.
    It was a relatively low-key session on track, despite a number of drivers running off the track at the tricky Monza chicanes in the warm sunshine.
    McLaren's session came to an unfortunate end as Fernando Alonso was forced to pit with a gearshift problem. He was 13th, with team-mate Jenson Button 11th, the drivers expecting their most difficult weekend of the year because of the lack of power of the Honda engine, which still lags despite recent updates.
    Button and Verstappen ran the halo head protection system in the first part of the session as trials continue ahead of the planned introduction of the device in 2018.
    Italian Grand Prix first practice results
    Italian Grand Prix coverage details
      sentence2: Nico Rosberg headed team-mate Lewis Hamilton as Mercedes dominated first practice at the Italian Grand Prix.
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
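
MultipleNegativesSymmetricRankingLoss extends the in-batch ranking idea in both directions: sentence1 must pick out its sentence2 among the batch, and sentence2 must also pick out its sentence1. A minimal pure-Python sketch follows, assuming the two directions are simply averaged and that cosine similarity is scaled by 20.0 as in the parameters above; all function names and toy vectors are illustrative, not the library's code.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def ranking_loss(queries, candidates, scale=20.0):
    """Mean cross-entropy where candidates[i] is the target for queries[i]."""
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cos_sim(query, c) for c in candidates]
        top = max(logits)  # stabilise the log-sum-exp
        total += top + math.log(sum(math.exp(l - top) for l in logits)) - logits[i]
    return total / len(queries)


def symmetric_ranking_loss(anchors, positives, scale=20.0):
    """Average of the anchor-to-positive and positive-to-anchor losses."""
    forward = ranking_loss(anchors, positives, scale)
    backward = ranking_loss(positives, anchors, scale)
    return (forward + backward) / 2.0


anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [1.0, 1.0]]
forward = ranking_loss(anchors, positives)
backward = ranking_loss(positives, anchors)
symmetric = symmetric_ranking_loss(anchors, positives)
print(forward, backward, symmetric)
```

In this toy batch the second positive is equally close to both anchors, so the backward direction is penalised much more than the forward one; the symmetric loss captures that, while a one-directional loss would not.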
    

compression-pairs

  • Dataset: compression-pairs at 605bc91
  • Size: 45,000 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 10 tokens, mean 31.78 tokens, max 170 tokens
    • sentence2 (string): min 5 tokens, mean 10.14 tokens, max 29 tokens
  • Samples:
    • sentence1: The USHL completed an expansion draft on Monday as 10 players who were on the rosters of USHL teams during the 2009-10 season were selected by the League's two newest entries, the Muskegon Lumberjacks and Dubuque Fighting Saints.
      sentence2: USHL completes expansion draft
    • sentence1: NRT LLC, one of the nation's largest residential real estate brokerage companies, announced several executive appointments within its Coldwell Banker Residential Brokerage operations in Southern California.
      sentence2: NRT announces executive appointments at its Coldwell Banker operations in Southern California
    • sentence1: A new survey shows 30 percent of Californians use Twitter, and more and more of us are using our smart phones to go online.
      sentence2: Survey: 30 percent of Californians use Twitter
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Evaluation Datasets

nli-pairs

  • Dataset: nli-pairs at d482672
  • Size: 2,000 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 5 tokens, mean 17.64 tokens, max 63 tokens
    • sentence2 (string): min 4 tokens, mean 9.67 tokens, max 29 tokens
  • Samples:
    • sentence1: Two women are embracing while holding to go packages.
      sentence2: Two woman are holding packages.
    • sentence1: Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.
      sentence2: Two kids in numbered jerseys wash their hands.
    • sentence1: A man selling donuts to a customer during a world exhibition event held in the city of Angeles
      sentence2: A man selling donuts to a customer.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

scitail-pairs-pos

  • Dataset: scitail-pairs-pos at 0cc4353
  • Size: 1,304 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 5 tokens, mean 22.52 tokens, max 67 tokens
    • sentence2 (string): min 8 tokens, mean 15.34 tokens, max 36 tokens
    • label (int): 0: ~47.50%, 1: ~52.50%
  • Samples:
    • sentence1: An introduction to atoms and elements, compounds, atomic structure and bonding, the molecule and chemical reactions.
      sentence2: Replace another in a molecule happens to atoms during a substitution reaction.
      label: 0
    • sentence1: Wavelength The distance between two consecutive points on a sinusoidal wave that are in phase;
      sentence2: Wavelength is the distance between two corresponding points of adjacent waves called.
      label: 1
    • sentence1: humans normally have 23 pairs of chromosomes.
      sentence2: Humans typically have 23 pairs pairs of chromosomes.
      label: 1
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

qnli-contrastive

  • Dataset: qnli-contrastive at bcdcba7
  • Size: 2,000 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 6 tokens, mean 14.13 tokens, max 36 tokens
    • sentence2 (string): min 4 tokens, mean 36.58 tokens, max 225 tokens
    • label (int): 0: 100.00%
  • Samples:
    • sentence1: What came into force after the new constitution was herald?
      sentence2: As of that day, the new constitution heralding the Second Republic came into force.
      label: 0
    • sentence1: What is the first major city in the stream of the Rhine?
      sentence2: The most important tributaries in this area are the Ill below of Strasbourg, the Neckar in Mannheim and the Main across from Mainz.
      label: 0
    • sentence1: What is the minimum required if you want to teach in Canada?
      sentence2: In most provinces a second Bachelor's Degree such as a Bachelor of Education is required to become a qualified teacher.
      label: 0
  • Loss: OnlineContrastiveLoss

sts-label

  • Dataset: sts-label at ab7a5ac
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 5 tokens, mean 14.77 tokens, max 45 tokens
    • sentence2 (string): min 6 tokens, mean 14.74 tokens, max 49 tokens
    • score (float): min 0.0, mean 0.47, max 1.0
  • Samples:
    • sentence1: A man with a hard hat is dancing.
      sentence2: A man wearing a hard hat is dancing.
      score: 1.0
    • sentence1: A young child is riding a horse.
      sentence2: A child is riding a horse.
      score: 0.95
    • sentence1: A man is feeding a mouse to a snake.
      sentence2: The man is feeding a mouse to the snake.
      score: 1.0
  • Loss: AnglELoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_angle_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 28
  • per_device_eval_batch_size: 16
  • learning_rate: 3e-06
  • weight_decay: 1e-10
  • num_train_epochs: 5
  • max_steps: 5000
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.33
  • save_safetensors: False
  • fp16: True
  • hub_model_id: bobox/DeBERTaV3-small-ST-checkpoints-tmp
  • hub_strategy: checkpoint
  • batch_sampler: no_duplicates
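
The learning-rate settings above can be turned into a concrete curve. The sketch below assumes the usual shape of a cosine schedule with warmup (linear ramp from zero, then a half-cosine decay to zero) and assumes the warmup length is warmup_ratio times max_steps; the function name lr_at is hypothetical. Because max_steps is set, training stops at step 5000 regardless of num_train_epochs.

```python
import math

BASE_LR = 3e-06      # learning_rate
MAX_STEPS = 5000     # max_steps
WARMUP_RATIO = 0.33  # warmup_ratio

# Assumption: warmup length is the ratio applied to the total step count.
WARMUP_STEPS = round(MAX_STEPS * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at a given optimizer step under the assumed schedule."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Half-cosine decay from the base learning rate down to 0.
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 825, 1650, 3325, 5000):
    print(step, lr_at(step))
```

Under these assumptions the first 1,650 steps are spent warming up, so the peak learning rate is reached roughly a third of the way through the run.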

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 28
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 3e-06
  • weight_decay: 1e-10
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: 5000
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.33
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: bobox/DeBERTaV3-small-ST-checkpoints-tmp
  • hub_strategy: checkpoint
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch | Step | Training Loss | nli-pairs loss | sts-label loss | scitail-pairs-pos loss | qnli-contrastive loss
None | 0 | - | 3.3906 | 6.4037 | 2.3949 | 2.6789
0.0723 | 250 | 3.2471 | 3.2669 | 6.3326 | 2.3286 | 2.6008
0.1445 | 500 | 3.051 | 3.0717 | 6.5578 | 2.0277 | 2.0795
0.2168 | 750 | 2.3717 | 2.8445 | 7.5564 | 1.5729 | 1.1601
0.2890 | 1000 | 1.5228 | 2.5520 | 8.3864 | 1.1221 | 0.7480
0.3613 | 1250 | 1.5747 | 2.1439 | 8.7993 | 0.9512 | 0.5071
0.4335 | 1500 | 1.2114 | 1.7986 | 9.0748 | 0.8195 | 0.3715
0.5058 | 1750 | 1.1832 | 1.5665 | 9.1778 | 0.6956 | 0.2920
0.5780 | 2000 | 0.9078 | 1.4173 | 9.3829 | 0.6840 | 0.2488
0.6503 | 2250 | 0.8436 | 1.3196 | 9.4585 | 0.6831 | 0.1584
0.7225 | 2500 | 0.8744 | 1.2192 | 9.5395 | 0.6232 | 0.1527
0.7948 | 2750 | 1.1809 | 1.1600 | 9.4297 | 0.5681 | 0.1369
0.8671 | 3000 | 0.7233 | 1.1149 | 9.4893 | 0.5523 | 0.1614
0.9393 | 3250 | 0.7862 | 1.0738 | 9.5408 | 0.5372 | 0.1291
1.0116 | 3500 | 1.0888 | 1.0328 | 9.5612 | 0.5286 | 0.1281
1.0838 | 3750 | 0.8116 | 1.0304 | 9.4794 | 0.5239 | 0.1144
1.1561 | 4000 | 1.0436 | 1.0215 | 9.4184 | 0.5278 | 0.0973
1.2283 | 4250 | 0.9298 | 1.0107 | 9.4322 | 0.5221 | 0.0970
1.3006 | 4500 | 0.682 | 1.0093 | 9.4643 | 0.5186 | 0.0951
1.3728 | 4750 | 0.9863 | 1.0080 | 9.4627 | 0.5176 | 0.0948
1.4451 | 5000 | 1.0022 | 1.0076 | 9.4645 | 0.5179 | 0.0945
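
The Epoch column can be cross-checked against the training set sizes and the batch size of 28. The sketch below is an assumption rather than a documented procedure: it supposes each dataset is split into batches on its own and that a final partial batch is kept (consistent with dataloader_drop_last: False). Under that assumption the arithmetic reproduces the logged epoch values.

```python
import math

BATCH_SIZE = 28  # per_device_train_batch_size
TRAIN_SIZES = {
    "nli-pairs": 7500,
    "sts-label": 5749,
    "vitaminc-pairs": 3695,
    "qnli-contrastive": 7500,
    "scitail-pairs-qa": 14987,
    "scitail-pairs-pos": 8600,
    "xsum-pairs": 3750,
    "compression-pairs": 45000,
}

# Assumption: batches are formed per dataset and partial batches are kept.
batches_per_dataset = {
    name: math.ceil(size / BATCH_SIZE) for name, size in TRAIN_SIZES.items()
}
steps_per_epoch = sum(batches_per_dataset.values())


def epoch_at(step):
    """Fractional epoch reached after a given number of steps."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)
print(epoch_at(250), epoch_at(3500), epoch_at(5000))
```

With 3,460 steps per epoch, the 5,000-step cap ends the run a little under one and a half epochs in, well short of the configured num_train_epochs of 5.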

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

AnglELoss

@misc{li2023angleoptimized,
    title={AnglE-optimized Text Embeddings}, 
    author={Xianming Li and Jing Li},
    year={2023},
    eprint={2309.12871},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

GISTEmbedLoss

@misc{solatorio2024gistembed,
    title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning}, 
    author={Aivin V. Solatorio},
    year={2024},
    eprint={2402.16829},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}