--- language: [] library_name: sentence-transformers tags: - sentence-transformers - sentence-similarity - feature-extraction - generated_from_trainer - dataset_size:11354 - loss:BatchAllTripletLoss base_model: FacebookAI/roberta-base datasets: [] widget: - source_sentence: 'section: Uni cation-based explicit/implicit grammars, text: The grammar was modi ed so that the PCFG formed a backbone and each application of a rule involved uni cation of the residue of features in the manner speci ed in the original feature-based grammar.' sentences: - 'section: abstract, text: RNNs are good at modeling long-term dependencies over input texts, but preclude parallel computation.' - 'section: Results for Prompt Relevance, text: Table 2 (Top) shows the performance of the approaches over 11 prompts.' - 'section: Results, text: The interface between Check and ElasticSearch is filled by Alegre, an API part of the Check suite which is responsible for text and image processing, for example, similarity, classification, glossary and language identification.' - source_sentence: 'section: Languages Families Experiment, text: Finally, Japanese and Danish speakers slightly prefer the pattern NN PRP VBZ RB more than others.' sentences: - 'section: Experiments using Machine Induction, text: Position in phrase (P-P and I-P) uses numeric rather than symbolic values.' - 'section: Datasets, text: In addition, RWTH-BOSTON-50 (19) includes 483 samples of 50 different glosses, RWTH-BOSTON-104 (6) provides 200 continuous sentences encompassing 104 signs/words, and RWTH-BOSTON-400 (7), a sentence-level corpus, contains 843 sentences involving around 400 signs. ' - 'section: Convolutional Neural Networks for QA Joint Learning, text: (1) Here, H 0 is one real-value matrix after sentence semantic encoding by concatenating the word vectors with sliding windows.' - source_sentence: 'section: Comparing our Baselines and Models, text: [16] further improves on DSS for LM tasks by introducing a Gated State Space version called GSS, which performs better on PG19, arXiv and GitHub.' sentences: - 'section: Introduction, text: Current speech and natural language integration mainly relies on word-level n-best search techniques [1,2] as shown in figure 1.' - 'section: Experimental Results, text: The F M value of All system suggests that it is the most aggressive approach. ' - 'section: Introduction, text: Because inevitable relation holds at any time and the reliability of conclusions inferred from it doesn''t fall down and transitive relation can be described efficiently. ' - source_sentence: 'section: Conclusion, text: We presented a comparative evaluation of GPT-4, GPT-3.5 and Flan-PaLM 540B on medical competency examinations and benchmark datasets.' sentences: - 'section: Experiment 3: Recursive Structure, text: We surprisingly find that this non-recursive corpus induces the same amount of structural transfer as the recursive nesting parentheses, which emphasizes the importance of pairing, head-dependency type structure in the linguistic structural embeddings of LSTMs.' - 'section: Introduction, text: 2. try to analyze data by using the constructed rules and extract the exceptions that cannot be correctly handled, then return to the first step and focus on the exceptions. ' - 'section: Evaluation By Token, text: I repeated the experiment once with closed-class words and once without, and again averaged the results over the two directions of translation.' 
- source_sentence: 'section: Experimental Setup, text: The training data used for speech recognition -CSR -is different from the Treebank in two aspects: • the Treebank is only a subset of the usual CSR training data; • the Treebank tokenization is different from that of the CSR corpus; among other spurious small differences, the most frequent ones are of the type presented in'
  sentences:
  - 'section: Comparing with Previous Latent Semantic Models, text: 𝐹 𝑖 (or its translation candidate 𝐸), and 𝐲 be the projected feature vector, i.e., 𝐲 = 𝐖 T 𝐱.'
  - 'section: Out-of-domain MT, text: The improvement of DIPMT over the baseline is striking -we'
  - 'section: Multiple choice next sentence prediction (NSP), text: We have collected a new dataset with 54k multiple choice questions where the objective is to predict the correct continuation for a given context sentence from four possible answer choices.'
pipeline_tag: sentence-similarity
---

# SentenceTransformer based on FacebookAI/roberta-base

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python from sentence_transformers import SentenceTransformer # Download from the 🤗 Hub model = SentenceTransformer("SBERT-roberta_pf") # Run inference sentences = [ 'section: Experimental Setup, text: The training data used for speech recognition -CSR -is different from the Treebank in two aspects: • the Treebank is only a subset of the usual CSR training data; • the Treebank tokenization is different from that of the CSR corpus; among other spurious small differences, the most frequent ones are of the type presented in', 'section: Multiple choice next sentence prediction (NSP), text: We have collected a new dataset with 54k multiple choice questions where the objective is to predict the correct continuation for a given context sentence from four possible answer choices.', 'section: Comparing with Previous Latent Semantic Models, text: 𝐹 𝑖 (or its translation candidate 𝐸), and 𝐲 be the projected feature vector, i.e., 𝐲 = 𝐖 T 𝐱.', ] embeddings = model.encode(sentences) print(embeddings.shape) # [3, 768] # Get the similarity scores for the embeddings similarities = model.similarity(embeddings, embeddings) print(similarities.shape) # [3, 3] ``` ## Training Details ### Training Dataset #### Unnamed Dataset * Size: 11,354 training samples * Columns: text and label * Approximate statistics based on the first 1000 samples: | | text | label | |:--------|:------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------| | type | string | int | | details | | | * Samples: | text | label | |:--------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------| | section: INTRODUCTION, text: Arguments for the importance of prosody in language abound in the literature. | 0 | | section: Results, text: This overlap ensures that actions that might otherwise occur on clip boundaries will also occur as part of a clip. | 7 | | section: Introduction, text: In Section 4 the experimental setup and results are detailed. | 6 | * Loss: [BatchAllTripletLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#batchalltripletloss) ### Evaluation Dataset #### Unnamed Dataset * Size: 1,419 evaluation samples * Columns: text and label * Approximate statistics based on the first 1000 samples: | | text | label | |:--------|:------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------| | type | string | int | | details | | | * Samples: | text | label | |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------| | section: Introduction, text: It is common in Natural Language Processing (NLP) that the categories into which text is classified do not have fully objective definitions. 
| 0 | | section: Automatic Evaluation Results, text: With respect to the BLEU score, this difference is 1.58 points absolute for the word based evaluation (27% relative increase), and 2.47 points absolute for the morphemebased evaluation (21% relative increase). | 2 | | section: Neural Descriptor Fields, text: The sum of the maxpooled mask probabilities of all slots can be used for counting, and the loss can be back propagated to optimize NDF as well as the embeddings. | 7 | * Loss: [BatchAllTripletLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#batchalltripletloss) ### Training Hyperparameters #### Non-Default Hyperparameters - `eval_strategy`: steps - `learning_rate`: 1e-05 - `weight_decay`: 0.1 - `load_best_model_at_end`: True - `push_to_hub`: True #### All Hyperparameters
Click to expand - `overwrite_output_dir`: False - `do_predict`: False - `eval_strategy`: steps - `prediction_loss_only`: True - `per_device_train_batch_size`: 8 - `per_device_eval_batch_size`: 8 - `per_gpu_train_batch_size`: None - `per_gpu_eval_batch_size`: None - `gradient_accumulation_steps`: 1 - `eval_accumulation_steps`: None - `learning_rate`: 1e-05 - `weight_decay`: 0.1 - `adam_beta1`: 0.9 - `adam_beta2`: 0.999 - `adam_epsilon`: 1e-08 - `max_grad_norm`: 1.0 - `num_train_epochs`: 3 - `max_steps`: -1 - `lr_scheduler_type`: linear - `lr_scheduler_kwargs`: {} - `warmup_ratio`: 0.0 - `warmup_steps`: 0 - `log_level`: passive - `log_level_replica`: warning - `log_on_each_node`: True - `logging_nan_inf_filter`: True - `save_safetensors`: True - `save_on_each_node`: False - `save_only_model`: False - `restore_callback_states_from_checkpoint`: False - `no_cuda`: False - `use_cpu`: False - `use_mps_device`: False - `seed`: 42 - `data_seed`: None - `jit_mode_eval`: False - `use_ipex`: False - `bf16`: False - `fp16`: False - `fp16_opt_level`: O1 - `half_precision_backend`: auto - `bf16_full_eval`: False - `fp16_full_eval`: False - `tf32`: None - `local_rank`: 0 - `ddp_backend`: None - `tpu_num_cores`: None - `tpu_metrics_debug`: False - `debug`: [] - `dataloader_drop_last`: False - `dataloader_num_workers`: 0 - `dataloader_prefetch_factor`: None - `past_index`: -1 - `disable_tqdm`: False - `remove_unused_columns`: True - `label_names`: None - `load_best_model_at_end`: True - `ignore_data_skip`: False - `fsdp`: [] - `fsdp_min_num_params`: 0 - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} - `fsdp_transformer_layer_cls_to_wrap`: None - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} - `deepspeed`: None - `label_smoothing_factor`: 0.0 - `optim`: adamw_torch - `optim_args`: None - `adafactor`: False - `group_by_length`: False - `length_column_name`: length - `ddp_find_unused_parameters`: None - `ddp_bucket_cap_mb`: None - `ddp_broadcast_buffers`: False - `dataloader_pin_memory`: True - `dataloader_persistent_workers`: False - `skip_memory_metrics`: True - `use_legacy_prediction_loop`: False - `push_to_hub`: True - `resume_from_checkpoint`: None - `hub_model_id`: None - `hub_strategy`: every_save - `hub_private_repo`: False - `hub_always_push`: False - `gradient_checkpointing`: False - `gradient_checkpointing_kwargs`: None - `include_inputs_for_metrics`: False - `eval_do_concat_batches`: True - `fp16_backend`: auto - `push_to_hub_model_id`: None - `push_to_hub_organization`: None - `mp_parameters`: - `auto_find_batch_size`: False - `full_determinism`: False - `torchdynamo`: None - `ray_scope`: last - `ddp_timeout`: 1800 - `torch_compile`: False - `torch_compile_backend`: None - `torch_compile_mode`: None - `dispatch_batches`: None - `split_batches`: None - `include_tokens_per_second`: False - `include_num_input_tokens_seen`: False - `neftune_noise_alpha`: None - `optim_target_modules`: None - `batch_eval_metrics`: False - `batch_sampler`: batch_sampler - `multi_dataset_batch_sampler`: proportional
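Below is a minimal sketch of how a comparable run can be set up with the Sentence Transformers v3 trainer API using the non-default hyperparameters above. It is not the exact training script: the two example rows, the `eval_dataset` reuse, and the `output_dir` value are placeholders, since the underlying dataset is not published with this card.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import BatchAllTripletLoss

# Base model: a mean-pooling layer is added automatically when loading
# a plain Hugging Face checkpoint as a SentenceTransformer.
model = SentenceTransformer("FacebookAI/roberta-base")

# Placeholder data: the real dataset has ~11k "text"/"label" pairs of the form
# "section: <heading>, text: <sentence>" with an integer section label.
train_dataset = Dataset.from_dict({
    "text": [
        "section: Introduction, text: In Section 4 the experimental setup and results are detailed.",
        "section: Results, text: Table 2 (Top) shows the performance of the approaches over 11 prompts.",
    ],
    "label": [6, 7],
})
eval_dataset = train_dataset  # placeholder; the card uses a held-out split of 1,419 samples

# BatchAllTripletLoss builds all valid (anchor, positive, negative) triplets
# within each batch from the integer labels and averages the margin violations.
loss = BatchAllTripletLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="SBERT-roberta_pf",  # output path is an assumption
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=1e-5,
    weight_decay=0.1,
    eval_strategy="steps",
    eval_steps=500,  # matches the 500-step cadence in the training logs below
    save_strategy="steps",
    save_steps=500,
    load_best_model_at_end=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()
```

Note that `BatchAllTripletLoss` only yields triplets when a batch contains at least two texts sharing a label plus one text with a different label, so batches drawn from heavily imbalanced label sets may contribute little signal.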
### Training Logs

| Epoch      | Step     | Training Loss | Validation Loss |
|:----------:|:--------:|:-------------:|:---------------:|
| 0.3521     | 500      | 4.3466        | 4.0196          |
| 0.7042     | 1000     | 3.9809        | 3.3573          |
| 1.0563     | 1500     | 3.8231        | 3.7082          |
| 1.4085     | 2000     | 3.5722        | 3.6799          |
| 1.7606     | 2500     | 3.6224        | 3.4086          |
| 2.1127     | 3000     | 3.1266        | 3.2109          |
| 2.4648     | 3500     | 3.1252        | 3.3558          |
| **2.8169** | **4000** | **3.1115**    | **3.1682**      |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.9.2
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.3.1+cu121
- Accelerate: 0.31.0
- Datasets: 2.19.2
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### BatchAllTripletLoss
```bibtex
@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```