---
language:
- en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dataset_size:1K
---

- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("philschmid/bge-base-financial-matryoshka")

# Run inference
sentences = [
    "What was Gilead's total revenue in 2023?",
    "What was the total revenue for the year ended December 31, 2023?",
    "How much was the impairment related to the CAT loan receivable in 2023?",
]
embeddings = model.encode(sentences)  # NumPy array by default
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)  # torch.Tensor
print(similarities.shape)
# torch.Size([3, 3])
```

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `basline_768`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.7086     |
| cosine_accuracy@3   | 0.8514     |
| cosine_accuracy@5   | 0.8843     |
| cosine_accuracy@10  | 0.9271     |
| cosine_precision@1  | 0.7086     |
| cosine_precision@3  | 0.2838     |
| cosine_precision@5  | 0.1769     |
| cosine_precision@10 | 0.0927     |
| cosine_recall@1     | 0.7086     |
| cosine_recall@3     | 0.8514     |
| cosine_recall@5     | 0.8843     |
| cosine_recall@10    | 0.9271     |
| cosine_ndcg@10      | 0.8215     |
| cosine_mrr@10       | 0.7874     |
| **cosine_map@100**  | **0.7907** |

#### Information Retrieval

* Dataset: `basline_512`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.7114     |
| cosine_accuracy@3   | 0.85       |
| cosine_accuracy@5   | 0.8829     |
| cosine_accuracy@10  | 0.9229     |
| cosine_precision@1  | 0.7114     |
| cosine_precision@3  | 0.2833     |
| cosine_precision@5  | 0.1766     |
| cosine_precision@10 | 0.0923     |
| cosine_recall@1     | 0.7114     |
| cosine_recall@3     | 0.85       |
| cosine_recall@5     | 0.8829     |
| cosine_recall@10    | 0.9229     |
| cosine_ndcg@10      | 0.8209     |
| cosine_mrr@10       | 0.7879     |
| **cosine_map@100**  | **0.7916** |

#### Information Retrieval

* Dataset: `basline_256`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.7057     |
| cosine_accuracy@3   | 0.8414     |
| cosine_accuracy@5   | 0.88       |
| cosine_accuracy@10  | 0.9229     |
| cosine_precision@1  | 0.7057     |
| cosine_precision@3  | 0.2805     |
| cosine_precision@5  | 0.176      |
| cosine_precision@10 | 0.0923     |
| cosine_recall@1     | 0.7057     |
| cosine_recall@3     | 0.8414     |
| cosine_recall@5     | 0.88       |
| cosine_recall@10    | 0.9229     |
| cosine_ndcg@10      | 0.8162     |
| cosine_mrr@10       | 0.7818     |
| **cosine_map@100**  | **0.7854** |

#### Information Retrieval

* Dataset: `basline_128`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.7029     |
| cosine_accuracy@3   | 0.8343     |
| cosine_accuracy@5   | 0.8743     |
| cosine_accuracy@10  | 0.9171     |
| cosine_precision@1  | 0.7029     |
| cosine_precision@3  | 0.2781     |
| cosine_precision@5  | 0.1749     |
| cosine_precision@10 | 0.0917     |
| cosine_recall@1     | 0.7029     |
| cosine_recall@3     | 0.8343     |
| cosine_recall@5     | 0.8743     |
| cosine_recall@10    | 0.9171     |
| cosine_ndcg@10      | 0.8109     |
| cosine_mrr@10       | 0.7769     |
| **cosine_map@100**  | **0.7803** |

#### Information Retrieval

* Dataset: `basline_64`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6729     |
| cosine_accuracy@3   | 0.8171     |
| cosine_accuracy@5   | 0.8614     |
| cosine_accuracy@10  | 0.9014     |
| cosine_precision@1  | 0.6729     |
| cosine_precision@3  | 0.2724     |
| cosine_precision@5  | 0.1723     |
| cosine_precision@10 | 0.0901     |
| cosine_recall@1     | 0.6729     |
| cosine_recall@3     | 0.8171     |
| cosine_recall@5     | 0.8614     |
| cosine_recall@10    | 0.9014     |
| cosine_ndcg@10      | 0.79       |
| cosine_mrr@10       | 0.754      |
| **cosine_map@100**  | **0.7582** |

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 6,300 training samples
* Columns: `positive` and `anchor`
* Approximate statistics based on the first 1000 samples:
  |         | positive | anchor |
  |:--------|:---------|:-------|
  | type    | string   | string |
  | details |          |        |
* Samples:
  | positive | anchor |
  |:---------|:-------|
  | Fiscal 2023 total gross profit margin of 35.1% represents an increase of 1.7 percentage points as compared to the respective prior year period. | What was the total gross profit margin for Hewlett Packard Enterprise in fiscal 2023? |
  | Noninterest expense increased to $65.8 billion in 2023, primarily due to higher investments in people and technology and higher FDIC expense, including $2.1 billion for the estimated special assessment amount arising from the closure of Silicon Valley Bank and Signature Bank. | What was the total noninterest expense for the company in 2023? |
  | As of May 31, 2022, FedEx Office had approximately 12,000 employees. | How many employees did FedEx Office have as of May 31, 2023? |
* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

#### All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `sanity_evaluation`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
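
The `MatryoshkaLoss` parameters listed under Training Dataset wrap `MultipleNegativesRankingLoss` and apply it to several prefixes of each embedding (768, 512, 256, 128 and 64 dimensions), each with weight 1. The NumPy sketch below illustrates that objective under stated assumptions; it is not the library's implementation. The function names `mnrl_loss` and `matryoshka_loss` are hypothetical, the scale of 20 on cosine similarities is assumed to match the base loss's default, and the per-dimension losses are assumed to be combined as a weighted sum.

```python
import numpy as np


def mnrl_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Mean cross-entropy with in-batch negatives: anchor i should match positive i.

    Sketch only; the scale of 20.0 is an assumed default for the base loss.
    """
    # Cosine similarity = dot product of L2-normalised rows.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)  # shape (batch, batch); off-diagonal entries act as negatives
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilise the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # the correct "class" for row i is column i


def matryoshka_loss(
    anchors: np.ndarray,
    positives: np.ndarray,
    dims=(768, 512, 256, 128, 64),
    weights=(1, 1, 1, 1, 1),
) -> float:
    """Weighted sum of the base loss on the first `d` dimensions (assumed combination rule)."""
    return sum(
        w * mnrl_loss(anchors[:, :d], positives[:, :d]) for d, w in zip(dims, weights)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchors = rng.normal(size=(32, 768))  # stand-in for one 32-pair micro-batch of embeddings
    positives = anchors + 0.1 * rng.normal(size=(32, 768))  # near-duplicates: easy positives
    unrelated = rng.normal(size=(32, 768))  # random "positives" for comparison
    print(matryoshka_loss(anchors, positives), matryoshka_loss(anchors, unrelated))
```

Two hedged observations tie this back to the hyperparameters above. With `per_device_train_batch_size` 32 and `gradient_accumulation_steps` 16, each optimizer step aggregates gradients from 512 pairs, but each anchor's in-batch negatives still come only from its own 32-pair micro-batch; the `no_duplicates` batch sampler keeps repeated texts (which would act as false negatives) out of those micro-batches. Separately, 6,300 samples in 32-pair micro-batches give 197 micro-batches per epoch, or about 12.3 optimizer steps, which is consistent with the Training Logs reporting step 12 at epoch 0.9746.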
### Training Logs

| Epoch      | Step   | Training Loss | basline_128_cosine_map@100 | basline_256_cosine_map@100 | basline_512_cosine_map@100 | basline_64_cosine_map@100 | basline_768_cosine_map@100 |
|:----------:|:------:|:-------------:|:--------------------------:|:--------------------------:|:--------------------------:|:-------------------------:|:--------------------------:|
| 0.8122     | 10     | 1.5259        | -                          | -                          | -                          | -                         | -                          |
| 0.9746     | 12     | -             | 0.7502                     | 0.7737                     | 0.7827                     | 0.7185                    | 0.7806                     |
| 1.6244     | 20     | 0.6545        | -                          | -                          | -                          | -                         | -                          |
| **1.9492** | **24** | **-**         | **0.7689**                 | **0.7844**                 | **0.7869**                 | **0.7447**                | **0.7909**                 |
| 2.4365     | 30     | 0.4784        | -                          | -                          | -                          | -                         | -                          |
| 2.9239     | 36     | -             | 0.7733                     | 0.7916                     | 0.7904                     | 0.7491                    | 0.7930                     |
| 3.2487     | 40     | 0.3827        | -                          | -                          | -                          | -                         | -                          |
| 3.8985     | 48     | -             | 0.7739                     | 0.7907                     | 0.7900                     | 0.7479                    | 0.7948                     |
| 0.8122     | 10     | 0.2685        | -                          | -                          | -                          | -                         | -                          |
| 0.9746     | 12     | -             | 0.7779                     | 0.7932                     | 0.7948                     | 0.7517                    | 0.7943                     |
| 1.6244     | 20     | 0.183         | -                          | -                          | -                          | -                         | -                          |
| **1.9492** | **24** | **-**         | **0.7784**                 | **0.7929**                 | **0.7963**                 | **0.7575**                | **0.7957**                 |
| 2.4365     | 30     | 0.1877        | -                          | -                          | -                          | -                         | -                          |
| 2.9239     | 36     | -             | 0.7814                     | 0.7914                     | 0.7992                     | 0.7570                    | 0.7974                     |
| 3.2487     | 40     | 0.1826        | -                          | -                          | -                          | -                         | -                          |
| 3.8985     | 48     | -             | 0.7818                     | 0.7916                     | 0.7976                     | 0.7580                    | 0.7960                     |
| 0.8122     | 10     | 0.071         | -                          | -                          | -                          | -                         | -                          |
| 0.9746     | 12     | -             | 0.7810                     | 0.7935                     | 0.7954                     | 0.7550                    | 0.7949                     |
| 1.6244     | 20     | 0.0629        | -                          | -                          | -                          | -                         | -                          |
| **1.9492** | **24** | **-**         | **0.7855**                 | **0.7914**                 | **0.7989**                 | **0.7559**                | **0.7981**                 |
| 2.4365     | 30     | 0.0827        | -                          | -                          | -                          | -                         | -                          |
| 2.9239     | 36     | -             | 0.7893                     | 0.7927                     | 0.7987                     | 0.7539                    | 0.7961                     |
| 3.2487     | 40     | 0.1003        | -                          | -                          | -                          | -                         | -                          |
| 3.8985     | 48     | -             | 0.7903                     | 0.7915                     | 0.7980                     | 0.7530                    | 0.7951                     |
| 0.8122     | 10     | 0.0213        | -                          | -                          | -                          | -                         | -                          |
| 0.9746     | 12     | -             | 0.7786                     | 0.7869                     | 0.7885                     | 0.7566                    | 0.7908                     |
| 1.6244     | 20     | 0.0234        | -                          | -                          | -                          | -                         | -                          |
| **1.9492** | **24** | **-**         | **0.783**                  | **0.7882**                 | **0.793**                  | **0.7551**                | **0.7946**                 |
| 2.4365     | 30     | 0.0357        | -                          | -                          | -                          | -                         | -                          |
| 2.9239     | 36     | -             | 0.7838                     | 0.7892                     | 0.7922                     | 0.7579                    | 0.7907                     |
| 3.2487     | 40     | 0.0563        | -                          | -                          | -                          | -                         | -                          |
| 3.8985     | 48     | -             | 0.7846                     | 0.7887                     | 0.7912                     | 0.7582                    | 0.7901                     |
| 0.8122     | 10     | 0.0075        | -                          | -                          | -                          | -                         | -                          |
| 0.9746     | 12     | -             | 0.7730                     | 0.7816                     | 0.7818                     | 0.7550                    | 0.7868                     |
| 1.6244     | 20     | 0.01          | -                          | -                          | -                          | -                         | -                          |
| **1.9492** | **24** | **-**         | **0.7827**                 | **0.785**                  | **0.7896**                 | **0.7551**                | **0.7915**                 |
| 2.4365     | 30     | 0.0154        | -                          | -                          | -                          | -                         | -                          |
| 2.9239     | 36     | -             | 0.7808                     | 0.7838                     | 0.7921                     | 0.7584                    | 0.7916                     |
| 3.2487     | 40     | 0.0312        | -                          | -                          | -                          | -                         | -                          |
| 3.8985     | 48     | -             | 0.7803                     | 0.7854                     | 0.7916                     | 0.7582                    | 0.7907                     |

* The bold row denotes the saved checkpoint.

### Framework Versions

- Python: 3.10.13
- Sentence Transformers: 3.0.0
- Transformers: 4.42.0.dev0
- PyTorch: 2.1.2+cu121
- Accelerate: 0.29.2
- Datasets: 2.19.1
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```