---
language: []
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dataset_size:10K
---

- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("JuanIgnacioSolerno/all-mpnet-base-v2-sts")

# Run inference
sentences = [
    'AP Analyst',
    'AP Specialist',
    'ESCO Service Coordinator',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 11,923 training samples
* Columns: `sentence1`, `sentence2`, and `score`
* Approximate statistics based on the first 1000 samples:

  |         | sentence1 | sentence2 | score |
  |:--------|:----------|:----------|:------|
  | type    | string | string | float |
  | details | min: 3 tokens; mean: 7.17 tokens; max: 27 tokens | min: 4 tokens; mean: 4.0 tokens; max: 4 tokens | min: 0.0; mean: 0.04; max: 1.0 |

* Samples:

  | sentence1 | sentence2 | score |
  |:----------|:----------|:------|
  | Land Coordinator, Renewable Development | Energy Analyst | 0.0 |
  | Customer Service Advocate - Remote within the state of Colorado | Energy Analyst | 0.0 |
  | Global Head of Infrastructure | Energy Analyst | 0.0 |

* Loss: [CosineSimilarityLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:

  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```

### Evaluation Dataset

#### Unnamed Dataset

* Size: 2,981 evaluation samples
* Columns: `sentence1`, `sentence2`, and `score`
* Approximate statistics based on the first 1000 samples:

  |         | sentence1 | sentence2 | score |
  |:--------|:----------|:----------|:------|
  | type    | string | string | float |
  | details | min: 3 tokens; mean: 7.21 tokens; max: 28 tokens | min: 4 tokens; mean: 4.0 tokens; max: 4 tokens | min: 0.0; mean: 0.05; max: 1.0 |

* Samples:

  | sentence1 | sentence2 | score |
  |:----------|:----------|:------|
  | IT Data Coordinator - Customer Data & Integrations Team | Energy Analyst | 0.0 |
  | Warehouse Associate | Energy Analyst | 0.0 |
  | Human Resources Manager | Energy Analyst | 0.0 |

* Loss: [CosineSimilarityLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:

  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```

### Framework Versions

- Python: 3.10.14
- Sentence Transformers: 3.0.0
- Transformers: 4.41.2
- PyTorch: 2.0.0.post200
- Accelerate: 0.30.1
- Datasets: 2.19.1
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
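As a rough illustration of what the components named in this card compute, the sketch below walks through mean pooling, L2 normalization, and a cosine-similarity objective scored with mean squared error, which is how the listed `CosineSimilarityLoss` with `MSELoss` is generally understood. It is a pure-Python approximation under stated assumptions, not the library's implementation: the helper names (`mean_pool`, `l2_normalize`, `cosine_mse_loss`) and the toy 2-dimensional vectors are hypothetical and do not come from the model.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average token vectors over positions where the mask is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [value / max(count, 1) for value in total]


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity, computed as the dot product of the unit-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def cosine_mse_loss(pairs, labels):
    """Mean squared error between each pair's cosine similarity and its gold score."""
    errors = [(cosine_similarity(a, b) - y) ** 2 for (a, b), y in zip(pairs, labels)]
    return sum(errors) / len(errors)


# Toy token vectors (made-up numbers, not real model outputs).
tokens_a = [[1.0, 0.0], [1.0, 0.0], [9.0, 9.0]]  # last position is padding
mask_a = [1, 1, 0]
tokens_b = [[0.0, 2.0], [0.0, 4.0]]
mask_b = [1, 1]

emb_a = l2_normalize(mean_pool(tokens_a, mask_a))
emb_b = l2_normalize(mean_pool(tokens_b, mask_b))

# A gold score of 0.0 mirrors the sample rows shown in the dataset tables.
loss = cosine_mse_loss([(emb_a, emb_b)], [0.0])
print(emb_a, emb_b, loss)
```

Because the embeddings are already unit length after the normalization step, cosine similarity between two outputs reduces to a plain dot product, which is why the toy `cosine_similarity` helper can be applied to raw or normalized vectors alike.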