---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
---

# {MODEL_NAME}

This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.

## Usage (Sentence-Transformers)

Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('{MODEL_NAME}')
embeddings = model.encode(sentences)
print(embeddings)
```
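The embeddings can be compared directly with cosine similarity, e.g. for semantic search. The following is only a minimal sketch using `sentence_transformers.util.cos_sim`; the candidate sentences are arbitrary examples, and `{MODEL_NAME}` again stands for this model's Hub id:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('{MODEL_NAME}')

# Encode a query and a few candidate sentences (arbitrary examples)
query_embedding = model.encode("This is an example sentence", convert_to_tensor=True)
corpus_embeddings = model.encode(
    ["Each sentence is converted", "A completely unrelated sentence"],
    convert_to_tensor=True,
)

# Cosine similarity between the query and each candidate; shape (1, 2)
scores = util.cos_sim(query_embedding, corpus_embeddings)
print(scores)
```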
## Usage (HuggingFace Transformers)

Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, you pass your input through the transformer model, then you have to apply the right pooling operation on top of the contextualized word embeddings.

```python
from transformers import AutoTokenizer, AutoModel
import torch


# Mean Pooling - take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
```

## Evaluation Results

For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})

| Model | Avg | id_raw_acc | vn_raw_acc | br_raw_acc | th_raw_acc | my_raw_acc | ph_raw_acc | sg_raw_acc |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [thtang_ALL_679283](https://huggingface.co/thtang/ALL_679283) | 66.39 | 72.37 | 61.8 | 56.94 | 65.27 | 69.71 | 69.21 | 69.44 |
| thtang_ALL_660924 | 66.44 | 72.63 | 61.74 | 57.22 | 65.44 | 69.77 | 69.06 | 69.23 |
| [sentence-transformers_sentence-t5-xxl](https://huggingface.co/sentence-transformers/sentence-t5-xxl) | 44.35 | 50.98 | 18.38 | 36.37 | 16.91 | 59.25 | 64.82 | 63.75 |
| sentence-transformers_gtr-t5-xxl | 46.68 | 59.93 | 24.82 | 40.79 | 17.23 | 58.41 | 64.0 | 61.57 |
| sentence-transformers_LaBSE | 45.68 | 50.3 | 32.82 | 33.15 | 39.79 | 54.95 | 53.71 | 55.06 |
| sentence-transformers_all-MiniLM-L6-v2 | 41.97 | 50.8 | 25.76 | 27.04 | 15.81 | 54.63 | 60.07 | 59.68 |
| sentence-transformers_all-mpnet-base-v2 | 40.09 | 46.97 | 23.15 | 24.75 | 16.31 | 52.66 | 59.07 | 57.75 |
| sentence-transformers_all-MiniLM-L12-v2 | 41.28 | 48.98 | 24.05 | 25.74 | 16.41 | 54.51 | 60.38 | 58.9 |
| sentence-transformers_paraphrase-MiniLM-L6-v2 | 39.12 | 44.92 | 23.59 | 26.12 | 14.23 | 51.84 | 57.14 | 56.03 |
| sentence-transformers_paraphrase-mpnet-base-v2 | 39.7 | 46.0 | 20.45 | 26.92 | 14.75 | 52.89 | 58.71 | 58.2 |
| sentence-transformers_paraphrase-multilingual-MiniLM-L12-v2 | 43.72 | 44.88 | 28.32 | 29.45 | 36.4 | 53.97 | 56.87 | 56.14 |
| sentence-transformers_paraphrase-multilingual-mpnet-base-v2 | 46.12 | 49.03 | 32.58 | 32.82 | 38.43 | 55.3 | 57.36 | 57.34 |
| sentence-transformers_all-distilroberta-v1 | 39.46 | 46.74 | 22.34 | 24.06 | 17.59 | 51.49 | 57.54 | 56.45 |
| sentence-transformers_distiluse-base-multilingual-cased-v2 | 40.53 | 43.51 | 23.86 | 28.41 | 26.9 | 53.14 | 53.54 | 54.38 |
| sentence-transformers_clip-ViT-B-32-multilingual-v1 | 40.82 | 44.45 | 27.34 | 28.0 | 28.25 | 50.3 | 54.05 | 53.39 |
| intfloat_e5-large-v2 | 45.07 | 55.1 | 28.06 | 35.95 | 17.16 | 57.16 | 61.21 | 60.84 |
| intfloat_e5-small-v2 | 42.84 | 51.41 | 26.82 | 33.04 | 16.3 | 54.97 | 58.66 | 58.68 |
| intfloat_e5-large | 45.91 | 55.45 | 28.54 | 36.69 | 18.15 | 57.78 | 62.92 | 61.83 |
| intfloat_e5-small | 43.14 | 51.31 | 27.36 | 32.05 | 16.66 | 55.15 | 60.39 | 59.06 |
| intfloat_multilingual-e5-large | 49.76 | 52.99 | 42.0 | 33.92 | 47.69 | 55.82 | 57.76 | 58.16 |
| intfloat_multilingual-e5-base | 49.57 | 52.06 | 43.21 | 34.17 | 47.41 | 55.28 | 57.38 | 57.45 |
| intfloat_multilingual-e5-small | 48.35 | 49.5 | 42.68 | 30.96 | 47.42 | 54.44 | 56.44 | 57.04 |
| BAAI_bge-large-en-v1.5 | 43.56 | 49.81 | 25.55 | 30.68 | 17.41 | 56.89 | 62.87 | 61.72 |
| BAAI_bge-base-en-v1.5 | 43.42 | 51.73 | 24.3 | 31.51 | 17.53 | 56.21 | 62.37 | 60.25 |
| BAAI_bge-small-en-v1.5 | 43.07 | 51.37 | 25.16 | 29.99 | 16.13 | 56.17 | 61.69 | 61.01 |
| thenlper_gte-large | 46.31 | 55.1 | 28.16 | 33.96 | 18.73 | 59.5 | 65.19 | 63.52 |
| thenlper_gte-base | 45.3 | 55.46 | 27.88 | 32.77 | 17.2 | 58.09 | 63.68 | 62.03 |
| llmrails_ember-v1 | 43.79 | 50.85 | 24.76 | 31.02 | 17.2 | 57.62 | 63.06 | 62.04 |
| infgrad_stella-base-en-v2 | 44.23 | 52.42 | 26.24 | 30.61 | 18.81 | 56.84 | 63.03 | 61.67 |

## Training

The model was trained with the parameters:

**DataLoader**:

`torch.utils.data.dataloader.DataLoader` of length 1468721 with parameters:

```
{'batch_size': 160, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
```

**Loss**:

`sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss`

Parameters of the fit()-Method:

```
{
    "epochs": 1,
    "evaluation_steps": 0,
    "evaluator": "NoneType",
    "max_grad_norm": 1,
    "optimizer_class": "",
    "optimizer_params": {
        "lr": 2e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 100,
    "weight_decay": 0.01
}
```
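Taken together, these parameters describe a standard `CosineSimilarityLoss` fine-tuning run with the (pre-3.0) `SentenceTransformer.fit()` API. The sketch below only illustrates how the listed hyperparameters plug into that API: the two `InputExample` pairs are made-up stand-ins for the actual training data, and the starting checkpoint is not specified here.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Toy stand-in for the real training data: text pairs with a similarity label in [0, 1]
train_examples = [
    InputExample(texts=["This is an example sentence", "Each sentence is converted"], label=0.8),
    InputExample(texts=["This is an example sentence", "An unrelated sentence"], label=0.1),
]

model = SentenceTransformer('{MODEL_NAME}')  # placeholder; training would start from a base checkpoint

# batch_size=160 with random sampling, matching the DataLoader parameters above
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=160)
train_loss = losses.CosineSimilarityLoss(model)

# Hyperparameters mirror the fit() parameters listed above
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
    scheduler='WarmupLinear',
    optimizer_params={'lr': 2e-05},
    weight_decay=0.01,
    max_grad_norm=1,
)
```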
## Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)
```

## Citing & Authors