---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
---

# {MODEL_NAME}

This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.

## Usage (Sentence-Transformers)

Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('{MODEL_NAME}')
embeddings = model.encode(sentences)
print(embeddings)
```
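The embeddings can be compared directly with cosine similarity, e.g. for semantic search. The following is only a minimal sketch using `sentence_transformers.util.cos_sim`; the candidate sentences are arbitrary examples, and `{MODEL_NAME}` again stands for this model's Hub id:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('{MODEL_NAME}')

# Encode a query and a few candidate sentences (arbitrary examples)
query_embedding = model.encode("This is an example sentence", convert_to_tensor=True)
corpus_embeddings = model.encode(
    ["Each sentence is converted", "A completely unrelated sentence"],
    convert_to_tensor=True,
)

# Cosine similarity between the query and each candidate; shape (1, 2)
scores = util.cos_sim(query_embedding, corpus_embeddings)
print(scores)
```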
## Usage (HuggingFace Transformers)

Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, you pass your input through the transformer model, then you have to apply the right pooling operation on top of the contextualized word embeddings.

```python
from transformers import AutoTokenizer, AutoModel
import torch


# Mean Pooling - take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
```

## Evaluation Results

For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})

| Model | Avg | id_raw_acc | vn_raw_acc | br_raw_acc | th_raw_acc | my_raw_acc | ph_raw_acc | sg_raw_acc |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [thtang_ALL_679283](https://huggingface.co/thtang/ALL_679283) | 66.39 | 72.37 | 61.8 | 56.94 | 65.27 | 69.71 | 69.21 | 69.44 |
| thtang_ALL_660924 | 66.44 | 72.63 | 61.74 | 57.22 | 65.44 | 69.77 | 69.06 | 69.23 |
| [sentence-transformers_sentence-t5-xxl](https://huggingface.co/sentence-transformers/sentence-t5-xxl) | 44.35 | 50.98 | 18.38 | 36.37 | 16.91 | 59.25 | 64.82 | 63.75 |
| sentence-transformers_gtr-t5-xxl | 46.68 | 59.93 | 24.82 | 40.79 | 17.23 | 58.41 | 64.0 | 61.57 |
| sentence-transformers_LaBSE | 45.68 | 50.3 | 32.82 | 33.15 | 39.79 | 54.95 | 53.71 | 55.06 |
| sentence-transformers_all-MiniLM-L6-v2 | 41.97 | 50.8 | 25.76 | 27.04 | 15.81 | 54.63 | 60.07 | 59.68 |
| sentence-transformers_all-mpnet-base-v2 | 40.09 | 46.97 | 23.15 | 24.75 | 16.31 | 52.66 | 59.07 | 57.75 |
| sentence-transformers_all-MiniLM-L12-v2 | 41.28 | 48.98 | 24.05 | 25.74 | 16.41 | 54.51 | 60.38 | 58.9 |
| sentence-transformers_paraphrase-MiniLM-L6-v2 | 39.12 | 44.92 | 23.59 | 26.12 | 14.23 | 51.84 | 57.14 | 56.03 |
| sentence-transformers_paraphrase-mpnet-base-v2 | 39.7 | 46.0 | 20.45 | 26.92 | 14.75 | 52.89 | 58.71 | 58.2 |
| sentence-transformers_paraphrase-multilingual-MiniLM-L12-v2 | 43.72 | 44.88 | 28.32 | 29.45 | 36.4 | 53.97 | 56.87 | 56.14 |
| sentence-transformers_paraphrase-multilingual-mpnet-base-v2 | 46.12 | 49.03 | 32.58 | 32.82 | 38.43 | 55.3 | 57.36 | 57.34 |
| sentence-transformers_all-distilroberta-v1 | 39.46 | 46.74 | 22.34 | 24.06 | 17.59 | 51.49 | 57.54 | 56.45 |
| sentence-transformers_distiluse-base-multilingual-cased-v2 | 40.53 | 43.51 | 23.86 | 28.41 | 26.9 | 53.14 | 53.54 | 54.38 |
| sentence-transformers_clip-ViT-B-32-multilingual-v1 | 40.82 | 44.45 | 27.34 | 28.0 | 28.25 | 50.3 | 54.05 | 53.39 |
| intfloat_e5-large-v2 | 45.07 | 55.1 | 28.06 | 35.95 | 17.16 | 57.16 | 61.21 | 60.84 |
| intfloat_e5-small-v2 | 42.84 | 51.41 | 26.82 | 33.04 | 16.3 | 54.97 | 58.66 | 58.68 |
| intfloat_e5-large | 45.91 | 55.45 | 28.54 | 36.69 | 18.15 | 57.78 | 62.92 | 61.83 |
| intfloat_e5-small | 43.14 | 51.31 | 27.36 | 32.05 | 16.66 | 55.15 | 60.39 | 59.06 |
| intfloat_multilingual-e5-large | 49.76 | 52.99 | 42.0 | 33.92 | 47.69 | 55.82 | 57.76 | 58.16 |
| intfloat_multilingual-e5-base | 49.57 | 52.06 | 43.21 | 34.17 | 47.41 | 55.28 | 57.38 | 57.45 |
| intfloat_multilingual-e5-small | 48.35 | 49.5 | 42.68 | 30.96 | 47.42 | 54.44 | 56.44 | 57.04 |
| BAAI_bge-large-en-v1.5 | 43.56 | 49.81 | 25.55 | 30.68 | 17.41 | 56.89 | 62.87 | 61.72 |
| BAAI_bge-base-en-v1.5 | 43.42 | 51.73 | 24.3 | 31.51 | 17.53 | 56.21 | 62.37 | 60.25 |
| BAAI_bge-small-en-v1.5 | 43.07 | 51.37 | 25.16 | 29.99 | 16.13 | 56.17 | 61.69 | 61.01 |
| thenlper_gte-large | 46.31 | 55.1 | 28.16 | 33.96 | 18.73 | 59.5 | 65.19 | 63.52 |
| thenlper_gte-base | 45.3 | 55.46 | 27.88 | 32.77 | 17.2 | 58.09 | 63.68 | 62.03 |
| llmrails_ember-v1 | 43.79 | 50.85 | 24.76 | 31.02 | 17.2 | 57.62 | 63.06 | 62.04 |
| infgrad_stella-base-en-v2 | 44.23 | 52.42 | 26.24 | 30.61 | 18.81 | 56.84 | 63.03 | 61.67 |

## Training

The model was trained with the parameters:

**DataLoader**:

`torch.utils.data.dataloader.DataLoader` of length 1468721 with parameters:

```
{'batch_size': 160, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
```

**Loss**:

`sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss`

Parameters of the fit()-Method:

```
{
    "epochs": 1,
    "evaluation_steps": 0,
    "evaluator": "NoneType",
    "max_grad_norm": 1,
    "optimizer_class": "",
    "optimizer_params": {
        "lr": 2e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 100,
    "weight_decay": 0.01
}
```
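Taken together, these parameters describe a standard `CosineSimilarityLoss` fine-tuning run with the (pre-3.0) `SentenceTransformer.fit()` API. The sketch below only illustrates how the listed hyperparameters plug into that API: the two `InputExample` pairs are made-up stand-ins for the actual training data, and the starting checkpoint is not specified here.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Toy stand-in for the real training data: text pairs with a similarity label in [0, 1]
train_examples = [
    InputExample(texts=["This is an example sentence", "Each sentence is converted"], label=0.8),
    InputExample(texts=["This is an example sentence", "An unrelated sentence"], label=0.1),
]

model = SentenceTransformer('{MODEL_NAME}')  # placeholder; training would start from a base checkpoint

# batch_size=160 with random sampling, matching the DataLoader parameters above
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=160)
train_loss = losses.CosineSimilarityLoss(model)

# Hyperparameters mirror the fit() parameters listed above
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
    scheduler='WarmupLinear',
    optimizer_params={'lr': 2e-05},
    weight_decay=0.01,
    max_grad_norm=1,
)
```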
## Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)
```

## Citing & Authors