---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
license: cc-by-nc-sa-4.0
language:
- krc
---
# TSjB/labse-qm
This model maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.

Fine-tuned for Karachay-Balkar by [Bogdan Tewunalany](https://t.me/bogdan_tewunalany), based on [LaBSE](https://huggingface.co/sentence-transformers/LaBSE).
## Usage (Sentence-Transformers)
### Python:
Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
```
pip install -U sentence-transformers
```
Then you can use the model like this:
```python
from sentence_transformers import SentenceTransformer

# One English and one Karachay-Balkar sentence
sentences = ["This is an example sentence", "Бу айтым юлгюдю"]

model = SentenceTransformer('TSjB/labse-qm')
embeddings = model.encode(sentences)  # array of shape (2, 768)
print(embeddings)
```
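The embeddings are unit-normalized (see the `Normalize()` module in the architecture below), so cosine similarity can be applied directly. Below is a minimal cross-lingual semantic-search sketch, reusing the Karachay-Balkar sentences from the R example further down; the query and corpus are purely illustrative:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('TSjB/labse-qm')

# Karachay-Balkar corpus (same sentences as in the R example below)
corpus = ["ит", "Итле джагъымлыдыла.", "Джагъа юсю бла итим бла айланыргъа сюеме."]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# English query: the shared vector space allows cross-lingual retrieval
query_embedding = model.encode("dog", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)

for hit in hits[0]:
    print(corpus[hit['corpus_id']], hit['score'])
```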
### R:
```r
library(data.table)
library(magrittr)   # provides the %>% pipe used below
library(reticulate)
library(ggplot2)
library(ggrepel)
library(Rtsne)

# Install sentence-transformers into the Python environment used by reticulate
py_install("sentence-transformers", pip = TRUE)
st <- import("sentence_transformers")

# The same three sentences in English, Italian and Karachay-Balkar
english_sentences <- c("dog", "Puppies are nice.", "I enjoy taking long walks along the beach with my dog.")
italian_sentences <- c("cane", "I cuccioli sono carini.", "Mi piace fare lunghe passeggiate lungo la spiaggia con il mio cane.")
qarachay_sentences <- c("ит", "Итле джагъымлыдыла.", "Джагъа юсю бла итим бла айланыргъа сюеме.")

model <- st$SentenceTransformer('TSjB/labse-qm')
english_embeddings <- model$encode(english_sentences)
italian_embeddings <- model$encode(italian_sentences)
qarachay_embeddings <- model$encode(qarachay_sentences)

# Stack all embeddings into one matrix and project to 2D with t-SNE
m <- rbind(english_embeddings, italian_embeddings, qarachay_embeddings) %>% as.matrix
tsne <- Rtsne(m, perplexity = floor((nrow(m) - 1) / 3))

tSNE_df <- tsne$Y %>%
  as.data.table() %>%
  setnames(old = c("V1", "V2"), new = c("tSNE1", "tSNE2")) %>%
  .[, `:=`(sentence = c(english_sentences, italian_sentences, qarachay_sentences),
           language = c(rep("english", length(english_sentences)),
                        rep("italian", length(italian_sentences)),
                        rep("qarachay", length(qarachay_sentences))))]

# Plot the 2D projection, labeling each point with its sentence
tSNE_df %>%
  ggplot(aes(x = tSNE1, y = tSNE2, color = language, label = sentence)) +
  geom_point() +
  geom_label_repel()
```
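If the fine-tuning preserved LaBSE's cross-lingual alignment, the English, Italian, and Karachay-Balkar translations of each sentence should land close to one another in the resulting t-SNE plot.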
## Evaluation Results
For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=TSjB/labse-qm)
## Training
The model was trained with the following parameters:
**DataLoader**:
`torch.utils.data.dataloader.DataLoader` of length 6439 with parameters:
```
{'batch_size': 8, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
```
**Loss**:
`sentence_transformers.losses.MultipleNegativesRankingLoss.MultipleNegativesRankingLoss` with parameters:
```
{'scale': 20.0, 'similarity_fct': 'cos_sim'}
```
Parameters of the `fit()` method:
```
{
"epochs": 1,
"evaluation_steps": 100,
"evaluator": "__main__.ChainScoreEvaluator",
"max_grad_norm": 1,
"optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
"optimizer_params": {
"lr": 2e-05
},
"scheduler": "warmupcosine",
"steps_per_epoch": null,
"warmup_steps": 1000,
"weight_decay": 0.01
}
```
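For reference, here is a hedged sketch of how these parameters map onto the sentence-transformers `fit()` API. The training pairs and the custom `ChainScoreEvaluator` are not published, so `train_examples` below is a placeholder, not the actual training data:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, util

model = SentenceTransformer('sentence-transformers/LaBSE')

# Placeholder pairs; the actual Karachay-Balkar parallel data is not published
train_examples = [
    InputExample(texts=["This is an example sentence", "Бу айтым юлгюдю"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=8)

train_loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    evaluation_steps=100,  # the custom ChainScoreEvaluator is omitted here
    scheduler='warmupcosine',
    warmup_steps=1000,
    optimizer_params={'lr': 2e-5},
    weight_decay=0.01,
    max_grad_norm=1,
)
```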
## Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
(2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
(3): Normalize()
)
``` |
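The final `Normalize()` module means `encode()` returns unit-length vectors, which is why cosine similarity and dot product are interchangeable here. A quick sanity check:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('TSjB/labse-qm')
print(model.get_sentence_embedding_dimension())  # 768

embedding = model.encode("Бу айтым юлгюдю")
print(embedding.shape)            # (768,)
print(np.linalg.norm(embedding))  # ~1.0, due to the final Normalize() module
```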