
use-cmlm-multilingual

This is a PyTorch version of the universal-sentence-encoder-cmlm/multilingual-base-br model. It maps sentences in 109 languages to a shared vector space. Because the model is based on LaBSE, it performs comparably on downstream tasks.

Usage (Sentence-Transformers)

Using this model becomes easy when you have sentence-transformers installed:

pip install -U sentence-transformers

Then you can use the model like this:

from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

# Load the model by its Hub identifier (downloaded on first use)
model = SentenceTransformer('sentence-transformers/use-cmlm-multilingual')
# encode() returns one embedding vector per input sentence
embeddings = model.encode(sentences)
print(embeddings)
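
A common next step is to compare sentences by cosine similarity. The following is a minimal sketch, not part of the original card: the helper name `similarity_matrix` is hypothetical, and the toy vectors stand in for the output of `model.encode(sentences)` so that the snippet runs without downloading the model. Since the pipeline ends with a Normalize() layer, the embeddings should already have unit length, so the re-normalization here is only a safeguard.

```python
import numpy as np

def similarity_matrix(embeddings):
    """Pairwise cosine similarity between row-wise embedding vectors.

    Hypothetical helper for illustration; not part of sentence-transformers.
    """
    emb = np.asarray(embeddings, dtype=np.float64)
    # Re-normalize defensively; effectively a no-op for unit-length rows.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb / np.clip(norms, 1e-12, None)
    # For unit-length rows the dot product equals the cosine similarity.
    return emb @ emb.T

# Toy stand-in vectors (assumption); real input would be model.encode(sentences).
toy = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
scores = similarity_matrix(toy)
print(scores)
```

With real embeddings, `similarity_matrix(model.encode(sentences))` would give one score per sentence pair, with values closer to 1 indicating more similar sentences.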

Evaluation Results

For an automated evaluation of this model, see the Sentence Embeddings Benchmark: https://seb.sbert.net

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Normalize()
)
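
To make the last two stages concrete, here is a rough NumPy sketch of what mean pooling followed by normalization computes. It is an illustration under assumptions, not the library's implementation: the function name `mean_pool_and_normalize` and the toy inputs are hypothetical, and the token embeddings would in practice come from the BertModel in stage (0).

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize.

    Hypothetical helper mirroring the Pooling (mean tokens) and Normalize stages.
    """
    tok = np.asarray(token_embeddings, dtype=np.float64)             # (batch, seq, dim)
    mask = np.asarray(attention_mask, dtype=np.float64)[..., None]   # (batch, seq, 1)
    # Sum only the vectors of real tokens; padding positions are zeroed out.
    summed = (tok * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    pooled = summed / counts
    # Scale each sentence vector to unit length.
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Toy batch: one sentence, three positions, the last one being padding.
tokens = np.array([[[3.0, 0.0], [0.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
sentence_vec = mean_pool_and_normalize(tokens, mask)
print(sentence_vec)
```

In the real model the vectors have 768 dimensions, as given by `word_embedding_dimension`, rather than the 2 used in this toy batch.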

Citing & Authors

Have a look at universal-sentence-encoder-cmlm/multilingual-base-br for the respective publication that describes this model.
