Cohere/multilingual-22-12


nreimers committed on Apr 4, 2023

Commit f19f09a · 1 parent: 0d42b43

update readme

Files changed (1)

README.md +23 -0


@@ -1,3 +1,26 @@
 ---
 license: apache-2.0
 ---

+# Cohere `multilingual-22-12` tokenizer
+This is the tokenizer for the Cohere `multilingual-22-12` embedding model ([Cohere Multilingual Embeddings](https://docs.cohere.ai/docs/multilingual-language-models)).
+You can load it with the `transformers` library as follows:
+```python
+from transformers import AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained("Cohere/multilingual-22-12")
+text = "Hellö World, this is my input string!"
+enc = tokenizer(text)
+print("Encoded input:")
+print(enc)
+# Map each token ID back to its token string
+tokens = tokenizer.convert_ids_to_tokens(enc['input_ids'])
+print("Tokens:")
+print(tokens)
+number_of_tokens = len(enc['input_ids'])  # includes any special tokens the tokenizer adds
+print("Number of tokens:", number_of_tokens)
+```
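
Reviewer note (not part of the commit): the README snippet counts tokens for a single string. If the goal is to count tokens for several inputs at once, a minimal sketch along the following lines might help. The helper name `count_tokens` is hypothetical and not part of the repository or of `transformers`; it only assumes the tokenizer can be called on a list of strings with an `add_special_tokens` keyword and returns a mapping with an `input_ids` entry, as Hugging Face tokenizers do.

```python
from typing import List


def count_tokens(tokenizer, texts: List[str], add_special_tokens: bool = True) -> List[int]:
    """Return the number of token IDs produced for each text, in input order.

    `tokenizer` is assumed to behave like a Hugging Face tokenizer: calling it
    on a list of strings yields a mapping whose "input_ids" entry holds one
    list of IDs per input string.
    """
    if not texts:
        # Avoid calling the tokenizer on an empty batch.
        return []
    enc = tokenizer(texts, add_special_tokens=add_special_tokens)
    return [len(ids) for ids in enc["input_ids"]]


# Possible usage (requires the `transformers` package and network access):
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("Cohere/multilingual-22-12")
#   print(count_tokens(tokenizer, ["Hellö World", "a second input"]))
```

Passing `add_special_tokens=False` gives the count for the text alone, which can differ from the count the model actually receives.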