nreimers committed on
Commit f19f09a · 1 Parent(s): 0d42b43

update readme

Files changed (1)
  1. README.md +23 -0
README.md CHANGED
@@ -1,3 +1,26 @@
  ---
  license: apache-2.0
  ---
+
+ # Cohere `multilingual-22-12` tokenizer
+
+ This is the tokenizer for the Cohere `multilingual-22-12` embedding model: [Cohere Multilingual Embeddings](https://docs.cohere.ai/docs/multilingual-language-models)
+
+ You can load it with the transformers library like this:
+ ```python
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("Cohere/multilingual-22-12")
+ text = "Hellö World, this is my input string!"
+ enc = tokenizer(text)
+ print("Encoded input:")
+ print(enc)
+
+ inv_vocab = {v: k for k, v in tokenizer.vocab.items()}
+ tokens = [inv_vocab[token_id] for token_id in enc['input_ids']]
+ print("Tokens:")
+ print(tokens)
+
+ number_of_tokens = len(enc['input_ids'])
+ print("Number of tokens:", number_of_tokens)
+ ```
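
The id-to-token lookup in the added README snippet works by inverting the tokenizer's token-to-id vocabulary. The following is a minimal offline sketch of that step only, assuming a small hypothetical vocabulary (`toy_vocab`) and a made-up id sequence in place of the real Cohere tokenizer output; the helper name `ids_to_tokens` is also an illustration, not part of the commit.

```python
# Hypothetical toy vocabulary (token -> id); NOT the real Cohere vocabulary.
toy_vocab = {"<s>": 0, "</s>": 1, "▁Hello": 2, "▁World": 3, "!": 4}


def ids_to_tokens(vocab, input_ids):
    """Map token ids back to token strings by inverting a token -> id vocab."""
    inv_vocab = {token_id: token for token, token_id in vocab.items()}
    return [inv_vocab[token_id] for token_id in input_ids]


# Made-up stand-in for enc["input_ids"] from the README snippet.
input_ids = [0, 2, 3, 4, 1]

tokens = ids_to_tokens(toy_vocab, input_ids)
print("Tokens:", tokens)
print("Number of tokens:", len(input_ids))
```

With a real `transformers` tokenizer, `tokenizer.convert_ids_to_tokens(enc["input_ids"])` should return the same kind of list without building the inverse dictionary by hand.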