# Summary

A multilingual tokenizer trained on multilingual data using the SentencePiece library with the BPE algorithm.

* vocab size: 100k
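
To illustrate what the BPE algorithm does, here is a minimal pure-Python sketch of learning merges on a toy word list. It is not SentencePiece's implementation (which works on raw text, treats whitespace as a symbol, and is far more optimized), and the function names `learn_bpe`, `merge_pair`, and `encode` are hypothetical helpers introduced only for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merges from a list of words (toy version)."""
    # Each word starts as a tuple of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize `word` by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

In this sketch, each merge adds one new symbol to the vocabulary, so the final vocabulary size is roughly the number of base characters plus the number of merges. A 100k vocabulary corresponds to running this kind of loop until that target size is reached.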