zirui3/llm-multilingual-tokenizer




# Summary
A multilingual tokenizer trained on multilingual data with the SentencePiece library, using the BPE (byte-pair encoding) algorithm.

* vocab size: 100k
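To show what BPE training does, here is a minimal, pure-Python sketch of learning BPE merge rules. It is a toy illustration, not SentencePiece's implementation and not the procedure used to build this tokenizer. The real trainer differs in many details, such as text normalization, rare-character handling, and efficiency. The helper names (`merge_pair`, `learn_bpe`, `encode`) and the sample text are hypothetical. The `▁` word-start marker mirrors the convention SentencePiece uses so that whitespace can be recovered when decoding.

```python
from collections import Counter

WORD_START = "\u2581"  # "▁", the word-boundary marker SentencePiece uses


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(text, num_merges):
    """Learn up to `num_merges` BPE merge rules from whitespace-separated text."""
    # Prefix each word with the word-start marker and count word frequencies.
    words = Counter(WORD_START + w for w in text.split())
    # Each word starts out as a sequence of single characters.
    corpus = {tuple(w): freq for w, freq in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        # Apply the new merge rule to every word in the corpus.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[merge_pair(symbols, best)] += freq
        corpus = new_corpus
    return merges


def encode(word, merges):
    """Segment one word by applying the learned merges in the order they were learned."""
    symbols = tuple(WORD_START + word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


text = "low low low lower lowest newer newer wider"
merges = learn_bpe(text, 3)
print(merges)                     # [('▁', 'l'), ('▁l', 'o'), ('▁lo', 'w')]
print(encode("lowest", merges))   # ['▁low', 'e', 's', 't']
```

Each merge adds one symbol to the vocabulary, so training continues until the vocabulary reaches the target size (100k for this tokenizer). With the actual library, training is typically a single call such as `spm.SentencePieceTrainer.train(input=..., model_prefix=..., vocab_size=..., model_type="bpe")`. The corpus and the remaining options used for this tokenizer are not stated in this card.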