metadata
license: mit

posterior_KaTeMaTa_llama_llama.model

  • This is a SentencePiece (SP) format tokenizer obtained by merging the Kannada, Telugu, Malayalam, Tamil, and Llama-2 tokenizers.

posterior_dr_llama_15_32k_balanced.model posterior_dr_llama_15_32k_balanced.vocab

  • These are the SP format model and vocabulary files of a tokenizer obtained by training an SP tokenizer on data from the four languages.
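
The files above can be inspected with a few lines of Python. This is a minimal sketch, not an official usage recipe: it assumes the files have been downloaded to the working directory, that the `.vocab` file follows the usual SentencePiece layout of one `piece<TAB>score` pair per line, and that the `sentencepiece` package is installed for the loading step. The helper names `read_vocab` and `load_tokenizer` are made up for this illustration.

```python
from pathlib import Path


def read_vocab(vocab_path):
    """Parse a SentencePiece .vocab file into a list of (piece, score) pairs.

    Assumes the usual layout: one "piece<TAB>score" entry per line.
    """
    entries = []
    with open(vocab_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the last tab so the piece text is kept intact.
            piece, _, score = line.rpartition("\t")
            entries.append((piece, float(score)))
    return entries


def load_tokenizer(model_path):
    """Load a .model file (hypothetical helper; needs `pip install sentencepiece`)."""
    # Imported lazily so read_vocab stays usable without the package.
    import sentencepiece as spm

    return spm.SentencePieceProcessor(model_file=str(model_path))


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the file was downloaded.
    model = Path("posterior_dr_llama_15_32k_balanced.model")
    if model.exists():
        sp = load_tokenizer(model)
        print("vocabulary size:", sp.get_piece_size())
        # Tokenize a short Kannada greeting into subword pieces.
        print(sp.encode("ನಮಸ್ಕಾರ", out_type=str))
```

The same `load_tokenizer` call should also work for `posterior_KaTeMaTa_llama_llama.model`, since both are described as SP format files.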