This tokenizer was built using the "bengisucam/tr_dataset_combined" dataset. The vocabulary size is 50000.

6e417ac bengisucam committed on Dec 1, 2023