# Japanese Dummy Tokenizer

Repository containing a dummy Japanese tokenizer trained on the snow_simplified_japanese_corpus dataset. The tokenizer was trained with Hugging Face `datasets` in streaming mode.
## Intended uses & limitations
You can use this tokenizer to tokenize Japanese sentences.
## How to use it
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ybelkada/japanese-dummy-tokenizer")
```
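Once loaded, it behaves like any Hugging Face tokenizer. A minimal sketch (the sample sentence is only an illustration, not taken from the corpus; running it downloads the tokenizer from the Hub on first use):

```python
from transformers import AutoTokenizer

# Load the tokenizer from the Hub (cached after the first download)
tokenizer = AutoTokenizer.from_pretrained("ybelkada/japanese-dummy-tokenizer")

# Illustrative sentence: "I like cats."
sentence = "私は猫が好きです。"

tokens = tokenizer.tokenize(sentence)    # subword strings
ids = tokenizer(sentence)["input_ids"]   # integer ids ready for a model
print(tokens)
print(ids)
```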
## How to train the tokenizer
Check the file `tokenizer.py`; you can freely adapt it to other datasets. This tokenizer is based on the tokenizer from `csebuetnlp/mT5_multilingual_XLSum`.
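`tokenizer.py` is the authoritative script. As a rough, self-contained sketch of the general idea — training a subword tokenizer from an iterator over sentences — here is an example using a BPE model from the `tokenizers` library, with a tiny hard-coded corpus standing in for the streamed dataset (the sentences and vocabulary size are illustrative assumptions):

```python
from tokenizers import Tokenizer, models, trainers

# Hypothetical in-memory stand-in for the streamed snow_simplified_japanese_corpus sentences
corpus = [
    "誰が一番に着くか私には分かりません。",
    "私は美味しい昼食を食べた。",
    "私は猫が好きです。",
]

# A plain BPE model; the real tokenizer builds on the mT5 (SentencePiece) tokenizer instead
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
trainer = trainers.BpeTrainer(vocab_size=100, special_tokens=["[UNK]", "[PAD]"])

# train_from_iterator works with any iterator, including a streaming dataset
tokenizer.train_from_iterator(iter(corpus), trainer=trainer)

encoding = tokenizer.encode("猫が好きです。")
print(encoding.tokens)
```

In the real setup, the sentences would come from `load_dataset("snow_simplified_japanese_corpus", streaming=True)` rather than a hard-coded list, so the corpus never has to fit in memory.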