Romanization system used for the dataset

#1
by Comet0322 - opened

Hello, which romanization system is used for Manchu in your dataset? I use the Möllendorff system, but I found that characters like ū, š, and ž are not tokenized properly.

The dataset uses the Abkai Latin transliteration. Please refer to our paper for more details.
https://arxiv.org/pdf/2311.17492
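
For anyone with Möllendorff-romanized text, a minimal sketch of converting it before tokenizing might look like the following. The function `to_abkai` and the table `MOELLENDORFF_TO_ABKAI` are hypothetical names, and the two letter pairs in the table (ū to v, š to x) are an assumption that should be checked against the paper. The table is deliberately partial, and any non-ASCII character without a mapping (ž here) is reported rather than guessed.

```python
import unicodedata

# Hypothetical, partial mapping from Möllendorff letters to Abkai letters.
# These pairs are an assumption; verify them against the paper before use.
MOELLENDORFF_TO_ABKAI = {"ū": "v", "š": "x"}


def to_abkai(text, mapping=MOELLENDORFF_TO_ABKAI):
    """Return (converted_text, leftover).

    leftover lists the non-ASCII characters that had no entry in the
    mapping, so they can be handled by hand instead of silently passed on.
    """
    # Compose base letter + combining mark into a single code point so that
    # "u" + U+0304 and the precomposed "ū" are treated the same way.
    text = unicodedata.normalize("NFC", text)
    converted = "".join(mapping.get(ch, ch) for ch in text)
    leftover = sorted({ch for ch in converted if ord(ch) > 127})
    return converted, leftover
```

Calling `to_abkai("šun")` returns the converted string together with an empty leftover list, while a word containing ž comes back unchanged with ž listed as unmapped, which shows where the table still needs an entry.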

Thank you. I will check it out.
