Romanization system of the dataset

by Comet0322 - opened Oct 24, 2024

Oct 24, 2024

Hello, I am curious about which Romanization system is used for Manchu in your dataset. I use the Möllendorff system, but I found that characters like ū, š, and ž cannot be tokenized properly.

seemdog

Owner Oct 28, 2024

The dataset uses the Abkai Latin transliteration. Please refer to our paper for more details.
https://arxiv.org/pdf/2311.17492
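
For readers with Möllendorff text, a minimal sketch of a pre-tokenization conversion step might look like the following. The two-entry mapping (ū to v, š to x) is an assumption based on the commonly cited Abkai conventions and is not taken from this thread; ž and any other letters are deliberately left unmapped and reported back, so they can be checked against the paper. The function and constant names are hypothetical, and only lowercase letters are handled.

```python
import unicodedata

# ASSUMPTION: partial Möllendorff -> Abkai-style mapping, not confirmed by the
# dataset authors. Verify against the paper and extend as needed.
ASSUMED_MOLLENDORFF_TO_ABKAI = {
    "ū": "v",
    "š": "x",
}


def to_ascii_romanization(text, mapping=ASSUMED_MOLLENDORFF_TO_ABKAI):
    """Replace mapped letters and list any non-ASCII characters left over."""
    # NFC composes "u" + combining macron into the single code point "ū",
    # so both encodings of the same letter hit the mapping.
    text = unicodedata.normalize("NFC", text)
    converted = "".join(mapping.get(ch, ch) for ch in text)
    # Anything still outside ASCII was not covered by the mapping.
    leftover = sorted({ch for ch in converted if ord(ch) > 127})
    return converted, leftover


if __name__ == "__main__":
    for word in ["šun", "gūnin", "ž"]:
        print(word, to_ascii_romanization(word))
```

The second return value lists characters the mapping did not cover, which makes it easy to spot letters such as ž that still need a verified replacement before tokenizing.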

Comet0322

Oct 31, 2024

Thank you. I will check it out.
