README.md · buddhist-nlp/mbart-buddhist-chinese-to-eng at d8fe2c21d4c5c77a94fb9d9138b35870fd08e041

---
language:
  - zh
  - en
tags:
  - translation
widget:
  - text: 如是我闻：一时，佛在舍卫国只树花林窟，与大比丘众千二百五十人俱。
inference: false
---

This model is based on mBART and translates Buddhist Chinese into English. It is optimized for a sequence length of 300 (Buddhist Chinese input sequences should not exceed 150 characters). In addition to normal Chinese punctuation, the model uses "#" with a space before and after as a delimiter between sentences. Input should be converted to simplified Chinese before it is passed to the model. The model handles short sequences poorly; for best results, supply input sequences between 100 and 150 characters long.

The model performs well on Sūtra texts and reasonably well on Abhidharma and Yogācāra texts. However, it has the usual problems of NMT systems with named entities (names of persons and places). It also tends to hallucinate at times, i.e. to generate a translation that has no direct relationship to the input.
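The input constraints above (sentences separated by " # ", at most 150 characters per sequence) can be sketched as a small preprocessing step. This is a minimal illustration, not the authors' pipeline: the function names `split_sentences` and `build_inputs`, the set of sentence-final punctuation marks, and the choice to count the delimiters towards the 150-character limit are all assumptions.

```python
import re

MAX_CHARS = 150    # upper bound on input length stated in the model card
DELIMITER = " # "  # sentence delimiter stated in the model card

# Assumption: these marks end a sentence; adjust the set for your corpus.
_SENTENCE = re.compile(r"[^。！？；]+[。！？；]?")


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation mark."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def build_inputs(text, max_chars=MAX_CHARS):
    """Greedily pack sentences into ' # '-delimited chunks of <= max_chars.

    Assumption: delimiters count towards the character limit. A single
    sentence longer than max_chars is emitted as its own over-long chunk.
    """
    chunks, current = [], ""
    for sentence in split_sentences(text):
        candidate = sentence if not current else current + DELIMITER + sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "如是我闻。" * 40
    for chunk in build_inputs(sample):
        print(len(chunk), chunk[:20])
```

The sketch expects text that is already in simplified Chinese; conversion from traditional characters (for example with a tool such as OpenCC) would have to happen beforehand. Since the model prefers inputs of 100 to 150 characters, it may also be worth merging a short trailing chunk into its neighbour when the combined length still fits.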