---
tags:
- traditional chinese
- zh-tw
- zh-hant
- taiwan
widget:
- text: |-
    <|system|>
    對於輸入內容的中文文字,請將中國用語轉成台灣的用語,其他非中文文字或非中國用語都維持不變。
    範例:
    Input: ```這個視頻的質量真高啊```
    Output: ```這個影片的品質真高啊```</s>
    <|user|>
    Input: ```這個軟件的質量真高啊```</s>
    <|assistant|>
    Output:
- text: |-
    <|system|>
    對於輸入內容的中文文字,請將中國用語轉成台灣的用語,其他非中文文字或非中國用語都維持不變。
    範例:
    Input: ```這個視頻的質量真高啊```
    Output: ```這個影片的品質真高啊```</s>
    <|user|>
    Input: ```我們建立了數據庫,用來儲存和管理線上服務的信息```</s>
    <|assistant|>
    Output:
license: agpl-3.0
datasets:
- MBZUAI/Bactrian-X
language:
- zh
---
# Taiwan Words Translator 繁體中文台灣化翻譯器 by LLMs
Source code: https://github.com/SuJiaKuan/llm_tw_word

The model rewrites Chinese text so that mainland China terms are replaced with their Taiwan equivalents. Non-Chinese text and terms that are not China-specific are left unchanged. Example:
- Input: `這個軟件的質量真高啊`
- Output: `這個軟體的品質真高啊`
#### This Model
This model is fine-tuned from [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) using instruction fine-tuning. The training data is drawn from [MBZUAI/Bactrian-X](https://huggingface.co/datasets/MBZUAI/Bactrian-X) and labeled automatically with [繁化姬](https://zhconvert.org).
#### How to use
Follow the example below, or see [translate.py](https://github.com/SuJiaKuan/llm_tw_word/blob/main/llm_tw_word/translate.py) for how to wrap the model in a Python class.
```python
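import torch
from transformers import pipeline

# System prompt: states the task and gives one worked example.
SYSTEM_PROMPT = """\
對於輸入內容的中文文字,請將中國用語轉成台灣的用語,其他非中文文字或非中國用語都維持不變。

範例:
Input: ```這個視頻的質量真高啊```
Output: ```這個影片的品質真高啊```\
"""

# Text to convert.
text_trad = "這個軟件的質量真高啊"

# Use a distinct name so the imported `pipeline` function is not shadowed.
pipe = pipeline(
    "text-generation",
    model="feabries/TaiwanWordTranslator-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Wrap the input in triple backticks, matching the example in the system prompt.
prompt = "Input: ```{}```".format(text_trad)
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": prompt},
]

# Render the messages with the model's chat template and open the assistant turn.
input_text = pipe.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Greedy decoding (do_sample=False) keeps the output deterministic.
outputs = pipe(
    input_text,
    do_sample=False,
    max_new_tokens=2048,
)

# The printed text contains the full prompt followed by the model's answer.
print(outputs[0]["generated_text"])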
# <|system|>
# 對於輸入內容的中文文字,請將中國用語轉成台灣的用語,其他非中文文字或非中國用語都維持不變。
#
# 範例:
# Input: ```這個視頻的質量真高啊```
# Output: ```這個影片的品質真高啊```</s>
# <|user|>
# Input: ```這個軟件的質量真高啊```</s>
# <|assistant|>
# Output: ```這個軟體的品質真高啊```
```
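
The generated text above repeats the whole prompt before the answer, so callers usually need to isolate the translated sentence. The following is a minimal sketch of one way to do that, assuming the reply keeps the `Output:` label and the triple-backtick wrapping shown in the example. `extract_translation`, `ASSISTANT_TAG`, `FENCE`, and `sample` are illustrative names and are not taken from the repository.

```python
import re

# Marker that opens the assistant turn in the rendered prompt.
ASSISTANT_TAG = "<|assistant|>"
# Three backticks, built programmatically to keep this snippet easy to embed.
FENCE = "`" * 3
# Non-greedy match of whatever sits between a pair of fences.
FENCED = re.compile(re.escape(FENCE) + r"(.*?)" + re.escape(FENCE), re.DOTALL)


def extract_translation(generated_text: str) -> str:
    """Return only the translated text from the full generated string."""
    # Keep what follows the last assistant tag (the whole string if absent).
    reply = generated_text.rsplit(ASSISTANT_TAG, 1)[-1]
    match = FENCED.search(reply)
    if match:
        return match.group(1).strip()
    # Fallback for a truncated reply: drop the label and stray backticks.
    reply = reply.strip()
    if reply.startswith("Output:"):
        reply = reply[len("Output:"):]
    return reply.strip().strip("`").strip()


# Hand-written stand-in for outputs[0]["generated_text"], so no model is needed.
sample = (
    "<|user|>\nInput: " + FENCE + "這個軟件的質量真高啊" + FENCE + "</s>\n"
    "<|assistant|>\nOutput: " + FENCE + "這個軟體的品質真高啊" + FENCE
)
print(extract_translation(sample))
```

Splitting on the last assistant tag matters because the system prompt and the user turn also contain fenced text. Passing `return_full_text=False` to the pipeline call should also drop the prompt from the result, though the fence markers would still need to be stripped.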