---
tags:
- traditional chinese
- zh-tw
- zh-hant
- taiwan
widget:
- text: |-
    <|system|>
    對於輸入內容的中文文字,請將中國用語轉成台灣的用語,其他非中文文字或非中國用語都維持不變。

    範例:
    Input: ```這個視頻的質量真高啊```
    Output: ```這個影片的品質真高啊```</s>
    <|user|>
    Input: ```這個軟件的質量真高啊```</s>
    <|assistant|>
    Output:
- text: |-
    <|system|>
    對於輸入內容的中文文字,請將中國用語轉成台灣的用語,其他非中文文字或非中國用語都維持不變。

    範例:
    Input: ```這個視頻的質量真高啊```
    Output: ```這個影片的品質真高啊```</s>
    <|user|>
    Input: ```我們建立了數據庫,用來儲存和管理線上服務的信息```</s>
    <|assistant|>
    Output:
license: agpl-3.0
datasets:
- MBZUAI/Bactrian-X
language:
- zh
---

# Taiwan Words Translator 繁體中文台灣化翻譯器 by LLMs

https://github.com/SuJiaKuan/llm_tw_word

This model rewrites Chinese text so that Mainland China vocabulary is replaced with the equivalent Taiwan vocabulary. Non-Chinese text and words that are not Mainland-specific are left unchanged. Example:
- Input: `這個軟件的質量真高啊`
- Output: `這個軟體的品質真高啊`

#### This Model

This model is instruction fine-tuned from [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0). The training data is drawn from [MBZUAI/Bactrian-X](https://huggingface.co/datasets/MBZUAI/Bactrian-X) and labeled automatically with [繁化姬](https://zhconvert.org).

#### How to use
Follow the example below, or see [`translate.py`](https://github.com/SuJiaKuan/llm_tw_word/blob/main/llm_tw_word/translate.py) for how the model is wrapped in a Python class.

```python
import torch
from transformers import pipeline

# System prompt used throughout this model card; keep it verbatim (in Chinese).
SYSTEM_PROMPT = """\
對於輸入內容的中文文字,請將中國用語轉成台灣的用語,其他非中文文字或非中國用語都維持不變。

範例:
Input: ```這個視頻的質量真高啊```
Output: ```這個影片的品質真高啊```\
"""

# Traditional Chinese text that still contains Mainland China terms.
text_trad = "這個軟件的質量真高啊"

# Named `pipe` so the imported `pipeline` factory is not shadowed.
pipe = pipeline(
    "text-generation",
    model="feabries/TaiwanWordTranslator-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The user turn wraps the text in triple backticks, as in the system prompt example.
prompt = "Input: ```{}```".format(text_trad)
messages = [{
    "role": "system",
    "content": SYSTEM_PROMPT,
}, {
    "role": "user",
    "content": prompt,
}]
# Render the chat messages into the model's prompt format as a plain string.
input_text = pipe.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
# Greedy decoding (do_sample=False) keeps the output deterministic.
outputs = pipe(
    input_text,
    do_sample=False,
    max_new_tokens=2048,
)
# Prints the full prompt followed by the model's answer:
print(outputs[0]["generated_text"])
# <|system|>
# 對於輸入內容的中文文字,請將中國用語轉成台灣的用語,其他非中文文字或非中國用語都維持不變。
#
# 範例:
# Input: ```這個視頻的質量真高啊```
# Output: ```這個影片的品質真高啊```</s>
# <|user|>
# Input: ```這個軟件的質量真高啊```</s>
# <|assistant|>
# Output: ```這個軟體的品質真高啊```
```
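
The generated text above contains the whole prompt as well as the answer, so most callers will want only the translated string. The helper below is a minimal sketch of one way to pull it out, assuming the output keeps the `<|assistant|>` / `Output: ```...``` ` layout shown in the example. The name `extract_translation` and the fallback behavior are illustrative assumptions, not part of the model or the linked repository.

```python
import re

ASSISTANT_TAG = "<|assistant|>"


def extract_translation(generated_text: str) -> str:
    """Return the translated text from the pipeline's full generated text.

    Hypothetical helper: assumes the answer follows the last assistant tag
    and is wrapped in triple backticks, as in the example output.
    """
    # Keep only what follows the last assistant tag. If the tag is absent,
    # rpartition leaves the whole string in `reply`.
    _, _, reply = generated_text.rpartition(ASSISTANT_TAG)

    # Preferred case: the answer is enclosed in a pair of triple backticks.
    match = re.search(r"```(.*?)```", reply, flags=re.DOTALL)
    if match:
        return match.group(1).strip()

    # Fallback for truncated output: drop the "Output:" label and any
    # stray backticks around the remaining text.
    reply = reply.strip()
    if reply.startswith("Output:"):
        reply = reply[len("Output:"):]
    return reply.strip().strip("`").strip()
```

With the example above, `extract_translation(outputs[0]["generated_text"])` would return just the converted sentence. The text-generation pipeline also accepts `return_full_text=False`, which should drop the prompt from the result; the backtick parsing is still needed after that.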