
	---
	tags:
	- traditional chinese
	- zh-tw
	- zh-hant
	- taiwan
	widget:
	- text: |-
	    <|system|>
	    對於輸入內容的中文文字，請將中國用語轉成台灣的用語，其他非中文文字或非中國用語都維持不變。

	    範例：
	    Input: ```這個視頻的質量真高啊```
	    Output: ```這個影片的品質真高啊```</s>
	    <|user|>
	    Input: ```這個軟件的質量真高啊```</s>
	    <|assistant|>
	    Output:
	- text: |-
	    <|system|>
	    對於輸入內容的中文文字，請將中國用語轉成台灣的用語，其他非中文文字或非中國用語都維持不變。

	    範例：
	    Input: ```這個視頻的質量真高啊```
	    Output: ```這個影片的品質真高啊```</s>
	    <|user|>
	    Input: ```我們建立了數據庫，用來儲存和管理線上服務的信息```</s>
	    <|assistant|>
	    Output:
	license: agpl-3.0
	datasets:
	- MBZUAI/Bactrian-X
	language:
	- zh
	---

	# Taiwan Words Translator 繁體中文台灣化翻譯器 by LLMs


	Source code: https://github.com/SuJiaKuan/llm_tw_word

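	The model rewrites Chinese text so that vocabulary specific to mainland China is replaced with the terms used in Taiwan; non-Chinese text and words that are not China-specific are left unchanged. For example, 軟件 (software) becomes 軟體 and 質量 (quality) becomes 品質: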
	- Input: `這個軟件的質量真高啊`
	- Output: `這個軟體的品質真高啊`
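
The prompt used with this model asks that non-Chinese text be left unchanged. A minimal sketch of a sanity check for that property follows. The helper name `non_chinese_preserved` and the use of the CJK Unified Ideographs block (U+4E00 to U+9FFF) as the definition of "Chinese text" are assumptions made for illustration; neither comes from the model or its repository.

```python
import re

# Assumption: "Chinese text" is approximated by the basic CJK Unified
# Ideographs block; extension blocks and CJK punctuation are not covered.
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def non_chinese_preserved(source: str, translated: str) -> bool:
    """Return True if both strings are identical once CJK ideographs are removed."""
    return CJK_PATTERN.sub("", source) == CJK_PATTERN.sub("", translated)


if __name__ == "__main__":
    src = "這個軟件的質量真高啊 (v2.0)"
    out = "這個軟體的品質真高啊 (v2.0)"
    print(non_chinese_preserved(src, out))
```

A check like this only flags changes to Latin letters, digits, ASCII punctuation and whitespace; it says nothing about whether the Chinese terms themselves were converted correctly.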

	#### This Model

	This model is fine-tuned from [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) using instruction fine-tuning. The training data is drawn from [MBZUAI/Bactrian-X](https://huggingface.co/datasets/MBZUAI/Bactrian-X) and labeled automatically with [繁化姬](https://zhconvert.org).
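
The training script is not reproduced in this card. As a rough sketch of how one labeled pair could be rendered for instruction fine-tuning, the snippet below lays out a source sentence and its automatically converted target in the `<|system|>` / `<|user|>` / `<|assistant|>` layout that appears in this card's widget examples. The function name `format_training_example` and the exact whitespace are assumptions; the authoritative layout is whatever the tokenizer's chat template produces.

```python
FENCE = "`" * 3  # triple backtick, built this way so the block is easy to embed
EOS = "</s>"     # end-of-sequence marker as shown in the widget examples

SYSTEM_PROMPT = (
    "對於輸入內容的中文文字，請將中國用語轉成台灣的用語，其他非中文文字或非中國用語都維持不變。\n"
    "\n"
    "範例：\n"
    f"Input: {FENCE}這個視頻的質量真高啊{FENCE}\n"
    f"Output: {FENCE}這個影片的品質真高啊{FENCE}"
)


def format_training_example(source: str, target: str) -> str:
    """Render one (China-usage, Taiwan-usage) pair as a single training string."""
    return (
        f"<|system|>\n{SYSTEM_PROMPT}{EOS}\n"
        f"<|user|>\nInput: {FENCE}{source}{FENCE}{EOS}\n"
        f"<|assistant|>\nOutput: {FENCE}{target}{FENCE}{EOS}"
    )


if __name__ == "__main__":
    print(format_training_example("這個軟件的質量真高啊", "這個軟體的品質真高啊"))
```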

	#### How to use
	Follow the example below, or see [`translate.py`](https://github.com/SuJiaKuan/llm_tw_word/blob/main/llm_tw_word/translate.py) for how to wrap the model in a Python class.

	```python
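	import torch
	from transformers import pipeline

	# System prompt, kept in Chinese as used throughout this model card. In English:
	# "For Chinese text in the input, convert mainland China terms to Taiwan terms;
	# leave non-Chinese text and non-mainland terms unchanged", followed by one example.
	SYSTEM_PROMPT = """\
	對於輸入內容的中文文字，請將中國用語轉成台灣的用語，其他非中文文字或非中國用語都維持不變。

	範例：
	Input: ```這個視頻的質量真高啊```
	Output: ```這個影片的品質真高啊```\
	"""

	# Sentence to convert ("The quality of this software is really high").
	text_trad = "這個軟件的質量真高啊"

	# Named `pipe` so the imported `pipeline` factory is not shadowed.
	pipe = pipeline(
	    "text-generation",
	    model="feabries/TaiwanWordTranslator-v0.1",
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)

	# Wrap the input in triple backticks, matching the example in the system prompt.
	prompt = "Input: ```{}```".format(text_trad)
	messages = [{
	    "role": "system",
	    "content": SYSTEM_PROMPT,
	}, {
	    "role": "user",
	    "content": prompt,
	}]
	# Render the messages with the model's chat template and append the assistant tag.
	input_text = pipe.tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)
	# Greedy decoding keeps the output deterministic.
	outputs = pipe(
	    input_text,
	    do_sample=False,
	    max_new_tokens=2048,
	)
	print(outputs[0]["generated_text"])
	# <|system|>
	# 對於輸入內容的中文文字，請將中國用語轉成台灣的用語，其他非中文文字或非中國用語都維持不變。
	#
	# 範例：
	# Input: ```這個視頻的質量真高啊```
	# Output: ```這個影片的品質真高啊```</s>
	# <|user|>
	# Input: ```這個軟件的質量真高啊```</s>
	# <|assistant|>
	# Output: ```這個軟體的品質真高啊```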
	```
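
The pipeline returns the whole prompt followed by the completion, so the translated sentence still has to be pulled out of `generated_text`. One possible way to do that is sketched below. `extract_translation` is a hypothetical helper, not part of the model's repository, and it assumes the model closes its answer with a triple backtick as in the sample output above.

```python
import re

FENCE = "`" * 3  # triple backtick, built this way so the block is easy to embed
ANSWER_PATTERN = re.compile(
    r"Output:\s*" + FENCE + r"(.*?)" + FENCE,
    flags=re.DOTALL,
)


def extract_translation(generated_text: str) -> str:
    """Return the text inside the first fenced Output after the last <|assistant|> tag."""
    # Keep only the assistant turn so the worked example in the system prompt is skipped.
    assistant_turn = generated_text.rsplit("<|assistant|>", 1)[-1]
    match = ANSWER_PATTERN.search(assistant_turn)
    if match is None:
        raise ValueError("no fenced Output found in the assistant turn")
    return match.group(1)
```

With the example above, `extract_translation(outputs[0]["generated_text"])` would return only the converted sentence. Passing `return_full_text=False` to the pipeline call is another option worth trying, since it drops the prompt from `generated_text` and leaves only the completion to parse.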