nayohan
/

llama3-8b-it-translation-general-en-ko-1sent

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

llama3-8b-it-translation-general-en-ko-1sent / README.md

nayohan's picture

Update README.md

6c0a254 verified 8 months ago

|

2.98 kB

	---
	language:
	- en
	- ko
	license: llama3
	library_name: transformers
	tags:
	- translation
	- enko
	- ko
	base_model:
	- meta-llama/Meta-Llama-3-8B-Instruct
	datasets:
	- nayohan/aihub-en-ko-translation-1.2m
	pipeline_tag: text-generation
	---

	# Introduction
	The model was trained to translate a single sentence from English to Korean with a 1.18M dataset in the general domain.
	Dataset: [nayohan/aihub-en-ko-translation-1.2m](https://huggingface.co/datasets/nayohan/aihub-en-ko-translation-1.2m)

	### Loading the Model

	Use the following Python code to load the model:

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "nayohan/llama3-8b-it-translation-general-en-ko-1sent"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	model_name,
	device_map="auto",
	torch_dtype=torch.bfloat16
	)
	```

	### Generating Text
	To generate text, use the following Python code: Currently, this model only support English to Korean, not other languages or reverse and styles.
	```python
	style="written"
	SYSTEM_PROMPT=f"Acts as a translator. Translate en sentences into ko sentences in {style} style."

	s = "The aerospace industry is a flower in the field of technology and science."
	conversation = [{'role': 'system', 'content': SYSTEM_PROMPT},
	{'role': 'user', 'content': s}]

	inputs = tokenizer.apply_chat_template(
	conversation,
	tokenize=True,
	add_generation_prompt=True,
	return_tensors='pt'
	).to("cuda")

	outputs = model.generate(inputs, max_new_tokens=256)
	print(tokenizer.decode(outputs[0][len(inputs[0]):]))
	```
	```
	# Result
	# INPUT: <\|begin_of_text\|><\|start_header_id\|>system<\|end_header_id\|>\n\nActs as a translator. Translate en sentences into ko sentences in colloquial style.<\|eot_id\|><\|start_header_id\|>user<\|end_header_id\|>\n\nThe aerospace industry is a flower in the field of technology and science.<\|eot_id\|><\|start_header_id\|>assistant<\|end_header_id\|>\n\n
	# OUTPUT: 항공 우주 산업은 기술과 과학의 꽃입니다.<\|eot_id\|>

	# INPUT: <\|begin_of_text\|><\|start_header_id\|>system<\|end_header_id\|>\n\nActs as a translator. Translate en sentences into ko sentences in colloquial style.<\|eot_id\|><\|start_header_id\|>user<\|end_header_id\|>\n\n
	Technical and basic sciences are very important in terms of research. It has a significant impact on the industrial development of a country. Government policies control the research budget.<\|eot_id\|><\|start_header_id\|>assistant<\|end_header_id\|>\n\n
	# OUTPUT: 기술과 기초과학은 연구 측면에서 매우 중요합니다. 한 국가의 산업 발전에 큰 영향을 미칩니다. 정부 정책은 연구 예산을 통제합니다.<\|eot_id\|>
	```

	### Citation
	```bibtex
	@article{llama3modelcard,
	title={Llama 3 Model Card},
	author={AI@Meta},
	year={2024},
	url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
	}
	```
	Our trainig code can be found here: [TBD]