metadata

language:
  - en
  - ko
pipeline_tag: text-generation
inference: false
tags:
  - facebook
  - meta
  - pytorch
  - llama
  - llama-2
  - llama-2-chat
library_name: peft

komt: Korean multi-task instruction tuning model

Following the success of ChatGPT, numerous large language models have emerged in an attempt to match its capabilities. In Korean, however, many of these models still struggle to give accurate answers or to generate fluent text. This study addresses these challenges with a multi-task instruction technique that uses supervised datasets from a variety of tasks to create training data for large language models (LLMs).
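
To illustrate the idea, here is a minimal sketch of how labeled examples from different supervised tasks might be wrapped as instruction-style training records. The task names, the templates, and the `to_instruction_record` and `build_training_set` helpers are hypothetical and are not taken from the komt repository, which defines the actual templates and datasets.

```python
# Illustrative only: the task names and templates below are hypothetical,
# not the ones used to train komt.
TEMPLATES = {
    "sentiment": "Classify the sentiment of the following review as positive or negative.\n\n{text}",
    "summarization": "Summarize the following article in one sentence.\n\n{text}",
}


def to_instruction_record(task, text, label):
    """Wrap one supervised (input, label) pair as an instruction/response pair."""
    instruction = TEMPLATES[task].format(text=text)
    return {"instruction": instruction, "output": label}


def build_training_set(examples):
    """Convert (task, text, label) triples from several tasks into one dataset."""
    return [to_instruction_record(task, text, label) for task, text, label in examples]


records = build_training_set([
    ("sentiment", "The food was great.", "positive"),
    ("summarization", "Seoul hosted a large technology fair this week.", "Seoul held a technology fair."),
])
```

Each record pairs a natural-language instruction with the original label as the target response, so examples from unrelated tasks can be mixed in a single training set.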

Model Details

Model Developers : davidkim(changyeon kim)
Repository : https://github.com/davidkim205/komt
Model Architecture : komt-mistral-7b-v1-dpo is a fine-tuned version of komt-mistral-7b-v1 (original model: Mistral-7B-Instruct-v0.1).

Dataset

maywell/ko_Ultrafeedback_binarized

https://huggingface.co/datasets/maywell/ko_Ultrafeedback_binarized

Hardware and Software

nvidia driver : 535.54.03
CUDA Version: 12.2

Training

Refer to https://github.com/davidkim205/komt for training details.

Prompt template: Mistral

<s>[INST] {prompt} [/INST]</s>
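
As a minimal sketch, a single-turn prompt in this layout can be assembled with a small helper. The `build_prompt` name is hypothetical. The sketch assumes that the tokenizer prepends `<s>` itself and that `</s>` is emitted by the model at the end of its answer, so neither token is written into the string.

```python
def build_prompt(user_message: str) -> str:
    """Format one user turn in the [INST] ... [/INST] layout shown above."""
    # Assumption: <s> is added by the tokenizer and </s> is emitted by the model,
    # so the helper only wraps the message in the instruction tags.
    return f"[INST] {user_message.strip()} [/INST]"


prompt = build_prompt("Recommend three places to visit in Seoul.")
```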

Usage

import torch

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel, PeftConfig
from transformers import TextStreamer, GenerationConfig


# Base model and the PEFT adapter that is applied on top of it
base_model_name = 'davidkim205/komt-mistral-7b-v1'
peft_model_name = 'davidkim205/komt-mistral-7b-v1-dpo'
config = PeftConfig.from_pretrained(peft_model_name)
# 4-bit NF4 quantization to reduce GPU memory use
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
config.base_model_name_or_path = base_model_name
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, quantization_config=bnb_config, device_map="auto")
model = PeftModel.from_pretrained(model, peft_model_name)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
# Print tokens to stdout as they are generated
streamer = TextStreamer(tokenizer)

def gen(x):
    # Sampling settings for generation
    generation_config = GenerationConfig(
        temperature=0.8,
        top_p=0.8,
        top_k=100,
        max_new_tokens=1024,
        do_sample=True,
    )
    q = f"[INST]{x} [/INST]"
    gened = model.generate(
        **tokenizer(
            q,
            return_tensors='pt',
            return_token_type_ids=False
        ).to(model.device),
        generation_config=generation_config,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        streamer=streamer,
    )
    result_str = tokenizer.decode(gened[0])

    # Keep only the text that follows the closing [/INST] tag
    start_tag = "[/INST]"
    start_index = result_str.find(start_tag)

    if start_index != -1:
        result_str = result_str[start_index + len(start_tag):].strip()
    return result_str

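# Korean prompt: "I'm planning a solo trip to Jeju Island for 2 days and 1 night; please make me a travel itinerary."
result = gen('제주도를 1박2일로 혼자 여행하려고 하는데 여행 코스를 만들어줘')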

print('##########')
print(result)

output

제주도 1박2일 1인 여행 코스
제주도는 한국에서 가장 먼 섬인 동남아시아 최대 섬으로, 멋진 해변, 아름다운 자연경관, 절경 납땜 절벽, 한국 최대 규모의 복합리조트 등 다양한 관광 명소가 풍부하게 있어 1박2일로 혼자 여행하시는 여러분들을 위해 아래와 같은 코스를 제안해 드리겠습니다.

▷ 코스 1 : 성산일출봉, 용눈이절벽, 성산일출봉 야간 경관 관람
- 코스 설명 : 제주 동남쪽 해안의 명소인 성산일출봉, 용눈이절벽, 성산일출봉 야간 경관 관람 순으로 구성된 코스입니다. 아침에 일찍 일어나 일출봉에 도착하여 일출을 감상하고, 아침 식사를 하고 절벽 등반을 즐기며 휴식을 취합니다. 오후에는 일출봉 야간 경관 관람을 즐기며 휴식과 휴식을 취합니다.

▷ 코스 2 : 한라산, 한라산 케이블카, 오미자 바위, 신라 이젠
- 코스 설명 : 제주 남부의 명소인 한라산, 한라산 케이블카, 오미자 바위, 신라 이젠 순으로 구성된 코스입니다. 아침에 일찍 일어나 한라산 케이블카를 타고 높은 고지에 위치한 한라산 정상으로 올라가서 탐험을 즐기며 아침 식사를 합니다. 오후에는 오미자 바위를 찾아 휴식과 휴식을 취하고, 일출봉 야간 경관 관람을 즐기며 휴식을 취합니다.

▷ 코스 3 : 대하늘길, 삼거리, 곰돌라비, 칠동굴, 광안절, 칠금절, 해넘이길, 바다지상 길
- 코스 설명 : 제주 서부의 명소인 대하늘길, 삼거리, 곰돌라비, 칠동굴, 광안절, 칠금절, 해넘이길, 바다지상 길 순으로 구성된 코스입니다. 아침에 일찍 일어나 대하늘길에서 탐험을 즐기며 아침 식사를 합니다. 오후에는 삼거리를 찾아 휴식과 휴식을 취하고, 일출봉 야간 경관 관람을 즐기며 휴식을 취합니다.

Evaluation

For objective model evaluation, we initially used EleutherAI's lm-evaluation-harness, but the results were unsatisfactory. We therefore evaluated the models with ChatGPT, a widely used model, following the approach described in Self-Alignment with Instruction Backtranslation and Three Ways of Using Large Language Models to Evaluate Chat.

model	score	average(0~5)	percentage
gpt-3.5-turbo(close)	147	3.97	79.45%
naver Cue(close)	140	3.78	75.67%
clova X(close)	136	3.67	73.51%
WizardLM-13B-V1.2(open)	96	2.59	51.89%
Llama-2-7b-chat-hf(open)	67	1.81	36.21%
Llama-2-13b-chat-hf(open)	73	1.91	38.37%
nlpai-lab/kullm-polyglot-12.8b-v2(open)	70	1.89	37.83%
kfkas/Llama-2-ko-7b-Chat(open)	96	2.59	51.89%
beomi/KoAlpaca-Polyglot-12.8B(open)	100	2.70	54.05%
komt-llama2-7b-v1 (open)(ours)	117	3.16	63.24%
komt-llama2-13b-v1 (open)(ours)	129	3.48	69.72%
komt-llama-30b-v1 (open)(ours)	129	3.16	63.24%
komt-mistral-7b-v1 (open)(ours)	131	3.54	70.81%
komt-mistral-7b-v1-dpo (open)(ours)	142	3.83	76.75%
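
The average and percentage columns appear to follow from the total score. The sketch below assumes 37 evaluation prompts, each scored from 0 to 5 (a maximum total of 185), with results truncated to two decimals. The prompt count is inferred from the table and is not stated in the source, and the `summarize` helper is hypothetical.

```python
NUM_PROMPTS = 37     # assumption: inferred from the table, not stated in the source
MAX_PER_PROMPT = 5   # from the "average(0~5)" column header


def summarize(total_score: int) -> tuple[float, float]:
    """Return (average per prompt, percentage of maximum), truncated to 2 decimals."""
    # Integer floor division truncates instead of rounding.
    average = (total_score * 100 // NUM_PROMPTS) / 100
    max_total = NUM_PROMPTS * MAX_PER_PROMPT
    percentage = (total_score * 10000 // max_total) / 100
    return average, percentage


print(summarize(147))  # (3.97, 79.45)
print(summarize(142))  # (3.83, 76.75)
```

Under these assumptions the two calls reproduce the gpt-3.5-turbo and komt-mistral-7b-v1-dpo rows.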