---
language:
- en
- ko
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- llama-2-chat
library_name: peft
---
# komt: Korean multi-task instruction tuning model
![multi task instruction tuning.jpg](https://github.com/davidkim205/komt/assets/16680469/c7f6ade7-247e-4b62-a94f-47e19abea68e)

Following the success of ChatGPT, numerous large language models have emerged in an attempt to match its capabilities.
However, many of these models still struggle to give accurate answers in Korean or to generate fluent Korean text.
This study addresses these challenges with a multi-task instruction tuning technique that converts supervised datasets from a variety of tasks into training data for large language models (LLMs).

## Model Details

* **Model Developers**: davidkim (Changyeon Kim)
* **Repository**: https://github.com/davidkim205/komt
* **Model Architecture**: komt-mistral-7b-v1-dpo is a fine-tuned version of komt-mistral-7b-v1 (base model: Mistral-7B-Instruct-v0.1).

  
## Dataset
* [maywell/ko_Ultrafeedback_binarized](https://huggingface.co/datasets/maywell/ko_Ultrafeedback_binarized)

## Hardware and Software
- NVIDIA driver: 535.54.03
- CUDA version: 12.2

## Training
Refer to https://github.com/davidkim205/komt for training details.
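The `-dpo` suffix and the binarized preference dataset suggest the adapter was trained with Direct Preference Optimization (DPO). The actual training code lives in the komt repository; as a minimal sketch, assuming the standard DPO objective, the per-example loss rewards the policy for preferring the chosen response over the rejected one by a wider margin than the frozen reference model (presumably komt-mistral-7b-v1) does. The function name, its arguments, and the `beta=0.1` default are illustrative assumptions, not values taken from komt.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative default, not a documented komt setting
) -> float:
    """Standard per-example DPO loss: -log(sigmoid(beta * margin)).

    Each *_logp argument is the summed log-probability of a full response
    (chosen or rejected) under the trainable policy or the frozen reference.
    """
    # How much more the policy favors chosen over rejected, relative to the reference
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    z = beta * margin
    # -log(sigmoid(z)) == softplus(-z); this form avoids overflow for large |z|
    t = -z
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


# Identical policy and reference give a zero margin, so the loss is log(2)
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
```

The loss falls below log(2) once the policy widens the chosen-vs-rejected gap beyond the reference model's, which is the signal DPO trains on.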

## Prompt template: Mistral
```
<s>[INST] {prompt} [/INST]</s>
```
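The usage example sends a single turn as `[INST]{x} [/INST]` and lets the tokenizer prepend `<s>`. For multi-turn chats, a minimal sketch, assuming komt keeps the base Mistral-Instruct convention of closing each earlier model answer with `</s>` (this card does not state it), could assemble the prompt like this; `build_prompt` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def build_prompt(user_msg: str, history: Sequence[Tuple[str, str]] = ()) -> str:
    """Assemble a Mistral-style [INST] prompt (hypothetical helper).

    `history` holds earlier (user, assistant) turns. The leading <s> is omitted
    because the tokenizer normally adds the BOS token itself.
    """
    parts = []
    for user, assistant in history:
        # A completed turn: the instruction, then the model's answer closed by </s>
        parts.append(f"[INST] {user} [/INST]{assistant}</s>")
    # The new instruction is left open so the model generates the answer
    parts.append(f"[INST] {user_msg} [/INST]")
    return "".join(parts)


print(build_prompt("Recommend a place to visit in Jeju.", [("Hello", "Hello! How can I help?")]))
# [INST] Hello [/INST]Hello! How can I help?</s>[INST] Recommend a place to visit in Jeju. [/INST]
```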

## Usage
```python
import torch
from peft import PeftConfig, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    GenerationConfig,
    TextStreamer,
)

# Base model and the DPO-trained adapter applied on top of it
base_model_name = 'davidkim205/komt-mistral-7b-v1'
peft_model_name = 'davidkim205/komt-mistral-7b-v1-dpo'

config = PeftConfig.from_pretrained(peft_model_name)
# Point the adapter at the komt base model
config.base_model_name_or_path = base_model_name

# Load the base model in 4-bit NF4 with double quantization to save GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
)
# Apply the adapter weights
model = PeftModel.from_pretrained(model, peft_model_name)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
streamer = TextStreamer(tokenizer)  # prints tokens as they are generated

def gen(x):
    generation_config = GenerationConfig(
        temperature=0.8,
        top_p=0.8,
        top_k=100,
        max_new_tokens=1024,
        do_sample=True,
    )
    # Mistral instruction format; the tokenizer prepends <s> itself
    q = f"[INST]{x} [/INST]"
    inputs = tokenizer(q, return_tensors='pt', return_token_type_ids=False).to(model.device)
    gened = model.generate(
        **inputs,
        generation_config=generation_config,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        streamer=streamer,
    )
    result_str = tokenizer.decode(gened[0], skip_special_tokens=True)

    # Keep only the answer that follows the [/INST] tag
    start_tag = "[/INST]"
    start_index = result_str.find(start_tag)
    if start_index != -1:
        result_str = result_str[start_index + len(start_tag):].strip()
    return result_str

# "I'm planning a solo 1-night, 2-day trip to Jeju Island; please put together an itinerary."
result = gen('제주도λ₯Ό 1박2일둜 혼자 여행하렀고 ν•˜λŠ”λ° μ—¬ν–‰ μ½”μŠ€λ₯Ό λ§Œλ“€μ–΄μ€˜')

print('##########')
print(result)
```
output (translated from Korean)
```
<s> [INST]I'm planning a solo 1-night, 2-day trip to Jeju Island; please put together an itinerary. [/INST]If you want to experience Jeju Island's beautiful natural scenery and cultural heritage, I recommend the following courses.

The first course is to head to Jeju's western coast to enjoy its beautiful coastal scenery and coastal sand dunes. Heading left from downtown Jeju, you can pass famous beaches such as Hallim Beach, Seongsan Beach, and Bongdong Beach. Here you can swim in clear water along wide beaches, and there are many restaurants around the beaches where you can enjoy a meal. While traveling to the west coast, you can taste delicious food such as herb soup and sorabi, Jeju's signature year-round specialties. On the way back to downtown Jeju around the west coast, you can buy local specialties at a Jeju specialty market.

The second course is a tour of the eastern coast. Heading right from downtown Jeju, you can once again pass famous beaches such as Hallim Beach (an "ice cream street"), Seongsan Beach, and Bongdong Beach. There are many restaurants around the beaches in this area where you can enjoy a meal. On the way back to downtown Jeju around the east coast, you can buy local specialties at a Jeju specialty market. There are many restaurants in this area where you can taste delicious food.

The third course is to travel to southern Jeju. Heading right from downtown Jeju, you can reach the south of the island. This area is home to Hallasan National Park, where you can take in the natural scenery. Within Hallasan National Park there are trails where you can enjoy a variety of natural landscapes and mountain routes. Southern Jeju also has many beaches and golf courses, so you can enjoy swimming and golf. While traveling south, you can buy local specialties at a Jeju specialty market.
```
## Evaluation
For objective model evaluation, we initially used EleutherAI's lm-evaluation-harness, but the results were unsatisfactory. We therefore used ChatGPT, a widely used model, as the evaluator, following the approach described in [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06502.pdf) and [Three Ways of Using Large Language Models to Evaluate Chat](https://arxiv.org/pdf/2308.06259.pdf).
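The card does not include the judging prompt. As a rough sketch, assuming ChatGPT is asked to rate each answer on the 0-5 scale used in the table and to state its rating on a line such as `Score: 4`, the rating could be pulled out of the judge's reply as follows; the reply format and the `parse_judge_score` name are hypothetical.

```python
import re
from typing import Optional

# Hypothetical reply format: the judge states a line like "Score: 4"
_SCORE_RE = re.compile(r"score\s*[:=]\s*([0-5])\b", re.IGNORECASE)


def parse_judge_score(reply: str) -> Optional[int]:
    """Return the last 0-5 score stated in a judge reply, or None if absent."""
    matches = _SCORE_RE.findall(reply)
    return int(matches[-1]) if matches else None


print(parse_judge_score("The answer is relevant but repetitive.\nScore: 3"))  # 3
```

Taking the last match lets a judge revise its rating within one reply, and out-of-range values such as `10` are rejected rather than truncated.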


| model                                   | score   | average (0-5) | percentage |
|-----------------------------------------|---------|---------------|------------|
| gpt-3.5-turbo (closed)                  | 147     | 3.97          | 79.45%     |
| naver Cue (closed)                      | 140     | 3.78          | 75.67%     |
| clova X (closed)                        | 136     | 3.67          | 73.51%     |
| WizardLM-13B-V1.2 (open)                | 96      | 2.59          | 51.89%     |
| Llama-2-7b-chat-hf (open)               | 67      | 1.81          | 36.21%     |
| Llama-2-13b-chat-hf (open)              | 73      | 1.91          | 38.37%     |
| nlpai-lab/kullm-polyglot-12.8b-v2 (open) | 70     | 1.89          | 37.83%     |
| kfkas/Llama-2-ko-7b-Chat (open)         | 96      | 2.59          | 51.89%     |
| beomi/KoAlpaca-Polyglot-12.8B (open)    | 100     | 2.70          | 54.05%     |
| **komt-llama2-7b-v1 (open)(ours)**      | **117** | **3.16**      | **63.24%** |
| **komt-llama2-13b-v1 (open)(ours)**     | **129** | **3.48**      | **69.72%** |
| **komt-llama-30b-v1 (open)(ours)**      | **129** | **3.16**      | **63.24%** |
| **komt-mistral-7b-v1 (open)(ours)**     | **131** | **3.54**      | **70.81%** |
| **komt-mistral-7b-v1-dpo (open)(ours)** | **142** | **3.83**      | **76.75%** |
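The average and percentage columns are consistent with 37 evaluation questions scored 0-5 each (a maximum of 185 points), with values truncated rather than rounded to two decimals. This question count is inferred from the figures, not stated in the card, and the Llama-2-13b-chat-hf and komt-llama-30b-v1 rows do not fit it. Under that assumption, a row's derived columns can be recomputed from its total score; `summarize` is a hypothetical helper.

```python
from typing import Tuple


def summarize(score: int, n_questions: int = 37, max_points: int = 5) -> Tuple[str, str]:
    """Return (average, percentage) for a total judge score.

    n_questions=37 is inferred from the table, not documented. Integer
    arithmetic truncates to two decimals, matching how the table reports values.
    """
    avg_centi = score * 100 // n_questions                       # average x 100
    pct_centi = score * 100 * 100 // (n_questions * max_points)  # percentage x 100
    return (
        f"{avg_centi // 100}.{avg_centi % 100:02d}",
        f"{pct_centi // 100}.{pct_centi % 100:02d}%",
    )


print(summarize(142))  # ('3.83', '76.75%')  komt-mistral-7b-v1-dpo
print(summarize(131))  # ('3.54', '70.81%')  komt-mistral-7b-v1
```

Integer floor division is used instead of `round()` on floats so the truncation is exact and free of binary floating-point artifacts.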