---
language:
- en
- ko
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- llama-2-chat
library_name: peft
---
# komt: korean multi-task instruction tuning model
![multi task instruction tuning.jpg](https://github.com/davidkim205/komt/assets/16680469/c7f6ade7-247e-4b62-a94f-47e19abea68e)
Recently, following the success of ChatGPT, numerous large language models have emerged in an attempt to match its capabilities.
When it comes to Korean, however, many of these models still struggle to provide accurate answers or to generate fluent Korean text.
This work addresses those challenges by introducing a multi-task instruction technique that leverages supervised datasets from a variety of tasks to create training data for large language models (LLMs).
## Model Details
* **Model Developers** : davidkim (changyeon kim)
* **Repository** : https://github.com/davidkim205/komt
* **Model Architecture** : The komt-mistral-7b-v1-dpo is a DPO fine-tuned version of komt-mistral-7b-v1 (original model: Mistral-7B-Instruct-v0.1).
## Dataset
* [maywell/ko_Ultrafeedback_binarized](https://huggingface.co/datasets/maywell/ko_Ultrafeedback_binarized)
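Each record pairs a Korean prompt with a preferred (`chosen`) and a rejected response, the triple format that DPO-style preference tuning consumes. A quick way to inspect it (the column names follow the Ultrafeedback-binarized convention and should be checked against the dataset card):
```
from datasets import load_dataset

# Load the Korean preference dataset used for DPO training.
ds = load_dataset("maywell/ko_Ultrafeedback_binarized", split="train")

# Each example pairs one prompt with a preferred ("chosen") and a
# rejected response; these column names follow the Ultrafeedback
# convention and should be verified against the dataset card.
example = ds[0]
print(example["prompt"])
print(example["chosen"])
print(example["rejected"])
```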
## Hardware and Software
- NVIDIA driver: 535.54.03
- CUDA version: 12.2
## Training
Refer to https://github.com/davidkim205/komt for the full training code and details.
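In outline, the DPO stage can be reproduced with TRL's `DPOTrainer`. The sketch below is illustrative only, assuming a TRL 0.7-style API; the hyperparameters and LoRA settings are assumptions, not the repository's exact script:
```
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig
from trl import DPOTrainer

base = "davidkim205/komt-mistral-7b-v1"  # SFT model used as the DPO starting point
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

dataset = load_dataset("maywell/ko_Ultrafeedback_binarized", split="train")

# Illustrative LoRA / training settings -- not the repository's exact values.
peft_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
args = TrainingArguments(
    output_dir="komt-mistral-7b-v1-dpo",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model,
    ref_model=None,   # with a peft_config, TRL derives the frozen reference model
    args=args,
    beta=0.1,         # strength of the KL penalty toward the reference model
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```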
## Prompt template: Mistral
```
<s>[INST] {prompt} [/INST]</s>
```
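In code, a single-turn prompt can be assembled with a small helper like the one below; the tokenizer adds the leading `<s>` (BOS) token itself, which is why the usage code in the next section supplies only the instruction tags:
```
def build_prompt(user_message: str) -> str:
    # Wrap a single-turn user message in Mistral's instruction tags.
    # The tokenizer prepends <s> (BOS) automatically during encoding.
    return f"[INST]{user_message} [/INST]"

print(build_prompt("hello"))  # -> [INST]hello [/INST]
```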
## Usage
```
import torch
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          BitsAndBytesConfig, TextStreamer, GenerationConfig)
from peft import PeftModel, PeftConfig

base_model_name = 'davidkim205/komt-mistral-7b-v1'
peft_model_name = 'davidkim205/komt-mistral-7b-v1-dpo'
config = PeftConfig.from_pretrained(peft_model_name)

# Load the base model in 4-bit NF4 so it fits on a single consumer GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
config.base_model_name_or_path = base_model_name
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path,
                                             quantization_config=bnb_config,
                                             device_map="auto")
# Attach the DPO LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(model, peft_model_name)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
streamer = TextStreamer(tokenizer)

def gen(x):
    generation_config = GenerationConfig(
        temperature=0.8,
        top_p=0.8,
        top_k=100,
        max_new_tokens=1024,
        early_stopping=True,
        do_sample=True,
    )
    # Wrap the user message in the Mistral instruction template.
    q = f"[INST]{x} [/INST]"
    gened = model.generate(
        **tokenizer(
            q,
            return_tensors='pt',
            return_token_type_ids=False,
        ).to('cuda'),
        generation_config=generation_config,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        streamer=streamer,
    )
    result_str = tokenizer.decode(gened[0])
    # Return only the text generated after the [/INST] tag.
    start_tag = "[/INST]"
    start_index = result_str.find(start_tag)
    if start_index != -1:
        result_str = result_str[start_index + len(start_tag):].strip()
    return result_str

# "I'm planning a solo 1-night, 2-day trip to Jeju Island; please put together a travel itinerary."
result = gen('제주도를 1박2일로 혼자 여행하려고 하는데 여행 코스를 만들어줘')
print('##########')
print(result)
```
output (model response translated from Korean)
```
<s> [INST]제주도를 1박2일로 혼자 여행하려고 하는데 여행 코스를 만들어줘 [/INST]If you want to take in Jeju Island's beautiful natural scenery and cultural heritage, the following courses are recommended.
The first course heads to Jeju's west coast to enjoy the shoreline scenery and coastal dunes. Heading left (west) out of downtown Jeju, you pass famous beaches such as Hallim, Seongsan, and Naedo. Here you can swim in clear water along broad shores, and the beaches are surrounded by plenty of restaurants. Along the west coast you can also sample Jeju's signature dishes at well-known local eateries, and on the way back to downtown Jeju you can buy Jeju specialties at a local specialty market.
The second course tours the east coast. Heading right (east) out of downtown Jeju, you once again pass famous beaches such as Hallim, Seongsan, and Naedo, and the beaches in this area are likewise surrounded by many restaurants. After leaving the east coast, you can buy Jeju specialties at a specialty market on the way back to downtown Jeju, and there are many places to enjoy good food in this area as well.
The third course heads to southern Jeju. Going right out of downtown Jeju brings you to the south of the island, where Hallasan National Park offers natural scenery to admire, with trails inside the park for enjoying the varied landscape and mountain paths. Southern Jeju also has many beaches and golf courses, so you can enjoy swimming and golf, and you can pick up Jeju specialties at a specialty market on the way south.
```
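Since the DPO weights are published as a PEFT (LoRA) adapter, they can also be merged into the base model to produce a standalone checkpoint and skip the adapter indirection at inference time. A minimal sketch, assuming enough memory to load the unquantized fp16 base weights:
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_name = "davidkim205/komt-mistral-7b-v1"
adapter_name = "davidkim205/komt-mistral-7b-v1-dpo"

# Merging requires the unquantized base weights (fp16 here).
base = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, adapter_name).merge_and_unload()

merged.save_pretrained("komt-mistral-7b-v1-dpo-merged")
AutoTokenizer.from_pretrained(base_name).save_pretrained("komt-mistral-7b-v1-dpo-merged")
```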
## Evaluation
For objective model evaluation, we initially used EleutherAI's lm-evaluation-harness but obtained unsatisfactory results. Consequently, we evaluated the models with ChatGPT as a judge, as described in [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259.pdf) and [Three Ways of Using Large Language Models to Evaluate Chat](https://arxiv.org/pdf/2308.06502.pdf).
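Concretely, ChatGPT is prompted to grade each model answer on a 0~5 scale, and the per-model totals are summed. A minimal sketch of such a judging loop, using the OpenAI Python client (the judge model and rubric wording here are assumptions, not the exact evaluation script):
```
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(question: str, answer: str) -> int:
    # Ask ChatGPT to grade one answer on a 0-5 scale; the rubric wording
    # here is illustrative, not the exact prompt used for the table below.
    prompt = (
        "Rate the following Korean answer to the question on a scale of 0 to 5, "
        "where 5 is a fully correct, fluent answer. Reply with a single digit.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return int(resp.choices[0].message.content.strip())
```
The scores obtained with this procedure: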
| model                                     | score   | average (0~5) | percentage |
|-------------------------------------------|---------|---------------|------------|
| gpt-3.5-turbo (closed)                    | 147     | 3.97          | 79.45%     |
| naver Cue (closed)                        | 140     | 3.78          | 75.67%     |
| clova X (closed)                          | 136     | 3.67          | 73.51%     |
| WizardLM-13B-V1.2 (open)                  | 96      | 2.59          | 51.89%     |
| Llama-2-7b-chat-hf (open)                 | 67      | 1.81          | 36.21%     |
| Llama-2-13b-chat-hf (open)                | 73      | 1.91          | 38.37%     |
| nlpai-lab/kullm-polyglot-12.8b-v2 (open)  | 70      | 1.89          | 37.83%     |
| kfkas/Llama-2-ko-7b-Chat (open)           | 96      | 2.59          | 51.89%     |
| beomi/KoAlpaca-Polyglot-12.8B (open)      | 100     | 2.70          | 54.05%     |
| **komt-llama2-7b-v1 (open) (ours)**       | **117** | **3.16**      | **63.24%** |
| **komt-llama2-13b-v1 (open) (ours)**      | **129** | **3.48**      | **69.72%** |
| **komt-llama-30b-v1 (open) (ours)**       | **129** | **3.16**      | **63.24%** |
| **komt-mistral-7b-v1 (open) (ours)**      | **131** | **3.54**      | **70.81%** |
| **komt-mistral-7b-v1-dpo (open) (ours)**  | **142** | **3.83**      | **76.75%** |