---
language:
- en
- ko
pipeline_tag: text-generation
tags:
- finetuned
---
# komt: Korean multi-task instruction tuning model
![multi task instruction tuning.jpg](https://github.com/davidkim205/komt/assets/16680469/c7f6ade7-247e-4b62-a94f-47e19abea68e)
Following the success of ChatGPT, numerous large language models have emerged in an attempt to match its capabilities.
However, many of these models still struggle to answer accurately in Korean or to generate fluent Korean text.
This study addresses these challenges by introducing a multi-task instruction technique that converts supervised datasets from various tasks into training data for Large Language Models (LLMs).
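As a rough illustration of the idea, the sketch below turns one supervised (input, label) pair into an instruction/response record. The task names, template wording, and the `to_instruction_sample` helper are hypothetical and chosen only for this example; the `[INST] ... [/INST]` wrapper is borrowed from the Usage section, and the actual komt data pipeline in the repository may differ.

```python
# Hypothetical task templates, for illustration only.
# The real komt templates may use different wording and tasks.
TEMPLATES = {
    "sentiment": "Classify the sentiment of the following review as positive or negative.\n\n{text}",
    "summarization": "Summarize the following article in one sentence.\n\n{text}",
}


def to_instruction_sample(task, text, label):
    """Turn one supervised (input, label) pair into an instruction/response record."""
    instruction = TEMPLATES[task].format(text=text)
    return {
        "task": task,
        # Prompt wrapper assumed from the Usage section of this README.
        "prompt": f"[INST]{instruction} [/INST]",
        "response": label,
    }


sample = to_instruction_sample("sentiment", "The food was great.", "positive")
print(sample["prompt"])
print(sample["response"])
```

Applying one template per task across several supervised datasets would yield a single mixed instruction dataset, which is the sense of "multi-task" assumed here.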
## Model Details
* **Model Developers** : davidkim(changyeon kim)
* **Repository** : https://github.com/davidkim205/komt
* **Model Architecture** : komt-mistral-7b-v1 is a fine-tuned version of Mistral-7B-Instruct-v0.1.
## Dataset
Korean multi-task instruction dataset
## Hardware and Software
- NVIDIA driver: 535.54.03
- CUDA version: 12.2
## Training
Refer to https://github.com/davidkim205/komt for training details.
## Usage
```
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import TextStreamer, GenerationConfig

model_name = 'davidkim205/komt-mistral-7b-v1'
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
streamer = TextStreamer(tokenizer)  # prints tokens to stdout as they are generated


def gen(x):
    # Sampling settings for generation
    generation_config = GenerationConfig(
        temperature=0.8,
        top_p=0.8,
        top_k=100,
        max_new_tokens=1024,
        early_stopping=True,
        do_sample=True,
    )
    # Wrap the user prompt in the [INST] ... [/INST] instruction format
    q = f"[INST]{x} [/INST]"
    gened = model.generate(
        **tokenizer(
            q,
            return_tensors='pt',
            return_token_type_ids=False
        ).to('cuda'),
        generation_config=generation_config,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        streamer=streamer,
    )
    result_str = tokenizer.decode(gened[0])

    # If the decoded text contains a "### Response: " marker, keep only what follows it;
    # otherwise the full decoded string (prompt included) is returned.
    start_tag = "\n\n### Response: "
    start_index = result_str.find(start_tag)
    if start_index != -1:
        result_str = result_str[start_index + len(start_tag):].strip()
    return result_str


# Prompt: "I'm planning a solo two-day, one-night trip to Jeju Island; please make an itinerary."
print(gen('제주도를 1박2일로 혼자 여행하려고 하는데 여행 코스를 만들어줘'))
```
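Output (the prompt asks for a two-day, one-night solo travel itinerary for Jeju Island; the model answers in Korean):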
```
<s> [INST]제주도를 1박2일로 혼자 여행하려고 하는데 여행 코스를 만들어줘 [/INST]1박 2일 여행 코스
제주 여행 코스는 다음과 같습니다:
첫째 날:
* 아침: 제주시 신라로 이동하여 한라산 국립공원에 도착합니다. 오른쪽 발자국 광장에서 눈에 띄는 맥주숍을 찾아 한 병의 맥주를 마시고 계단으로 올라가 한라산 정상에 도착합니다.
* 점심: 정상에서 계단을 내려와 중도에 위치한 골목에서 음식을 즐깁니다. 젓갈, 곰탕, 전복 등 맛있는 제주 특산물을 맛보며 힐링을 즐깁니다.
* 오후: 중도에서 남하하여 남서쪽 제주로 이동합니다. 남서쪽 제주는 섬의 자연 경관과 백사장, 아름다운 바다 등 아름다운 풍광이 있는 곳입니다. 숙소에 숙박합니다.
둘째 날:
* 아침: 남서쪽 제주에서 북서쪽 제주로 이동합니다. 이 지역은 푸른 잔디 해변과 투명한 바다가 펼쳐져 있는 아름다운 풍광입니다. 숙소에서 아침을 먹고 바다로 향하여 해변에서 휴식을 취합니다.
* 점심: 북서쪽 제주의 앞바다에서 시원한 바다를 보며 한 끼의 해산물을 맛보고 계절에 따라 해산물 요리를 즐깁니다.
* 오후: 남서쪽 제주에서 이동하여 서쪽 제주로 이동합니다. 서쪽 제주는 절벽과 암석 해변, 수수께끼의 돌 정원 등 독특한 경관이 있는 곳입니다. 이곳에서는 서쪽 제주의 대표적인 명소인 청량사를 방문하고 서쪽 제주의 아름다운 풍광을 감상합니다.
* 저녁: 서쪽 제주에서 제주 시로 이동하여 한 끼의 제주 특산물을 맛보고 도착한 제주 도심에서 저녁을 즐깁니다.
* 야간: 제주 시의 도심에서 야간 활동을 즐기며 1박 2일의 여행을 마무리합니다.
이렇게 제주를 1박 2일로 혼자 여행하면 제주의 아름다운 풍광, 푸른 잔디 해변, 투명한 바다 등을 경험할 수 있습니다.
```
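The decoded output above still contains the prompt and the `<s>` token. If only the model's answer is needed, a small post-processing step can drop everything up to the closing `[/INST]` tag. The sketch below is one way to do that, assuming the `[INST] ... [/INST]` prompt format shown in the usage code; `extract_response` is a hypothetical helper and not part of the komt repository.

```python
def extract_response(decoded: str) -> str:
    """Return the text after the last [/INST] tag, with <s>/</s> tokens removed.

    If no [/INST] tag is present, the whole string is cleaned and returned.
    """
    # rpartition returns ('', '', decoded) when the tag is missing,
    # so the third element is always the part we want to keep.
    response = decoded.rpartition("[/INST]")[2]
    for token in ("<s>", "</s>"):
        response = response.replace(token, "")
    return response.strip()


decoded = "<s> [INST]hello [/INST]hi there</s>"
print(extract_response(decoded))  # hi there
```

Passing `skip_special_tokens=True` to `tokenizer.decode` is an alternative way to drop `<s>` and `</s>` before splitting on the tag.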
## Evaluation
For objective model evaluation, we initially used EleutherAI's lm-evaluation-harness but obtained unsatisfactory results. Consequently, we used ChatGPT as the evaluator, following the approaches described in [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06502.pdf) and [Three Ways of Using Large Language Models to Evaluate Chat](https://arxiv.org/pdf/2308.06259.pdf).
| model | score | average(0~5) | percentage |
| --------------------------------------- |---------| ------------ | ---------- |
| gpt-3.5-turbo(closed)                   | 147     | 3.97         | 79.45%     |
| naver Cue(closed)                       | 140     | 3.78         | 75.67%     |
| clova X(closed)                         | 136     | 3.67         | 73.51%     |
| WizardLM-13B-V1.2(open) | 96 | 2.59 | 51.89% |
| Llama-2-7b-chat-hf(open) | 67 | 1.81 | 36.21% |
| Llama-2-13b-chat-hf(open) | 73 | 1.91 | 38.37% |
| nlpai-lab/kullm-polyglot-12.8b-v2(open) | 70 | 1.89 | 37.83% |
| kfkas/Llama-2-ko-7b-Chat(open) | 96 | 2.59 | 51.89% |
| beomi/KoAlpaca-Polyglot-12.8B(open) | 100 | 2.70 | 54.05% |
| **komt-llama2-7b-v1 (open)(ours)** | **117** | **3.16** | **63.24%** |
| **komt-llama2-13b-v1 (open)(ours)** | **129** | **3.48** | **69.72%** |
| **komt-llama-30b-v1 (open)(ours)** | **129** | **3.16** | **63.24%** |
| **komt-mistral-7b-v1 (open)(ours)** | **131** | **3.54** | **70.81%** |
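
The table does not state how the three numeric columns relate. The sketch below shows one reading that reproduces rows such as gpt-3.5-turbo and komt-mistral-7b-v1: the score is a sum of per-question ratings on the 0~5 scale, the average divides it by the number of questions, and the percentage divides it by the maximum attainable total. The question count of 37 and the truncation to two decimals are assumptions inferred from the ratios in the table, not values given in the source, and `table_columns` is a hypothetical helper.

```python
QUESTIONS = 37   # assumption: inferred from score / average, not stated in the source
MAX_POINTS = 5   # upper end of the 0~5 scale in the table header


def table_columns(score: int) -> tuple[float, float]:
    """Return (average, percentage) for a total score, truncated to two decimals."""
    # Integer division keeps the truncation exact and avoids float rounding surprises.
    average = (score * 100 // QUESTIONS) / 100
    percentage = (score * 10000 // (QUESTIONS * MAX_POINTS)) / 100
    return average, percentage


print(table_columns(147))  # gpt-3.5-turbo row
print(table_columns(131))  # komt-mistral-7b-v1 row
```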