---
base_model:
- maywell/kiqu-70b
library_name: transformers
tags:
- mergekit
- merge
license: cc-by-sa-4.0
language:
- ko
---
# Megakiqu-120b
<img src="./megakiqu.jpg" alt="megakiqu-120B" width="390"/>
An expanded model built with the same passthrough method as MegaDolphin and Venus.

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the passthrough merge method.

### Models Merged

The following models were included in the merge:
* [maywell/kiqu-70b](https://huggingface.co/maywell/kiqu-70b)

## Original Model Card
# **kiqu-70b** [(Arena Leaderboard)](https://huggingface.co/spaces/instructkr/ko-chatbot-arena-leaderboard)


**kiqu-70b** is an SFT+DPO trained model based on Miqu-70B-Alpaca-DPO, fine-tuned on **Korean** datasets.

Because this model is a fine-tune of miqu-1-70b, a leaked early version of Mistral-Medium, commercial use is at your own risk.

Apart from that, the model itself follows the **cc-by-sa-4.0** license.

# **Model Details**

**Base Model**  
miqu-1-70b (Early Mistral-Medium)

**Instruction format**

The model follows the **Mistral** instruction format.
Providing few-shot examples to the model is highly recommended.
```
[INST] {instruction}
[/INST] {output}
```

Multi-shot
```
[INST] {instruction}
[/INST] {output}
[INST] {instruction}
[/INST] {output}
[INST] {instruction}
[/INST] {output}
.
.
.
```

**Recommended Template** - 1-shot with system prompt
```
너는 kiqu-70B라는 한국어에 특화된 언어모델이야. 깔끔하고 자연스럽게 대답해줘!
[INST] 안녕?
[/INST] 안녕하세요! 무엇을 도와드릴까요? 질문이나 궁금한 점이 있다면 언제든지 말씀해주세요.
[INST] {instruction}
[/INST]
```

A trailing space after `[/INST]` can significantly affect the model's performance. When running inference, do not include a trailing space in the chat template.
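
As a minimal sketch of the templates above, the following helper assembles a prompt with an optional system line and few-shot pairs, and ends it with a bare `[/INST]` so that no trailing space is emitted. The function name `build_prompt`, its parameters, and the example strings are hypothetical and are not part of the model's tooling.

```python
def build_prompt(instruction, shots=(), system=""):
    """Assemble a Mistral-style prompt following the templates in this card.

    instruction: the final user instruction (left open for the model to answer)
    shots:       iterable of (instruction, output) few-shot pairs
    system:      optional system line placed before the first [INST]
    """
    lines = []
    if system:
        lines.append(system)
    for shot_instruction, shot_output in shots:
        lines.append(f"[INST] {shot_instruction}")
        lines.append(f"[/INST] {shot_output}")
    lines.append(f"[INST] {instruction}")
    # The prompt ends with a bare "[/INST]": no trailing space or newline.
    lines.append("[/INST]")
    return "\n".join(lines)


# Hypothetical example values, for illustration only.
prompt = build_prompt(
    "Summarize this paragraph.",
    shots=[("Hello?", "Hello! How can I help you?")],
    system="You are a helpful assistant.",
)
print(prompt)
```

Joining with `"\n"` rather than appending a newline to every line is what keeps the final `[/INST]` free of trailing whitespace.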

### Configuration

The following mergekit YAML configuration was used to produce this model:

```yaml
dtype: bfloat16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 20]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [10, 30]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [20, 40]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [30, 50]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [40, 60]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [50, 70]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [60, 80]
    model: maywell/kiqu-70b
```
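
To make the effect of this configuration concrete, the sketch below counts the layers in the stacked model and shows which source layers appear more than once. It assumes each `layer_range` is half-open (`[start, end)`), so `[0, 20]` covers layers 0 through 19; the helper name `stacked_layers` is hypothetical.

```python
from collections import Counter

# layer_range values copied from the configuration above.
LAYER_RANGES = [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60), (50, 70), (60, 80)]


def stacked_layers(ranges):
    """Return source-layer indices in the order they appear in the merged model.

    Assumes half-open ranges: (start, end) covers start .. end - 1.
    """
    return [layer for start, end in ranges for layer in range(start, end)]


layers = stacked_layers(LAYER_RANGES)
usage = Counter(layers)

total_layers = len(layers)  # 7 slices x 20 layers each
source_layers = len(usage)  # distinct layers taken from kiqu-70b
duplicated = sorted(layer for layer, count in usage.items() if count > 1)

print(total_layers)                   # 140
print(source_layers)                  # 80
print(duplicated[0], duplicated[-1])  # 10 69
```

Under that assumption, the merged model stacks 140 layers built from the 80 layers of kiqu-70b: layers 10 through 69 appear twice, while layers 0 through 9 and 70 through 79 appear once.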