kuotient committed
Commit 51dabba
Parent(s): 77e515c

Update README.md

Files changed (1):
  1. README.md +67 -6
README.md CHANGED

---
library_name: transformers
tags:
- mergekit
- merge
license: cc-by-sa-4.0
language:
- ko
---
# Megakiqu-120b
<img src="./megakiqu.jpg" alt="megakiqu-120B" width="390"/>

A model expanded with the passthrough method, like MegaDolphin and Venus.

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

This model was merged using the passthrough merge method.
The following models were included in the merge:
* [maywell/kiqu-70b](https://huggingface.co/maywell/kiqu-70b)
## Original Model Card

# **kiqu-70b** [(Arena Leaderboard)](https://huggingface.co/spaces/instructkr/ko-chatbot-arena-leaderboard)

**kiqu-70b** is an SFT+DPO-trained model based on Miqu-70B-Alpaca-DPO using **Korean** datasets.

Since this model is a finetune of miqu-1-70b (a leaked early version of Mistral-Medium), using it for commercial purposes is at your own risk.

Apart from that, this model itself follows **cc-by-sa-4.0**.
# **Model Details**

**Base Model**
miqu-1-70b (Early Mistral-Medium)

**Instruction format**

It follows the **Mistral** format.
Giving few-shot examples to the model is highly recommended.
```
[INST] {instruction}
[/INST] {output}
```

Multi-shot
```
[INST] {instruction}
[/INST] {output}
[INST] {instruction}
[/INST] {output}
[INST] {instruction}
[/INST] {output}
.
.
.
```

**Recommended Template** - 1-shot with system prompt
```
너는 kiqu-70B라는 한국어에 특화된 언어모델이야. 깔끔하고 자연스럽게 대답해줘!
[INST] 안녕?
[/INST] 안녕하세요! 무엇을 도와드릴까요? 질문이나 궁금한 점이 있다면 언제든지 말씀해주세요.
[INST] {instruction}
[/INST]
```
(In English: "You are kiqu-70B, a language model specialized in Korean. Answer cleanly and naturally!" / "Hi?" / "Hello! How can I help you? If you have any questions, feel free to ask anytime.")

A trailing space after `[/INST]` can affect the model's performance by a significant margin, so when running inference it is recommended not to include a trailing space in the chat template.
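The template rules above can be sketched as a small helper (my own illustration, not code from this repo): few-shot `[INST]`/`[/INST]` turns with an optional system line, ending exactly at `[/INST]` with no trailing space.

```python
def build_prompt(instruction, shots=(), system=None):
    """Assemble a Mistral-style [INST] prompt as described in this card.

    shots: optional (instruction, output) pairs used as few-shot examples.
    The returned string ends exactly at "[/INST]" -- no trailing space,
    per the note above about inference-time chat templates.
    """
    parts = [system] if system else []
    for shot_inst, shot_out in shots:
        parts.append(f"[INST] {shot_inst}\n[/INST] {shot_out}")
    parts.append(f"[INST] {instruction}\n[/INST]")
    return "\n".join(parts)

prompt = build_prompt("Explain mergekit in one sentence.",
                      shots=[("Hi?", "Hello! How can I help you?")],
                      system="You are a helpful assistant.")
print(prompt)
```

The generated text after the final `[/INST]` (including its leading space) is then produced by the model itself.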
### Configuration

The following mergekit YAML configuration was used to produce this model:

```yaml
dtype: bfloat16
# … (intermediate lines truncated in this diff view)
slices:
  # … (earlier slices truncated)
  - sources:
      - layer_range: [60, 80]
        model: maywell/kiqu-70b
```
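As an illustration of what the truncated `slices` section does: a passthrough merge simply concatenates the listed layer slices, so overlapping windows over the 80-layer base yield a deeper model. The ranges below are hypothetical (only the final `[60, 80]` slice is visible in this diff), chosen to match the common 20-layer-window, stride-10 pattern used by frankenmerges of this kind.

```python
# Hypothetical passthrough slicing sketch -- NOT the actual elided config.
# Overlapping 20-layer windows with stride 10 over an 80-layer base model;
# the last window matches the [60, 80] slice visible in the config above.
def stacked_layer_count(slices):
    """Passthrough concatenates slices, so depth is the sum of slice sizes."""
    return sum(end - start for start, end in slices)

slices = [(start, start + 20) for start in range(0, 61, 10)]
print(slices[-1])                   # (60, 80)
print(stacked_layer_count(slices))  # 140 layers, up from 80 in the base
```

If the real config follows this pattern, the merged model has 140 layers, which is consistent with expanding a 70B-class model into the ~120B class.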