Triangle104 committed on
Commit
db99b50
·
verified ·
1 Parent(s): 4bcf092

Update README.md

This model was converted to GGUF format from [`EVA-UNIT-01/EVA-Qwen2.5-7B-v0.1`](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-7B-v0.1) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-7B-v0.1) for more details on the model.
---

Model details:
A RP/storywriting specialist model, a full-parameter finetune of Qwen2.5-7B on a mixture of synthetic and natural data.

It uses the Celeste 70B 0.1 data mixture, greatly expanding it to improve the versatility, creativity and "flavor" of the resulting model.
Version 0.1 notes:

The dataset was deduped and cleaned from version 0.0, and the learning rate was adjusted. The resulting model seems to be stabler, and 0.0's problems with handling short inputs and min_p sampling seem to be mostly gone.
It will be retrained once more, because this run crashed around e1.2 (out of 3) (thanks, DeepSpeed, really appreciate it), and it's still somewhat undertrained as a result.
Prompt format is ChatML.
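ChatML wraps every message in role-tagged `<|im_start|>`/`<|im_end|>` markers and leaves the final assistant tag open for the model to complete. A minimal sketch of building such a prompt (the helper name is illustrative, not part of any library):

```python
def chatml_prompt(system: str, user: str) -> str:
    # ChatML format: each turn is <|im_start|>ROLE\nCONTENT<|im_end|>;
    # the trailing open assistant tag is where generation begins.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```

Most frontends (including SillyTavern with the presets below) apply this template automatically; the sketch only shows what the model sees.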
Recommended sampler values:

- Temperature: 0.87
- Top-P: 0.81
- Repetition Penalty: 1.03
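To picture what these values do, here is a small self-contained sketch (not llama.cpp's actual implementation): temperature rescales logits before softmax, top-p keeps only the most likely tokens up to a cumulative probability, and a CTRL-style repetition penalty dampens tokens that were already generated.

```python
import math

def penalize_repeats(logits, prev_tokens, penalty=1.03):
    # CTRL-style repetition penalty: shrink positive logits and push
    # negative logits further down for tokens already generated.
    out = list(logits)
    for t in set(prev_tokens):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

def sample_probs(logits, temperature=0.87, top_p=0.81):
    # Temperature scaling: lower values sharpen the distribution.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-P (nucleus): keep the smallest set of top tokens whose
    # cumulative mass reaches top_p, zero out the rest, renormalize.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    s = sum(filtered)
    return [p / s for p in filtered]
```

With the values above, low-probability tokens are cut fairly aggressively, which fits the note below about the model preferring lower temperatures.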
The model appears to prefer lower temperatures (0.9 and lower). Min-P seems to work now, as well.

Recommended SillyTavern presets (via CalamitousFelicitousness):

- Context
- Instruct and System Prompt
Training data:

- Celeste 70B 0.1 data mixture minus the Opus Instruct subset. See that model's card for details.
- Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.
- A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe
- A subset (2k rows) of Sonnet3.5-Charcards-Roleplay by Gryphe
- A cleaned subset (~3k rows) of shortstories_synthlabels by Auri
- Synthstruct and SynthRP datasets by Epiculous
Training time and hardware:

- 2 days on 4x3090Ti (locally)

Model was trained by Kearm and Auri.
Special thanks:

- to Gryphe, Lemmy, Kalomaze, Nopm and Epiculous for the data
- to Alpindale for helping with FFT config for Qwen2.5
- and to InfermaticAI's community for their continued support for our endeavors

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)