---
library_name: transformers
tags:
- not-for-all-audiences
- mergekit
datasets:
- crestf411/LimaRP-DS
- Gryphe/Sonnet3.5-Charcard-Roleplay
- anthracite-org/c2_logs_32k_mistral-v3_v1.2_no_system
- anthracite-org/kalo-opus-instruct-22k-no-refusal-no-system
- anthracite-org/kalo-opus-instruct-3k-filtered-no-system
- anthracite-org/nopm_claude_writing_fixed
base_model:
- Qwen/Qwen2.5-32B-Instruct
---

![slush.jpg](https://huggingface.co/crestf411/L3.1-8B-Slush/resolve/main/slush.jpg?)

([GGUFs](https://huggingface.co/mradermacher/MN-Slush-i1-GGUF))

**Slush** is a two-stage model trained with high LoRA dropout. Stage 1 is a continued-pretraining run on the base model, aimed at boosting the model's creativity and writing capabilities. The resulting LoRA is merged into the instruction-tuned model, and stage 2 is a fine-tuning pass on top of that merge, to further enhance its roleplaying capabilities and/or to repair any damage caused by the stage 1 merge.

This is still an early-stage release. As always, feedback is welcome, and begone if you demand perfection.

The second stage, like the *Sunfall* series, follows the Silly Tavern preset (ChatML), so your mileage may vary, particularly if you use some other tool and/or preset.
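
Outside of Silly Tavern, the same ChatML wrapping can be reproduced with the tokenizer's chat template. This is only a minimal sketch, assuming the standard Qwen2.5-Instruct template (the merge's tokenizer source, see the config further down) and a made-up character card:

```python
from transformers import AutoTokenizer

# The merge takes its tokenizer from Qwen/Qwen2.5-32B-Instruct, which ships the
# ChatML chat template; substitute this model's own repo id in practice.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

messages = [
    {"role": "system", "content": "You are Seraphina, a wandering bard."},   # character card / system prompt
    {"role": "user", "content": "The tavern door creaks open as I step inside."},
]

# add_generation_prompt=True appends the opening <|im_start|>assistant tag,
# which is where the model continues writing.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# <|im_start|>system
# You are Seraphina, a wandering bard.<|im_end|>
# <|im_start|>user
# The tavern door creaks open as I step inside.<|im_end|>
# <|im_start|>assistant
```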

**Parameter suggestions:**

I did all my testing with temp 1, min-p 0.1 and DRY 0.8, but enabled XTC as the context grew and/or the model started repeating itself.
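
For reference only (not the exact testing setup), the temp/min-p part of those settings maps onto a plain `transformers` call roughly as below. DRY and XTC are sampler extensions offered by frontends/backends such as Silly Tavern, llama.cpp and koboldcpp rather than by `transformers` itself, and the repo id here is a stand-in:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-32B-Instruct"  # stand-in; replace with this model's repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write the opening scene of a snowbound mystery."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# temp 1 / min-p 0.1 as suggested above; min_p requires a recent transformers release.
output = model.generate(inputs, do_sample=True, temperature=1.0, min_p=0.1, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```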

**Training details** (a rough config sketch follows the list):

* Stage 1 (continued pretraining)
  * Target: Qwen/Qwen2.5-32B (resulting LoRA merged into Qwen/Qwen2.5-32B-Instruct)
  * LoRA dropout 0.5 ([motivation](https://arxiv.org/abs/2403.00946))
  * LoRA rank 32, alpha 64 ([motivation](https://arxiv.org/abs/2410.21228))
  * LR: cosine, 4e-6
  * [LoRA+](https://arxiv.org/abs/2402.12354) with LR ratio 15
  * Context size: 8192
  * Gradient accumulation steps: 4
  * Epochs: 1
* Stage 2 (fine-tune)
  * Target: Stage 1 model
  * LoRA dropout 0.5
  * LoRA rank 32, alpha 64
  * LR: cosine, 5e-6 (min 5e-7)
  * [LoRA+](https://arxiv.org/abs/2402.12354) with LR ratio 15
  * Context size: 16384
  * Gradient accumulation steps: 4
  * Epochs: 1

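The exact training stack is not part of this card, so the following is only a PEFT-style sketch of the adapter shape listed above; the target modules are an assumption, not something stated here:

```python
from peft import LoraConfig

# Rank 32, alpha 64, dropout 0.5 as listed above. Target modules are an
# assumption (all linear projections in the Qwen2.5 blocks); the card does not
# specify them.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.5,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# The cosine LR schedule (4e-6 for stage 1, 5e-6 decaying to 5e-7 for stage 2),
# the LoRA+ LR ratio of 15, the context length and the gradient accumulation
# steps live in the trainer / LoRA+ optimizer setup, not in this config object.
```
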
## Merge Details
### Merge Method

This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method.

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: stage1-model
    parameters:
      weight: 1
      density: 1
  - model: stage2-model
    parameters:
      weight: 1
      density: 1
  - model: Qwen/Qwen2.5-32B-Instruct
    parameters:
      weight: 0.9
      density: 0.9
merge_method: ties
base_model: Qwen/Qwen2.5-32B
parameters:
  weight: 0.9
  density: 0.9
  normalize: true
  int8_mask: true
tokenizer_source: Qwen/Qwen2.5-32B-Instruct
dtype: bfloat16
```
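
With mergekit installed, a config like this is typically applied with its `mergekit-yaml` entry point, e.g. `mergekit-yaml config.yaml ./output-directory` (paths are placeholders); `stage1-model` and `stage2-model` refer to the locally trained stage checkpoints described above.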