Triangle104 committed
Commit 703c5f8 · verified · 1 Parent(s): 38b0392

Update README.md

Files changed (1)
  1. README.md +67 -0
README.md CHANGED
@@ -22,6 +22,73 @@ base_model: crestf411/L3.1-8B-Slush-v1.1
22
  This model was converted to GGUF format from [`crestf411/L3.1-8B-Slush-v1.1`](https://huggingface.co/crestf411/L3.1-8B-Slush-v1.1) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
23
  Refer to the [original model card](https://huggingface.co/crestf411/L3.1-8B-Slush-v1.1) for more details on the model.
24
 
25
+ ---
26
+ Model details:
27
+ -
28
+ Slush is a two-stage model trained with high LoRA dropout. Stage 1 is a continued-pretraining run on the base model, aimed at boosting the model's creativity and writing capabilities. The result is merged into the instruction-tuned model, and stage 2 is a fine-tuning step on top of that merge to further enhance its roleplaying capabilities and/or repair any damage caused by the stage 1 merge.
29
+
30
+ This is an initial experiment done on the at-this-point-infamous Llama 3.1 8B model, in an attempt to retain its smartness while addressing its abysmal lack of imagination/creativity. As always, feedback is welcome, and begone if you demand perfection.
31
+
32
+ The second stage, like the Sunfall series, follows the Silly Tavern preset, so your mileage may vary, particularly if you use a different tool and/or preset.
33
+
34
+ This update (v1.1) addresses some of the feedback from the first iteration by ramping down the training parameters, and also introduces a custom merge using mergekit.
35
+
36
+ Parameter suggestions:
37
+ -
38
+ I did all my testing with temp 1, min-p 0.1, DRY 0.8. I enabled XTC at higher contexts.
39
+
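To illustrate what the suggested `temp 1, min-p 0.1` settings do to a next-token distribution, here is a minimal Python sketch. It assumes the common definition of min-p (keep every token whose probability is at least `min_p` times the top token's probability) and applies temperature first; samplers differ on that ordering, though at temperature 1 the order makes no difference. The function name `min_p_filter` is hypothetical, and DRY and XTC are not modelled here.

```python
import math


def min_p_filter(logits, temperature=1.0, min_p=0.1):
    """Return renormalized probabilities after temperature scaling and min-p filtering."""
    # Temperature scaling: 1.0 leaves the logits unchanged.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Min-p: drop tokens below min_p times the most likely token's probability.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    norm = sum(kept)
    return [p / norm for p in kept]


if __name__ == "__main__":
    # Toy logits for three candidate tokens; the third is far less likely.
    print(min_p_filter([2.0, 1.0, -3.0], temperature=1.0, min_p=0.1))
```

With these toy logits the third token falls below the threshold and gets zero probability, while the first two are renormalized to sum to 1.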
40
+ Training details:
41
+ -
42
+ Stage 1 (continued pretraining)
43
+   Target: meta-llama/Llama-3.1-8B (resulting LoRA merged into meta-llama/Llama-3.1-8B-Instruct)
44
+   LoRA dropout 0.5 (motivation)
45
+   LoRA rank 64, alpha 128 (motivation)
46
+   LR cosine 4e-6
47
+   LoRA+ with LR Ratio: 15
48
+   Context size: 16384
49
+   Gradient accumulation steps: 4
50
+   Epochs: 1
51
+ Stage 2 (fine tune)
52
+   Target: Stage 1 model
53
+   LoRA dropout 0.5
54
+   LoRA rank 32, alpha 64
55
+   LR cosine 5e-6 (min 5e-7)
56
+   LoRA+ with LR Ratio: 15
57
+   Context size: 16384
58
+   Gradient accumulation steps: 4
59
+   Epochs: 2
60
+
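The hyperparameters above imply a few derived quantities that are easy to check by hand. The Python sketch below is arithmetic only and makes several assumptions the source does not state: standard LoRA scaling (`alpha / rank`, not the rank-stabilized variant), the usual LoRA+ convention that the B matrices train at `ratio` times the A matrices' learning rate, and a cosine schedule with no warmup that decays from the peak to the stated minimum. The training framework is not named in the source, and all function names here are hypothetical.

```python
import math


def lora_scale(alpha, rank):
    """Scaling factor applied to the LoRA update under standard LoRA (alpha / rank)."""
    return alpha / rank


def loraplus_lrs(base_lr, ratio):
    """Return (lr for A matrices, lr for B matrices) under the usual LoRA+ convention."""
    return base_lr, base_lr * ratio


def cosine_lr(step, total_steps, peak_lr, min_lr=0.0):
    """Cosine decay from peak_lr to min_lr over total_steps, with no warmup (assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("stage 1 LoRA scale:", lora_scale(128, 64))
    print("stage 2 LoRA scale:", lora_scale(64, 32))
    print("stage 1 LoRA+ learning rates (A, B):", loraplus_lrs(4e-6, 15))
    print("stage 2 LR halfway through:", cosine_lr(50, 100, 5e-6, 5e-7))
```

Both stages use alpha equal to twice the rank, so the update scale is the same in each. The effective batch size cannot be derived from the listing, because the per-device batch size and device count are not given.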
61
+ Merge Method
62
+ -
63
+ This model was merged with the TIES merge method, using meta-llama/Llama-3.1-8B as the base.
64
+ Configuration
65
+
66
+ The following YAML configuration was used to produce this model:
67
+
68
+ models:
69
+   - model: stage1-on-instruct
70
+     parameters:
71
+       weight: 1.5
72
+       density: 1
73
+   - model: stage2-on-stage1
74
+     parameters:
75
+       weight: 1.5
76
+       density: 1
77
+   - model: meta-llama/Llama-3.1-8B-Instruct
78
+     parameters:
79
+       weight: 1
80
+       density: 1
81
+ merge_method: ties
82
+ base_model: meta-llama/Llama-3.1-8B
83
+ parameters:
84
+   weight: 1
85
+   density: 1
86
+   normalize: true
87
+   int8_mask: true
88
+ tokenizer_source: meta-llama/Llama-3.1-8B-Instruct
89
+ dtype: bfloat16
90
+
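For readers unfamiliar with TIES, the following is a simplified Python sketch of the idea on flat lists of floats. It is not mergekit's implementation, and the function name `ties_merge` is hypothetical. It assumes the usual three steps: take each model's difference from the base, trim each difference to the `density` fraction of largest-magnitude entries, elect a sign per parameter from the weighted sum, then average only the entries that agree with that sign. With `density: 1`, as in the configuration above, nothing is trimmed.

```python
def ties_merge(base, models, weights, density=1.0, normalize=True):
    """Merge flat parameter lists with a simplified TIES procedure (illustrative only)."""
    n = len(base)
    weighted = []
    for params, w in zip(models, weights):
        # Task vector: how this model differs from the base.
        delta = [p - b for p, b in zip(params, base)]
        # Trim: keep only the top `density` fraction of entries by magnitude.
        keep = int(density * n)
        if keep < n:
            ranked = sorted((abs(d) for d in delta), reverse=True)
            cutoff = ranked[keep - 1] if keep > 0 else float("inf")
            delta = [d if abs(d) >= cutoff else 0.0 for d in delta]
        weighted.append([w * d for d in delta])

    merged = []
    for j in range(n):
        column = [wd[j] for wd in weighted]
        # Elect a sign from the weighted sum of task vectors at this position.
        sign = 1.0 if sum(column) >= 0 else -1.0
        agree = [v * sign > 0 for v in column]
        total = sum(v for v, ok in zip(column, agree) if ok)
        if normalize:
            # Divide by the total weight of the models that agreed with the sign.
            divisor = sum(w for w, ok in zip(weights, agree) if ok)
            if divisor > 0:
                total /= divisor
        merged.append(base[j] + total)
    return merged


if __name__ == "__main__":
    # Toy two-parameter example: the second model disagrees on the second parameter.
    print(ties_merge([0.0, 0.0], [[1.0, 2.0], [3.0, -1.0]], [1.5, 1.0]))
```

In the toy example the second model's negative entry loses the sign election for the second parameter, so only the first model contributes there.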
91
+ ---
92
  ## Use with llama.cpp
93
  Install llama.cpp through brew (works on Mac and Linux)
94