@@ -13,12 +13,88 @@ datasets:
 - anthracite-org/kalo-opus-instruct-3k-filtered-no-system
 - anthracite-org/nopm_claude_writing_fixed
 base_model: crestf411/MN-Slush
+license: apache-2.0
 ---
 # Triangle104/MN-Slush-Q8_0-GGUF
 This model was converted to GGUF format from [`crestf411/MN-Slush`](https://huggingface.co/crestf411/MN-Slush) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
 Refer to the [original model card](https://huggingface.co/crestf411/MN-Slush) for more details on the model.
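
This repository ships the Q8_0 quantization. To illustrate what that means, here is a minimal Python sketch of block-wise 8-bit quantization in the style of ggml's Q8_0: weights are grouped into blocks of 32, and each block stores one scale (`max|w| / 127`) plus one signed 8-bit integer per weight. This is a simplified reading of the format, not llama.cpp's code. The real file stores each scale as fp16 (34 bytes per 32 weights, about 8.5 bits per weight) and rounds half away from zero, whereas Python's `round` rounds half to even. The function names are hypothetical.

```python
BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32 values


def quantize_q8_0(values):
    """Quantize floats block-wise into (scale, int8 list) pairs, Q8_0 style."""
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        scale = amax / 127.0  # the largest magnitude maps to +/-127
        inv = 1.0 / scale if scale else 0.0  # an all-zero block gets scale 0
        quants = [int(round(v * inv)) for v in block]
        blocks.append((scale, quants))
    return blocks


def dequantize_q8_0(blocks):
    """Reconstruct approximate floats as scale * q for every stored value."""
    return [scale * q for scale, quants in blocks for q in quants]
```

The round-trip error for each weight is bounded by half of its block's scale.
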
+
+---
+## Model details
+Slush is a two-stage model trained with high LoRA
+dropout. Stage 1 is a continued-pretraining run on the base model,
+aimed at boosting the model's creativity and writing capabilities. The
+result is then merged into the instruction-tuned model, and stage 2 is a
+fine-tuning step on top of that merge to further enhance its roleplaying
+capabilities and/or to repair any damage caused by the stage 1 merge.
+
+This is still at an early stage. As always, feedback is welcome, and begone if you demand perfection.
+
+The second stage, like the Sunfall series, follows the SillyTavern
+preset (Mistral V2 & V3, though V3-Tekken works fine), so
+YMMV, in particular if you use some other tool and/or preset.
+## Parameter suggestions
+I did all my testing with temp 1, min-p 0.1, DRY 0.8.
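
For readers unfamiliar with min-p: it drops every token whose probability is below `min_p` times the probability of the most likely token, then renormalizes what is left. Below is a minimal Python sketch of temperature plus min-p using the suggested values (temperature 1, min-p 0.1). DRY is a separate repetition penalty and is not modeled here. The sampler order is an assumption: this sketch applies temperature before min-p, while frontends and backends let you reorder samplers; at temperature 1 the order makes no difference, because dividing logits by 1 changes nothing.

```python
import math
import random


def min_p_filter(logits, temperature=1.0, min_p=0.1):
    """Temperature-scale logits, drop tokens below min_p * max prob, renormalize."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    norm = sum(kept)
    return [p / norm for p in kept]


def sample(logits, temperature=1.0, min_p=0.1, rng=random):
    """Draw one token index from the min-p filtered distribution."""
    probs = min_p_filter(logits, temperature, min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With logits `[2.0, 1.0, -2.0]`, the third token's probability (about 1.3%) is below 10% of the top token's (about 72%), so it can no longer be sampled.
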
+## Training details
+Stage 1 (continued pretraining):
+- Target: mistralai/Mistral-Nemo-Base-2407 (resulting LoRA merged into mistralai/Mistral-Nemo-Instruct-2407)
+- LoRA dropout: 0.5 (motivation)
+- LoRA rank 64, alpha 128 (motivation)
+- LR: cosine, 4e-6
+- LoRA+ with LR ratio: 15
+- Context size: 16384
+- Gradient accumulation steps: 4
+- Epochs: 1
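
Stage 1 trains a LoRA against the Base model and then folds it into the Instruct weights. The plain-Python sketch below shows where LoRA dropout acts during training and how a trained adapter is merged into a weight matrix. It assumes the conventional scaling `alpha / rank` (2.0 for rank 64, alpha 128); the card does not say whether a variant such as rsLoRA was used. The function names are illustrative, not any library's API. Merging into the Instruct model rather than the Base model simply means `weight` is the Instruct matrix.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x, weight, lora_a, lora_b, alpha, rank, dropout=0.5,
                 training=True, rng=random):
    """LoRA-adapted linear layer: W x + (alpha / rank) * B A dropout(x)."""
    if training and dropout > 0:
        # Inverted dropout on the adapter input only; the frozen W sees the full x.
        x_lora = [0.0 if rng.random() < dropout else xi / (1.0 - dropout) for xi in x]
    else:
        x_lora = list(x)
    scale = alpha / rank
    base = matvec(weight, x)
    update = matvec(lora_b, matvec(lora_a, x_lora))
    return [h + scale * u for h, u in zip(base, update)]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold the adapter into a weight matrix: W' = W + (alpha / rank) * B A."""
    scale = alpha / rank
    return [[w + scale * sum(b_ik * lora_a[k][j] for k, b_ik in enumerate(b_row))
             for j, w in enumerate(w_row)]
            for w_row, b_row in zip(weight, lora_b)]
```

With dropout disabled, the adapted layer and the merged matrix produce the same output, which is what makes merging possible; the 0.5 dropout only affects training.
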
+
+Stage 2 (fine-tune):
+- Target: Stage 1 model
+- LoRA dropout: 0.5
+- LoRA rank 32, alpha 64
+- LR: cosine, 5e-6 (min 5e-7)
+- LoRA+ with LR ratio: 15
+- Context size: 16384
+- Gradient accumulation steps: 4
+- Epochs: 2
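
The learning-rate entries combine three pieces: a cosine decay (stage 2 decays from 5e-6 to a floor of 5e-7), LoRA+, which trains the adapter's B matrices with a learning rate 15 times that of the A matrices, and gradient accumulation, which sums gradients over 4 micro-batches per optimizer step. Below is a minimal Python sketch, assuming no warmup and assuming the LoRA+ ratio scales the whole schedule for B; the card specifies neither, and the helper names are hypothetical.

```python
import math


def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine decay from lr_max at step 0 to lr_min at total_steps (no warmup)."""
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def loraplus_lrs(step, total_steps, lr_max, lr_min=0.0, ratio=15.0):
    """Return (LR for LoRA A matrices, LR for LoRA B matrices) under LoRA+."""
    lr_a = cosine_lr(step, total_steps, lr_max, lr_min)
    return lr_a, ratio * lr_a  # B learns `ratio` times faster than A


def optimizer_steps(num_micro_batches, grad_accum_steps, epochs):
    """Optimizer updates when gradients are accumulated over several micro-batches."""
    return math.ceil(num_micro_batches / grad_accum_steps) * epochs
```

For stage 2, `loraplus_lrs(step, total, 5e-6, 5e-7, ratio=15)` would start A at 5e-6 and B at 7.5e-5, and both would decay along the same cosine curve.
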
+## Merge method
+This model was merged with the TIES merge method, using mistralai/Mistral-Nemo-Base-2407 as the base.
+## Configuration
+The following YAML configuration was used to produce this model:
+
+```yaml
+models:
+  - model: stage1-on-instruct
+    parameters:
+      weight: 1
+      density: 1
+  - model: stage2-on-stage1
+    parameters:
+      weight: 0.7
+      density: 1
+  - model: mistralai/Mistral-Nemo-Instruct-2407
+    parameters:
+      weight: 1
+      density: 1
+merge_method: ties
+base_model: mistralai/Mistral-Nemo-Base-2407
+parameters:
+  weight: 1
+  density: 1
+  normalize: true
+  int8_mask: true
+tokenizer_source: mistralai/Mistral-Nemo-Instruct-2407
+dtype: bfloat16
+```
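
The configuration appears to be in mergekit's format. To make the TIES parameters concrete, here is a simplified Python sketch of TIES merging on flat parameter vectors. Each model's task vector (its difference from the base) is trimmed to the top `density` fraction by magnitude and multiplied by its `weight`. A sign is then elected per parameter from the summed weighted deltas, only deltas that agree with that sign are added, and with `normalize: true` the sum is divided by the total weight of the agreeing models. With `density: 1` the trimming step keeps everything. This is an illustrative reading of the method, not mergekit's implementation: `int8_mask`, `dtype`, and `tokenizer_source` are not modeled, and toy vectors stand in for model tensors.

```python
def trim(delta, density):
    """Keep the top `density` fraction of entries by magnitude, zeroing the rest."""
    k = int(round(len(delta) * density))
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def sign(x):
    """Return -1, 0 or 1."""
    return (x > 0) - (x < 0)


def ties_merge(base, models, weights, density=1.0, normalize=True):
    """Merge flat parameter vectors with a simplified TIES procedure."""
    # 1. Trimmed, weighted task vectors relative to the base model.
    deltas = []
    for model, weight in zip(models, weights):
        task_vector = [p - b for p, b in zip(model, base)]
        deltas.append([weight * d for d in trim(task_vector, density)])

    merged = []
    for j, b in enumerate(base):
        column = [delta[j] for delta in deltas]
        # 2. Elect a sign per parameter from the summed weighted deltas.
        elected = sign(sum(column))
        # 3. Disjoint merge: only nonzero deltas agreeing with the elected sign count.
        agree = [i for i, d in enumerate(column) if d != 0 and sign(d) == elected]
        total = sum(column[i] for i in agree)
        if normalize:
            divisor = sum(weights[i] for i in agree)
            if abs(divisor) < 1e-8:
                divisor = 1.0  # nothing agreed, so total is already 0
            total /= divisor
        merged.append(b + total)
    return merged
```

For example, with the weights from the configuration (1, 0.7, 1), a parameter where all three task vectors agree in sign receives their weighted average, `(d1 + 0.7 * d2 + d3) / 2.7`.
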
+---
 ## Use with llama.cpp
 Install llama.cpp through brew (works on Mac and Linux)
@@ -57,4 +133,4 @@ Step 3: Run inference through the main binary.
 or
 ```
 ./llama-server --hf-repo Triangle104/MN-Slush-Q8_0-GGUF --hf-file mn-slush-q8_0.gguf -c 2048
+```
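
Once `llama-server` is running (it listens on `127.0.0.1:8080` by default), the suggested sampling settings can be passed per request. Below is a minimal Python sketch using only the standard library and llama-server's `/completion` endpoint. It assumes a llama.cpp build recent enough to support DRY sampling. Note that `/completion` does not apply a chat template, so the prompt must already be formatted for the Mistral instruct preset. The helper names and the `n_predict` default are illustrative.

```python
import json
import urllib.request

# Values from the "Parameter suggestions" section, using llama-server's field names.
SUGGESTED_SAMPLING = {"temperature": 1.0, "min_p": 0.1, "dry_multiplier": 0.8}


def build_payload(prompt, n_predict=256):
    """Build the JSON body for a /completion request."""
    payload = {"prompt": prompt, "n_predict": n_predict}
    payload.update(SUGGESTED_SAMPLING)
    return payload


def complete(prompt, url="http://127.0.0.1:8080/completion", n_predict=256):
    """POST a pre-formatted prompt to a running llama-server and return the text."""
    data = json.dumps(build_payload(prompt, n_predict)).encode("utf-8")
    req = urllib.request.Request(url, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

The `-c 2048` in the command above limits the context window to 2048 tokens; both training stages used a context size of 16384.
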