Herman555 committed
Commit fd02829
1 Parent(s): 41614b6

Update README.md

Files changed (1)
  1. README.md +47 -28
README.md CHANGED
@@ -45,23 +45,31 @@ Consider supporting me in making these models alone: https://www.buymeacoffee.com/mwell
  Contact me on Telegram: https://t.me/AlzarTakkarsen

  ---
- # LimaRP-Mistral-7B (Alpaca, flipped instruction experiment)

- This is a version of LimaRP for [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) with
- about 2000 training samples _up to_ 9k tokens in length. The second training epoch used a differently arranged
- system instruction.

- For more details about LimaRP, see the model page for the [previously released v2 version for Llama-2](https://huggingface.co/lemonilia/limarp-llama2-v2).
- Most details written there apply to this version as well. Generally speaking, LimaRP is a longform-oriented, novel-style
- roleplaying chat model intended to replicate the experience of 1-on-1 roleplay on Internet forums. Short-form,
- IRC/Discord-style RP (aka "Markdown format") is not supported yet. The model does not include instruction tuning,
- only manually picked and slightly edited RP conversations with persona and scenario data.

  ## Prompt format
- Same as before. It uses the [extended Alpaca format](https://github.com/tatsu-lab/stanford_alpaca),
- with `### Input:` immediately preceding user inputs and `### Response:` immediately preceding
- model outputs. While Alpaca wasn't originally intended for multi-turn responses, in practice this
- is not a problem; the format follows a pattern already used by other models.

  ```
  ### Instruction:
@@ -130,27 +138,32 @@ your desired response length:
  ## Text generation settings
  These settings could be a good general starting point:

- - TFS = 0.92
  - Temperature = 0.70
- - Repetition penalty = ~1.1
  - Repetition penalty range = ~2048
  - top-k = 0 (disabled)
  - top-p = 1 (disabled)

  ## Training procedure
  [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
- on 4x NVIDIA A40 GPUs.

  The A40 GPUs have been graciously provided by [Arc Compute](https://www.arccompute.io/).

  ### Training hyperparameters
- Although 1 training epoch was used, the underlying data was repeated twice
- in slightly different formats.
-
- - learning_rate: 0.0003
- - lr_scheduler: constant_with_warmup
- - noisy_embedding_alpha: 5
- - num_epochs: 1
  - sequence_len: 8750
  - lora_r: 256
  - lora_alpha: 16
@@ -161,15 +174,21 @@ in slightly different formats.
  - tf32: True
  - load_in_8bit: True
  - adapter: lora
- - micro_batch_size: 1
- - gradient_accumulation_steps: 1
  - warmup_steps: 10
  - optimizer: adamw_torch
  - flash_attention: true
  - sample_packing: true
  - pad_to_sequence_len: true

- Using 4 GPUs, the effective global batch size would have been 4.

- ### Training loss graph
- ![Train loss](https://files.catbox.moe/0pj84w.png)
@@ -45,23 +45,31 @@
  Contact me on Telegram: https://t.me/AlzarTakkarsen

  ---
+ # AshhLimaRP-Mistral-7B (Alpaca, v1)

+ This is a version of LimaRP with 2000 training samples _up to_ about 9k tokens in length,
+ finetuned on [Ashhwriter-Mistral-7B](https://huggingface.co/lemonilia/Ashhwriter-Mistral-7B).

+ LimaRP is a longform-oriented, novel-style roleplaying chat model intended to replicate the experience
+ of 1-on-1 roleplay on Internet forums. Short-form, IRC/Discord-style RP (aka "Markdown format")
+ is not supported. The model does not include instruction tuning, only manually picked and
+ slightly edited RP conversations with persona and scenario data.
+
+ Ashhwriter, the base model, is entirely finetuned on human-written lewd stories.
+
+ ## Available versions
+ - Float16 HF weights
+ - LoRA Adapter ([adapter_config.json](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/adapter_config.json) and [adapter_model.bin](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/adapter_model.bin))
+ - [4bit AWQ](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/tree/main/AWQ)
+ - [Q4_K_M GGUF](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/AshhLimaRP-Mistral-7B.Q4_K_M.gguf)
+ - [Q6_K GGUF](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/AshhLimaRP-Mistral-7B.Q6_K.gguf)
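+
+ As a quick-start sketch, the float16 HF weights can be loaded with `transformers` like this
+ (assumes `accelerate` is installed for `device_map="auto"`; adjust to taste):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "lemonilia/AshhLimaRP-Mistral-7B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.float16,  # the float16 HF weights listed above
+     device_map="auto",          # requires accelerate
+ )
+ ```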

  ## Prompt format
+ [Extended Alpaca format](https://github.com/tatsu-lab/stanford_alpaca),
+ with `### Instruction:` preceding the system prompt, `### Input:` immediately preceding user
+ inputs, and `### Response:` immediately preceding model outputs. While Alpaca wasn't originally
+ intended for multi-turn responses, in practice this is not a problem; the format follows a
+ pattern already used by other models.
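+
+ As a rough illustration, a prompt in this format could be assembled as below (the helper and
+ its arguments are hypothetical, purely for illustration); the raw template follows.
+
+ ```python
+ # Sketch: build an extended-Alpaca prompt from a system string and chat turns.
+ def build_prompt(system: str, turns: list[tuple[str, str]]) -> str:
+     # turns holds (user_message, model_message) pairs; leave the final
+     # model_message empty so the model continues from "### Response:".
+     parts = [f"### Instruction:\n{system}"]
+     for user_msg, model_msg in turns:
+         parts.append(f"### Input:\n{user_msg}")
+         parts.append(f"### Response:\n{model_msg}")
+     return "\n\n".join(parts)
+ ```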

  ```
  ### Instruction:
 
@@ -130,27 +138,32 @@
  ## Text generation settings
  These settings could be a good general starting point:

+ - TFS = 0.90
  - Temperature = 0.70
+ - Repetition penalty = ~1.11
  - Repetition penalty range = ~2048
  - top-k = 0 (disabled)
  - top-p = 1 (disabled)
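+
+ A sketch of applying these settings with `llama-cpp-python` on the Q4_K_M GGUF listed above.
+ Parameter names follow that library; treating `last_n_tokens_size` as the repetition penalty
+ range and the 8k context size are assumptions:
+
+ ```python
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="AshhLimaRP-Mistral-7B.Q4_K_M.gguf",
+     n_ctx=8192,               # assumed context size
+     last_n_tokens_size=2048,  # repetition penalty range
+ )
+ prompt = "### Instruction:\nA fantasy roleplay.\n\n### Input:\nUser: Hello!\n\n### Response:\nCharacter:"
+ out = llm(
+     prompt,                   # placeholder prompt in the format above
+     temperature=0.70,
+     tfs_z=0.90,               # TFS
+     repeat_penalty=1.11,
+     top_k=0,                  # disabled
+     top_p=1.0,                # disabled
+     max_tokens=300,
+ )
+ print(out["choices"][0]["text"])
+ ```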

  ## Training procedure
  [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
+ on 2x NVIDIA A40 GPUs.

  The A40 GPUs have been graciously provided by [Arc Compute](https://www.arccompute.io/).

  ### Training hyperparameters
+ A lower learning rate than usual was employed. Due to an unforeseen issue the training
+ was cut short; as a result, 3 epochs were trained instead of the planned 4. Using 2 GPUs,
+ the effective global batch size would have been 16.
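+
+ For reference, the arithmetic behind that figure (the gradient accumulation value is an
+ inference from the other numbers, not stated in the list below):
+
+ ```python
+ # Effective global batch size = micro batch size x GPUs x gradient accumulation.
+ micro_batch_size = 2             # from the hyperparameter list below
+ num_gpus = 2
+ gradient_accumulation_steps = 4  # assumption: the value that yields 16
+ print(micro_batch_size * num_gpus * gradient_accumulation_steps)  # -> 16
+ ```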
+
+ Training was continued from the most recent LoRA adapter from Ashhwriter, using the same
+ LoRA R and LoRA alpha.
+
+ - lora_model_dir: /home/anon/bin/axolotl/OUT_mistral-stories/checkpoint-6000/
+ - learning_rate: 0.00005
+ - lr_scheduler: cosine
+ - noisy_embedding_alpha: 3.5
+ - num_epochs: 4
  - sequence_len: 8750
  - lora_r: 256
  - lora_alpha: 16
 
@@ -161,15 +174,21 @@
  - tf32: True
  - load_in_8bit: True
  - adapter: lora
+ - micro_batch_size: 2
+ - optimizer: adamw_bnb_8bit
  - warmup_steps: 10
- - optimizer: adamw_torch
  - flash_attention: true
  - sample_packing: true
  - pad_to_sequence_len: true

+ ### Loss graphs
+ Values are higher than typical because the loss is computed on the entire
+ sample rather than on the response alone, similar to unsupervised finetuning.
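+
+ Concretely, in `transformers`-style label conventions (where a label of -100 excludes a token
+ from the loss; the tensors here are toy values for illustration):
+
+ ```python
+ import torch
+
+ input_ids = torch.tensor([[1, 5, 9, 2, 7, 4]])  # prompt + response tokens
+ prompt_len = 3
+
+ # Full-sample training, as used here: every token contributes to the loss.
+ labels_full = input_ids.clone()
+
+ # Completion-only training would instead mask the prompt tokens out.
+ labels_completion = input_ids.clone()
+ labels_completion[:, :prompt_len] = -100
+ ```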
+
+ #### Train loss
+ ![Train loss](https://files.catbox.moe/ovw8c7.png)
+
+ #### Eval loss
+ ![Eval loss](https://files.catbox.moe/yp7o0h.png)