Herman555 committed
Commit fd02829
1 Parent(s): 41614b6

Update README.md

Files changed (1)
  1. README.md +47 -28
README.md CHANGED
@@ -45,23 +45,31 @@ Consider supporting me in making these models alone: https://www.buymeacoffee.com/mwell
  Contact me on Telegram: https://t.me/AlzarTakkarsen

  ---
- # LimaRP-Mistral-7B (Alpaca, flipped instruction experiment)

- This is a version of LimaRP for [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) with
- about 2000 training samples _up to_ 9k tokens in length. The second training epoch used a differently arranged
- system instruction.

- For more details about LimaRP, see the model page for the [previously released v2 version for Llama-2](https://huggingface.co/lemonilia/limarp-llama2-v2).
- Most details written there apply to this version as well. Generally speaking, LimaRP is a longform-oriented, novel-style
- roleplaying chat model intended to replicate the experience of 1-on-1 roleplay on Internet forums. Short-form,
- IRC/Discord-style RP (aka "Markdown format") is not supported yet. The model does not include instruction tuning,
- only manually picked and slightly edited RP conversations with persona and scenario data.

  ## Prompt format
- Same as before. It uses the [extended Alpaca format](https://github.com/tatsu-lab/stanford_alpaca),
- with `### Input:` immediately preceding user inputs and `### Response:` immediately preceding
- model outputs. While Alpaca wasn't originally intended for multi-turn responses, in practice this
- is not a problem; the format follows a pattern already used by other models.

  ```
  ### Instruction:
@@ -130,27 +138,32 @@ your desired response length:
  ## Text generation settings
  These settings could be a good general starting point:

- - TFS = 0.92
  - Temperature = 0.70
- - Repetition penalty = ~1.1
  - Repetition penalty range = ~2048
  - top-k = 0 (disabled)
  - top-p = 1 (disabled)

  ## Training procedure
  [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
- on 4x NVIDIA A40 GPUs.

  The A40 GPUs have been graciously provided by [Arc Compute](https://www.arccompute.io/).

  ### Training hyperparameters
- Although 1 training epoch was used, the underlying data was repeated twice
- in slightly different formats.
-
- - learning_rate: 0.0003
- - lr_scheduler: constant_with_warmup
- - noisy_embedding_alpha: 5
- - num_epochs: 1
  - sequence_len: 8750
  - lora_r: 256
  - lora_alpha: 16
@@ -161,15 +174,21 @@ in slightly different formats.
  - tf32: True
  - load_in_8bit: True
  - adapter: lora
- - micro_batch_size: 1
- - gradient_accumulation_steps: 1
  - warmup_steps: 10
  - optimizer: adamw_torch
  - flash_attention: true
  - sample_packing: true
  - pad_to_sequence_len: true

- Using 4 GPUs, the effective global batch size would have been 4.

- ### Training loss graph
- ![Train loss](https://files.catbox.moe/0pj84w.png)
@@ -45,23 +45,31 @@
  Contact me on Telegram: https://t.me/AlzarTakkarsen

  ---
+ # AshhLimaRP-Mistral-7B (Alpaca, v1)

+ This is a version of LimaRP with 2000 training samples _up to_ about 9k tokens in length,
+ finetuned on [Ashhwriter-Mistral-7B](https://huggingface.co/lemonilia/Ashhwriter-Mistral-7B).

+ LimaRP is a longform-oriented, novel-style roleplaying chat model intended to replicate the experience
+ of 1-on-1 roleplay on Internet forums. Short-form, IRC/Discord-style RP (aka "Markdown format")
+ is not supported. The model does not include instruction tuning, only manually picked and
+ slightly edited RP conversations with persona and scenario data.
+
+ Ashhwriter, the base model, is entirely finetuned on human-written lewd stories.
+
+ ## Available versions
+ - Float16 HF weights
+ - LoRA Adapter ([adapter_config.json](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/adapter_config.json) and [adapter_model.bin](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/adapter_model.bin))
+ - [4bit AWQ](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/tree/main/AWQ)
+ - [Q4_K_M GGUF](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/AshhLimaRP-Mistral-7B.Q4_K_M.gguf)
+ - [Q6_K GGUF](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/AshhLimaRP-Mistral-7B.Q6_K.gguf)
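+
+ As a quick-start sketch, the float16 HF weights can be loaded with `transformers` like this
+ (assumes `accelerate` is installed for `device_map="auto"`; adjust to taste):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "lemonilia/AshhLimaRP-Mistral-7B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.float16,  # the float16 HF weights listed above
+     device_map="auto",          # requires accelerate
+ )
+ ```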

  ## Prompt format
+ [Extended Alpaca format](https://github.com/tatsu-lab/stanford_alpaca),
+ with `### Instruction:` preceding the system prompt, `### Input:` immediately preceding user
+ inputs, and `### Response:` immediately preceding model outputs. While Alpaca wasn't originally
+ intended for multi-turn responses, in practice this is not a problem; the format follows a
+ pattern already used by other models.
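+
+ As a rough illustration, a prompt in this format could be assembled as below (the helper and
+ its arguments are hypothetical, purely for illustration); the raw template follows.
+
+ ```python
+ # Sketch: build an extended-Alpaca prompt from a system string and chat turns.
+ def build_prompt(system: str, turns: list[tuple[str, str]]) -> str:
+     # turns holds (user_message, model_message) pairs; leave the final
+     # model_message empty so the model continues from "### Response:".
+     parts = [f"### Instruction:\n{system}"]
+     for user_msg, model_msg in turns:
+         parts.append(f"### Input:\n{user_msg}")
+         parts.append(f"### Response:\n{model_msg}")
+     return "\n\n".join(parts)
+ ```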

  ```
  ### Instruction:
 
@@ -130,27 +138,32 @@
  ## Text generation settings
  These settings could be a good general starting point:

+ - TFS = 0.90
  - Temperature = 0.70
+ - Repetition penalty = ~1.11
  - Repetition penalty range = ~2048
  - top-k = 0 (disabled)
  - top-p = 1 (disabled)
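+
+ A sketch of applying these settings with `llama-cpp-python` on the Q4_K_M GGUF listed above.
+ Parameter names follow that library; treating `last_n_tokens_size` as the repetition penalty
+ range and the 8k context size are assumptions:
+
+ ```python
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="AshhLimaRP-Mistral-7B.Q4_K_M.gguf",
+     n_ctx=8192,               # assumed context size
+     last_n_tokens_size=2048,  # repetition penalty range
+ )
+ prompt = "### Instruction:\nA fantasy roleplay.\n\n### Input:\nUser: Hello!\n\n### Response:\nCharacter:"
+ out = llm(
+     prompt,                   # placeholder prompt in the format above
+     temperature=0.70,
+     tfs_z=0.90,               # TFS
+     repeat_penalty=1.11,
+     top_k=0,                  # disabled
+     top_p=1.0,                # disabled
+     max_tokens=300,
+ )
+ print(out["choices"][0]["text"])
+ ```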

  ## Training procedure
  [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
+ on 2x NVIDIA A40 GPUs.

  The A40 GPUs have been graciously provided by [Arc Compute](https://www.arccompute.io/).

  ### Training hyperparameters
+ A lower learning rate than usual was employed. Due to an unforeseen issue the training
+ was cut short; as a result, 3 epochs were trained instead of the planned 4. Using 2 GPUs,
+ the effective global batch size would have been 16.
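+
+ For reference, the arithmetic behind that figure (the gradient accumulation value is an
+ inference from the other numbers, not stated in the list below):
+
+ ```python
+ # Effective global batch size = micro batch size x GPUs x gradient accumulation.
+ micro_batch_size = 2             # from the hyperparameter list below
+ num_gpus = 2
+ gradient_accumulation_steps = 4  # assumption: the value that yields 16
+ print(micro_batch_size * num_gpus * gradient_accumulation_steps)  # -> 16
+ ```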
+
+ Training was continued from the most recent LoRA adapter from Ashhwriter, using the same
+ LoRA R and LoRA alpha.
+
+ - lora_model_dir: /home/anon/bin/axolotl/OUT_mistral-stories/checkpoint-6000/
+ - learning_rate: 0.00005
+ - lr_scheduler: cosine
+ - noisy_embedding_alpha: 3.5
+ - num_epochs: 4
  - sequence_len: 8750
  - lora_r: 256
  - lora_alpha: 16
 
@@ -161,15 +174,21 @@
  - tf32: True
  - load_in_8bit: True
  - adapter: lora
+ - micro_batch_size: 2
+ - optimizer: adamw_bnb_8bit
  - warmup_steps: 10
- - optimizer: adamw_torch
  - flash_attention: true
  - sample_packing: true
  - pad_to_sequence_len: true

+ ### Loss graphs
+ Values are higher than typical because the loss is computed on the entire
+ sample rather than on the response alone, similar to unsupervised finetuning.
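+
+ Concretely, in `transformers`-style label conventions (where a label of -100 excludes a token
+ from the loss; the tensors here are toy values for illustration):
+
+ ```python
+ import torch
+
+ input_ids = torch.tensor([[1, 5, 9, 2, 7, 4]])  # prompt + response tokens
+ prompt_len = 3
+
+ # Full-sample training, as used here: every token contributes to the loss.
+ labels_full = input_ids.clone()
+
+ # Completion-only training would instead mask the prompt tokens out.
+ labels_completion = input_ids.clone()
+ labels_completion[:, :prompt_len] = -100
+ ```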
+
+ #### Train loss
+ ![Train loss](https://files.catbox.moe/ovw8c7.png)
+
+ #### Eval loss
+ ![Eval loss](https://files.catbox.moe/yp7o0h.png)