Update README.md
README.md
@@ -45,23 +45,31 @@ Consider Support me making these model alone: https://www.buymeacoffee.com/mwell
Contact me on Telegram: https://t.me/AlzarTakkarsen

---
# AshhLimaRP-Mistral-7B (Alpaca, v1)

This is a version of LimaRP with 2000 training samples _up to_ about 9k tokens in length,
finetuned on [Ashhwriter-Mistral-7B](https://huggingface.co/lemonilia/Ashhwriter-Mistral-7B).

LimaRP is a longform-oriented, novel-style roleplaying chat model intended to replicate the experience
of 1-on-1 roleplay on Internet forums. Short-form, IRC/Discord-style RP (aka "Markdown format")
is not supported. The model does not include instruction tuning, only manually picked and
slightly edited RP conversations with persona and scenario data.

Ashhwriter, the base, is a model finetuned entirely on human-written lewd stories.

## Available versions
- Float16 HF weights (see the loading sketch below)
- LoRA Adapter ([adapter_config.json](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/adapter_config.json) and [adapter_model.bin](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/adapter_model.bin))
- [4bit AWQ](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/tree/main/AWQ)
- [Q4_K_M GGUF](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/AshhLimaRP-Mistral-7B.Q4_K_M.gguf)
- [Q6_K GGUF](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/AshhLimaRP-Mistral-7B.Q6_K.gguf)

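For quick reference, the float16 HF weights above should load with the standard Hugging Face `transformers` API. This is a minimal, unofficial sketch (not from the model card); it assumes `transformers` and `accelerate` are installed and a GPU with enough VRAM for a 7B model in float16:

```python
# Minimal sketch (not from the model card): loading the float16 HF weights.
# Assumes transformers + accelerate and a GPU with enough VRAM for a 7B fp16 model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "lemonilia/AshhLimaRP-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.float16,  # the repo ships float16 weights
    device_map="auto",          # requires accelerate
)
```
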
## Prompt format
[Extended Alpaca format](https://github.com/tatsu-lab/stanford_alpaca),
with a `### Instruction:` block at the top, `### Input:` immediately preceding user inputs,
and `### Response:` immediately preceding model outputs. While Alpaca wasn't originally intended
for multi-turn responses, in practice this is not a problem; the format follows a pattern already
used by other models.

```
### Instruction:
```

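Only the opening line of the card's prompt template falls inside this hunk. As a purely illustrative sketch of the multi-turn layout described above (the scenario wording and `User:`/`Character:` labels are assumptions, not the card's exact template), a prompt could be assembled like this:

```python
# Illustrative sketch only: assembling a multi-turn prompt in the extended
# Alpaca layout described above. The scenario text and "User:"/"Character:"
# labels are assumptions, not the card's exact template.
def build_prompt(scenario: str, turns: list[tuple[str, str]]) -> str:
    """turns holds (user_message, model_reply) pairs; leave the last reply open."""
    parts = [f"### Instruction:\n{scenario}\n"]
    for user_msg, model_reply in turns:
        parts.append(f"### Input:\n{user_msg}\n")
        parts.append(f"### Response:\n{model_reply}")
    return "\n".join(parts)

prompt = build_prompt(
    "Play the role of Character in a forum-style roleplay with User.",
    [("User: Hello there.", "Character:")],  # the model continues after "Character:"
)
```
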
@@ -130,27 +138,32 @@ your desired response length:

## Text generation settings
These settings could be a good general starting point:

- TFS = 0.90
- Temperature = 0.70
- Repetition penalty = ~1.11
- Repetition penalty range = ~2048
- top-k = 0 (disabled)
- top-p = 1 (disabled)

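As a rough, unofficial illustration, these values map onto llama-cpp-python's sampling parameters when running the Q4_K_M GGUF linked above; parameter names can vary between versions, and the repetition-penalty range (~2048) is not exposed at this level:

```python
# Rough sketch, not from the card: the suggested settings expressed as
# llama-cpp-python sampling parameters for the Q4_K_M GGUF linked above.
# The repetition-penalty range is not exposed at this level, and parameter
# names may differ between llama-cpp-python versions.
from llama_cpp import Llama

llm = Llama(model_path="AshhLimaRP-Mistral-7B.Q4_K_M.gguf", n_ctx=8192)
output = llm(
    "### Instruction:\n...\n\n### Input:\nUser: Hi.\n\n### Response:\nCharacter:",
    max_tokens=300,
    temperature=0.70,
    tfs_z=0.90,           # tail-free sampling (TFS)
    repeat_penalty=1.11,
    top_k=0,              # 0 disables top-k in llama.cpp
    top_p=1.0,            # 1.0 disables top-p
)
print(output["choices"][0]["text"])
```
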
## Training procedure
[Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
on 2x NVIDIA A40 GPUs.

The A40 GPUs have been graciously provided by [Arc Compute](https://www.arccompute.io/).

### Training hyperparameters
A lower learning rate than usual was employed. Due to an unforeseen issue the training
was cut short, and as a result 3 epochs were trained instead of the planned 4. Using 2 GPUs,
the effective global batch size would have been 16.

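For clarity, the figure of 16 can be broken down as below; the gradient-accumulation value is an assumption, since it is not listed among the hyperparameters in this card:

```python
# Worked arithmetic for the effective global batch size quoted above.
# gradient_accumulation_steps is an assumed value, not listed in this card.
num_gpus = 2
micro_batch_size = 2             # per-GPU batch size, see the list below
gradient_accumulation_steps = 4  # assumption
effective_batch_size = num_gpus * micro_batch_size * gradient_accumulation_steps
print(effective_batch_size)      # 16
```
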
Training was continued from the most recent LoRA adapter from Ashhwriter, using the same
LoRA R and LoRA alpha.

- lora_model_dir: /home/anon/bin/axolotl/OUT_mistral-stories/checkpoint-6000/
- learning_rate: 0.00005
- lr_scheduler: cosine
- noisy_embedding_alpha: 3.5
- num_epochs: 4
- sequence_len: 8750
- lora_r: 256
- lora_alpha: 16
@@ -161,15 +174,21 @@ in slightly different formats.
- tf32: True
- load_in_8bit: True
- adapter: lora
- micro_batch_size: 2
- optimizer: adamw_bnb_8bit
- warmup_steps: 10
- optimizer: adamw_torch
- flash_attention: true
- sample_packing: true
- pad_to_sequence_len: true

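The LoRA adapter listed under "Available versions" corresponds to these settings (lora_r 256, lora_alpha 16). As an unofficial sketch, it could presumably be applied on top of the Ashhwriter base with `peft`, assuming the adapter_config.json and adapter_model.bin at the repo root are all that is needed:

```python
# Unofficial sketch: applying the published LoRA adapter (lora_r=256, lora_alpha=16)
# on top of the Ashhwriter-Mistral-7B base with peft. Assumes the adapter files at
# the repo root are sufficient; this is not taken from the model card.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "lemonilia/Ashhwriter-Mistral-7B",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "lemonilia/AshhLimaRP-Mistral-7B")
```
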
### Loss graphs
Values are higher than typical because the training is performed on the entire
sample, similar to unsupervised finetuning.

#### Train loss
![Train loss](https://files.catbox.moe/ovw8c7.png)

#### Eval loss
![Eval loss](https://files.catbox.moe/yp7o0h.png)