|
--- |
|
license: other |
|
tags: |
|
- llama-2 |
|
--- |
|
# Model Card: Pygmalion LRP Grad L2 7B |
|
This model uses [Pygmalion 2 7B](https://huggingface.co/PygmalionAI/pygmalion-2-7b) as a base, merged with a LimaRP v1 LoRA (at 52% weight) trained on data converted to the Metharme prompt format.
|
|
|
The LoRA was merged into the base model with this [script](https://github.com/zarakiquemparte/zaraki-tools/blob/main/apply-lora-weight-ltl.py).
|
|
|
- Credits to [Suikamelon](https://huggingface.co/lemonilia) for the LimaRP dataset |
|
- Credits to [Pygmalion AI](https://huggingface.co/PygmalionAI) for the base model |
|
|
|
|
|
## LoRA merge weights
|
|
|
``` |
|
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.5,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 |
|
``` |
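Each value corresponds to one of the 32 transformer layers: the first 15 layers keep the base weights untouched, layer 15 gets half of the LoRA delta, and the remaining layers get the full delta. A minimal NumPy sketch of how such a per-layer weighted merge could work, using toy matrices (`merge_layer` and its arguments are illustrative, not part of the linked script; `alpha` and `r` are taken from the hyperparameters below):

```python
import numpy as np

# Per-layer merge weights from the model card: layers 0-14 get 0,
# layer 15 gets 0.5, layers 16-31 get the full LoRA delta.
weights = [0] * 15 + [0.5] + [1] * 16
assert len(weights) == 32

def merge_layer(base_w, lora_a, lora_b, layer_weight, alpha=16, r=8):
    """Apply a scaled LoRA delta to one layer's weight matrix.

    delta = (alpha / r) * (B @ A), scaled again by the per-layer
    merge weight; the real merge is done by apply-lora-weight-ltl.py.
    """
    delta = (alpha / r) * (lora_b @ lora_a)
    return base_w + layer_weight * delta

# Toy example: a 4x4 "layer" with rank-2 LoRA factors.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 4))
a = rng.normal(size=(2, 4))   # lora_A: (r, in_features)
b = rng.normal(size=(4, 2))   # lora_B: (out_features, r)

merged_frozen = merge_layer(base, a, b, weights[0])   # weight 0: unchanged
merged_full = merge_layer(base, a, b, weights[-1])    # weight 1: full delta
assert np.allclose(merged_frozen, base)
```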
|
|
|
## Prompting |
|
|
|
The model has been trained on prompts using three different roles, which are denoted by the following tokens: `<|system|>`, `<|user|>` and `<|model|>`. |
|
|
|
The `<|system|>` prompt can be used to inject out-of-channel information behind the scenes, while the `<|user|>` prompt should be used to indicate user input. |
|
The `<|model|>` token should then be used to indicate that the model should generate a response. These tokens can happen multiple times and be chained up to |
|
form a conversation history. |
|
|
|
### Prompting example |
|
|
|
The system prompt has been designed to allow the model to "enter" various modes and dictate the reply length. Here's an example: |
|
|
|
``` |
|
<|system|>Enter RP mode. Pretend to be {{char}} whose persona follows: |
|
{{persona}} |
|
|
|
You shall reply to the user while staying in character, and generate long responses. |
|
``` |
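Chaining the three role tokens into a conversation history can be sketched as follows; `build_prompt` is an illustrative helper, not part of any library:

```python
def build_prompt(system, turns):
    """Assemble a multi-turn Metharme prompt.

    turns: list of (user_message, model_reply) pairs; pass None as the
    last reply so the prompt ends with <|model|> and the model
    generates the next response.
    """
    prompt = f"<|system|>{system}"
    for user_msg, model_reply in turns:
        prompt += f"<|user|>{user_msg}<|model|>"
        if model_reply is not None:
            prompt += model_reply
    return prompt

prompt = build_prompt(
    "Enter RP mode. Pretend to be {{char}} whose persona follows:\n"
    "{{persona}}\n\nYou shall reply to the user while staying in "
    "character, and generate long responses.",
    [("Hello!", "Hi there!"), ("How are you?", None)],
)
assert prompt.endswith("<|model|>")
```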
|
|
|
## Bias, Risks, and Limitations |
|
|
|
The intended use-case for this model is fictional writing for entertainment purposes. Any other sort of usage is out of scope. |
|
|
|
As such, it was **not** fine-tuned to be safe and harmless: the base model _and_ this fine-tune have been trained on data known to contain profanity and texts that |
|
are lewd or otherwise offensive. It may produce socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive. |
|
Outputs might often be factually wrong or misleading. |
|
|
|
## Training Details |
|
|
|
This model was trained on the LimaRP dataset by [Suikamelon](https://huggingface.co/lemonilia), converted to the Metharme prompt format, using [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl); the LoRA merge was then applied with the script mentioned above.
|
|
|
## Training Hyperparameters |
|
|
|
``` |
|
load_in_8bit: true |
|
adapter: lora |
|
lora_r: 8 |
|
lora_alpha: 16 |
|
lora_dropout: 0.01 |
|
gradient_accumulation_steps: 1 |
|
micro_batch_size: 1 |
|
num_epochs: 3 |
|
optimizer: adamw_torch |
|
lr_scheduler: cosine |
|
learning_rate: 0.000065 |
|
bf16: true |
|
tf32: true |
|
``` |
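For reference, `lora_alpha: 16` over `lora_r: 8` scales every LoRA delta by 2, and `micro_batch_size: 1` with no gradient accumulation gives an effective batch size of 1. A quick sketch of the trainable-parameter count these settings imply, assuming the LoRA targets the four attention projections of Llama-2 7B (hidden size 4096, 32 layers) — an assumption, since the target modules are not stated in this card:

```python
# Rank-8 LoRA on the q/k/v/o projections of a Llama-2 7B-sized model.
hidden, layers, r, alpha = 4096, 32, 8, 16

scaling = alpha / r                   # factor applied to each LoRA delta
per_proj = r * (hidden + hidden)      # each 4096x4096 projection adds r*(d_in + d_out)
trainable = 4 * per_proj * layers     # four attention projections per layer

print(f"scaling = {scaling}, ~{trainable / 1e6:.1f}M trainable LoRA parameters")
```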
|
|
|
## Environmental Impact |
|
Fine-tuning the LimaRP LoRA on 1x NVIDIA L40 takes about 1 hour and 45 minutes.
|
|