jdpressman/Mistral-Morpheus-7B

First iteration of the default generator LoRA for MiniHF. This model still functions as a base model while writing more coherent text.

Training procedure

This model was trained starting from the MiniHF Mistral SFT evaluator. It was created using the MiniHF Reinforcement Learning From AI Feedback pipeline:

accelerate launch rlaif_generator.py --resume minihf_evaluator_mistral_7b_v0.1 --output-path mistral_h_eval --kl-weight 1.0 --constitution hermes/hermes_constitution.txt --prompts hermes/hermes_prompts.txt --length 256 --batch-size 4 --grad-accum-steps 8

The tuning script was modified to use the AdamW optimizer with weight decay:

opt = optim.AdamW(model.parameters(), lr=1e-5, weight_decay=1e-2, betas=(0.9, 0.98))

This weight decay is based on the observation that mode collapse from RL tuning can be undone by interpolating the weights of the base model with those of the RL-tuned model. The specific recipe here was to start from the MiniHF SFT evaluator, then apply weight decay and the KL penalty to pull the policy back towards the base model weights and inject entropy back into it.
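The interpolation idea mentioned above can be sketched in a few lines. This is an illustrative sketch only, not code from the MiniHF pipeline: the function name `interpolate_weights` and the mixing coefficient `alpha` are hypothetical, and plain floats stand in for parameter tensors so the example runs without any dependencies. The same arithmetic applies elementwise to tensors.

```python
def interpolate_weights(base, tuned, alpha):
    """Linearly mix two sets of weights, keyed by parameter name.

    alpha = 0.0 returns the base weights, alpha = 1.0 returns the tuned
    weights, and values in between move the tuned model back towards the base.
    The name and signature are hypothetical, for illustration only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if base.keys() != tuned.keys():
        raise KeyError("both weight sets must have identical parameter names")
    return {
        name: (1.0 - alpha) * base[name] + alpha * tuned[name]
        for name in base
    }


# Toy example: floats stand in for parameter tensors.
base = {"w": 0.0, "b": 2.0}
tuned = {"w": 1.0, "b": 4.0}
halfway = interpolate_weights(base, tuned, 0.5)
```

Weight decay on the trainable weights during tuning plays a loosely similar role to choosing a smaller `alpha` after the fact, except that it acts continuously throughout training.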

Prompt Bank and Constitution

The prompt bank used during tuning is in the hermes_prompts.txt file found in this repo, and the constitution is in hermes_constitution.txt.

Configuration

The following bitsandbytes quantization config was used during training:

quant_method: bitsandbytes
load_in_8bit: False
load_in_4bit: True
llm_int8_threshold: 6.0
llm_int8_skip_modules: None
llm_int8_enable_fp32_cpu_offload: False
llm_int8_has_fp16_weight: False
bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: True
bnb_4bit_compute_dtype: float16
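
For reference, the settings above map onto the `BitsAndBytesConfig` class in the `transformers` library roughly as follows. This is a sketch assuming `transformers`, `torch`, and `bitsandbytes` are installed; the helper name `build_bnb_config` is hypothetical and not part of MiniHF.

```python
# Keyword arguments mirroring the quantization config listed above.
BNB_KWARGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}


def build_bnb_config():
    """Build a BitsAndBytesConfig from the values above (hypothetical helper).

    Imports are deferred so this module can be read and inspected on a
    machine without the GPU stack installed.
    """
    import torch
    from transformers import BitsAndBytesConfig

    # bnb_4bit_compute_dtype: float16 in the listing above.
    return BitsAndBytesConfig(bnb_4bit_compute_dtype=torch.float16, **BNB_KWARGS)
```

The resulting object would typically be passed as `quantization_config` when loading the base model with `from_pretrained`.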

Framework versions

PEFT 0.5.0