RWKV 14B WizardLM LoRA

The model in this repository was trained for 10.25 hours at a cost of $18.

LoRA Rank: 32
LoRA Alpha: 64
Real Epochs: 3
Learning Rate: 1e-4
Context Length: 1024
Training Tokens: 22,771,425
Training Dataset: WizardLM_alpaca_evol_instruct_70k_unfiltered
RWKV Model License: apache-2.0

This is an unrestricted model. Be aware that its outputs could be extremely harmful, even when the model is not prompted for harmful output. Use discretion when deploying the model so that you do not expose yourself to liability arising from unwanted or harmful outputs. I am not responsible for anything that happens when you use this model.

The training data may have more restrictive licenses. Depending on your jurisdiction and local laws, it may be unwise to use this model for commercial purposes. It is currently unclear how training data licenses govern trained models, and this may change in the near future.

Preparing Data

Repo: RWKV-v2-RNN-Pile
Directory: RWKV-v3

Create a file called train.txt in which every entry is followed by <|endoftext|>. The example script below prints the entries to standard output, so redirect its output to train.txt:

import json

# Print each instruction/response pair, terminated by <|endoftext|>.
with open("WizardLM_alpaca_evol_instruct_70k_unfiltered.json", "r") as fh:
    data = json.load(fh)

for item in data:
    # Skip entries whose instruction or output is missing or empty.
    if item.get("instruction") and item.get("output"):
        print(item["instruction"])
        print("\n### Response:", end="")
        print(item["output"])
        print("<|endoftext|>")

Then run:

python prepare_data.py

The resulting file will be train.npy. Keep track of the number of tokens; the epoch_count calculation in the Training section needs it.
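If the token count was not noted during preparation, it can be recovered from the output file. The following is a minimal sketch that assumes train.npy holds a single array of token IDs written with np.save; count_tokens is a hypothetical helper and is not part of the RWKV repositories.

```python
import os

import numpy as np


def count_tokens(path: str) -> int:
    """Return the number of token IDs stored in a .npy file.

    Assumption: the file holds one array of token IDs written by np.save.
    """
    # mmap_mode avoids reading the whole array into memory just to get its size.
    tokens = np.load(path, mmap_mode="r")
    return int(tokens.size)


if __name__ == "__main__":
    # Only report a count if the prepared file is present.
    if os.path.exists("train.npy"):
        print(count_tokens("train.npy"))
```

For the dataset used here, the count was 22,771,425 tokens.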

Training

Repo: RWKV-LM-LoRA
Directory: RWKV-v4neo

Trained on a Runpod A100 80 GB instance (Torch 2).

Install dependencies:

apt install screen ncdu htop vim
wget https://huggingface.co/BlinkDL/rwkv-4-pile-14b/resolve/main/RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth
# replace import for inf from torch._six with import from math
vim /usr/local/lib/python3.10/dist-packages/deepspeed/runtime/utils.py
vim /usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py
pip install pytorch-lightning==1.9.0 deepspeed==0.7.0
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.0 --extra-index-url https://download.pytorch.org/whl/cu118
apt install cuda-nvcc-11-8 libcusparse-11-8 libcusparse-dev-11-8 libcublas-dev-11-8 libcublas-11-8 libcusolver-dev-11-8 libcusolver-11-8
apt remove cuda-nvcc-11-6
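The manual vim edits to the two deepspeed files can also be scripted. This is a sketch only: it assumes the offending line reads exactly `from torch._six import inf` in both files and that deepspeed is already installed at the paths shown in the install steps. patch_inf_import is a hypothetical helper name.

```python
from pathlib import Path

OLD_IMPORT = "from torch._six import inf"  # assumed exact text of the old import
NEW_IMPORT = "from math import inf"

# Paths taken from the install steps; adjust them for your Python version.
FILES = [
    "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/utils.py",
    "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py",
]


def patch_inf_import(source: str) -> str:
    """Replace the torch._six import of inf with the math import."""
    return source.replace(OLD_IMPORT, NEW_IMPORT)


if __name__ == "__main__":
    for name in FILES:
        path = Path(name)
        # Skip files that are not present (for example, deepspeed not installed yet).
        if path.exists():
            path.write_text(patch_inf_import(path.read_text()))
```

Run it once after deepspeed is installed; running it again is harmless because the old import is no longer present.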

Run training:

Note:

n_layer and n_embd depend on the specific model you choose.
lora_alpha must be the same in the training command and in the merge_lora.py command.
epoch_count is calculated as tokens / (ctx_len * micro_bsz * epoch_steps) * actual_epochs, rounded up.
Make sure the checkpoints folder (proj_dir) exists.
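The epoch_count formula can be checked against the values listed in this card. The sketch below assumes a single device and rounds up to a whole number of epochs; the function name is illustrative.

```python
import math


def epoch_count(tokens: int, ctx_len: int, micro_bsz: int,
                epoch_steps: int, actual_epochs: int) -> int:
    """Number of train.py epochs needed to cover the data actual_epochs times."""
    # Tokens consumed by one train.py "epoch" (epoch_steps optimizer steps).
    tokens_per_epoch = ctx_len * micro_bsz * epoch_steps
    # Assumption: round up so the last partial epoch is still trained.
    return math.ceil(tokens / tokens_per_epoch * actual_epochs)


# Values from this card: 22,771,425 tokens, ctx_len 1024, micro_bsz 2,
# epoch_steps 1000, 3 real epochs.
print(epoch_count(22_771_425, 1024, 2, 1000, 3))  # 34
```

This matches the --epoch_count 34 used in the training command.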

python3 train.py \
 --load_model ./RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth \
 --proj_dir ./checkpoints-wizardlm \
 --data_file ./train.npy \
 --data_type numpy \
 --vocab_size 50277 \
 --ctx_len 1024 \
 --epoch_steps 1000 \
 --epoch_count 34 \
 --epoch_begin 0 \
 --epoch_save 5 \
 --micro_bsz 2 \
 --n_layer 40 \
 --n_embd 5120 \
 --pre_ffn 0 \
 --head_qk 0 \
 --lr_init 1e-4 \
 --lr_final 5e-7 \
 --warmup_steps 0 \
 --beta1 0.9 \
 --beta2 0.999 \
 --adam_eps 1e-8 \
 --lora \
 --lora_r 32 \
 --lora_alpha 64 \
 --lora_dropout 0.05 \
 --lora_parts=att,ffn,time,ln \
 --accelerator gpu \
 --devices 1 \
 --precision bf16 \
 --grad_cp 0 \
 --strategy deepspeed_stage_2

Merge weights (since LoRA isn't supported in most implementations):

python merge_lora.py 64 RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth rwkv-45.pth RWKV-14B-WizardLM.pth
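The first argument to merge_lora.py (64) is lora_alpha. It has to match training because a LoRA merge folds the low-rank update into the base weight with a scale of lora_alpha / lora_r. The NumPy sketch below illustrates the standard LoRA merge on toy matrices; it is not the code from merge_lora.py, and all names and shapes are hypothetical.

```python
import numpy as np


def merge_lora_weight(w, lora_a, lora_b, lora_alpha, lora_r):
    """Return w + (lora_alpha / lora_r) * (lora_b @ lora_a)."""
    return w + (lora_alpha / lora_r) * (lora_b @ lora_a)


rng = np.random.default_rng(0)
out_dim, in_dim, r = 6, 4, 2           # toy sizes, not the 14B model's
w = rng.normal(size=(out_dim, in_dim))  # base weight
lora_a = rng.normal(size=(r, in_dim))   # low-rank factor A
lora_b = rng.normal(size=(out_dim, r))  # low-rank factor B

merged = merge_lora_weight(w, lora_a, lora_b, lora_alpha=4, lora_r=r)
wrong = merge_lora_weight(w, lora_a, lora_b, lora_alpha=2, lora_r=r)

# A different alpha rescales the update, so the merged weights differ.
print(np.allclose(merged, wrong))  # False
```

With lora_r 32 and lora_alpha 64, as used in this card, the scale is 2.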
