# RoPE Scaled QLoRA Long Context Extension of Llama-33b (LoRA)

**Mostly untested!**
## Overview
This is base Llama-33b with minimal additional training to extend the useful context window.
- Context length extended to 16384 via RoPE scaled embeddings (position interpolation); a sketch of the scaling follows this list.
- Pretrained for an additional 100 steps on 8192-length sequences from the Pile dataset.
- The merged model is used as the starting point for training bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-LoRA.
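Position interpolation rescales the rotary position indices so that the extended 16384-token window maps onto the phase range the base model saw during its original 2048-token pretraining (a compression factor of 16384 / 2048 = 8). Below is a minimal PyTorch sketch of the idea; the class name and exact integration point are illustrative assumptions, not the patch used to train this model.

```python
import torch

class ScaledRotaryEmbedding(torch.nn.Module):
    """Rotary embedding with position interpolation (illustrative sketch)."""

    def __init__(self, dim: int, base: int = 10000, scale: float = 8.0):
        super().__init__()
        # Standard RoPE inverse frequencies.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq)
        # scale = extended_len / original_len, e.g. 16384 / 2048 = 8.
        self.scale = scale

    def forward(self, seq_len: int, device: torch.device):
        # Interpolation: positions are divided by the scale factor, so
        # position 16383 lands roughly where position 2047 did originally.
        t = torch.arange(seq_len, device=device, dtype=torch.float32) / self.scale
        freqs = torch.einsum("i,j->ij", t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        return emb.cos(), emb.sin()
```

The cos/sin tables produced here would stand in for those of the stock rotary embedding in each attention layer.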
This is a QLoRA fine-tune. Pretraining took 10 hours on 1x RTX 6000 Ada.
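QLoRA keeps the base weights frozen in 4-bit NF4 quantization and trains only low-rank adapter matrices, which is what makes long-sequence pretraining feasible on a single card. The following is a hedged setup sketch using `transformers`, `peft`, and `bitsandbytes`; the base repo id, rank, alpha, and target modules are assumptions for illustration, not the recorded training configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model quantized to 4-bit NF4 (QLoRA-style).
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-30b",  # illustrative base repo, not necessarily the one used
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach trainable low-rank adapters; all hyperparameters below are assumptions.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

After training, the adapter can be merged into the base weights (`merge_and_unload` in peft) to produce a standalone merged model like the one referenced above.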