LLaMA 33B finetuned on wikitext_document_level with combined linear and NTK-aware RoPE scaling (alpha=4, scale=2). The model should remain coherent up to at least an 8k context length and may work beyond that. This is a merged version of llama33b-s2a4-qlora.
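
As a rough illustration of what "combined linear and NTK-aware RoPE scaling" means, the sketch below follows the common community formulation: the RoPE base is stretched by the NTK alpha factor and position indices are compressed by the linear scale factor. This is an assumption about the mechanism, not the exact code used to train or merge this model.

```python
# Minimal sketch of combined linear + NTK-aware RoPE scaling (assumed formulation).
import torch

def scaled_rotary_angles(dim, max_pos, base=10000.0, alpha=4.0, scale=2.0, device=None):
    # NTK-aware scaling: stretch the base so high-frequency components are preserved.
    ntk_base = base * alpha ** (dim / (dim - 2))
    inv_freq = 1.0 / (ntk_base ** (torch.arange(0, dim, 2, device=device).float() / dim))

    # Linear (position-interpolation) scaling: compress the position indices.
    positions = torch.arange(max_pos, device=device).float() / scale

    freqs = torch.outer(positions, inv_freq)   # (max_pos, dim // 2)
    return torch.cat((freqs, freqs), dim=-1)   # (max_pos, dim) angles for cos/sin

# Example with this model's settings: head_dim 128, 8k positions, alpha=4, scale=2.
angles = scaled_rotary_angles(dim=128, max_pos=8192, alpha=4.0, scale=2.0)
cos, sin = angles.cos(), angles.sin()
```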

Note that this is not an instruct model - this is base LLaMA with an extended sequence length.
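
A minimal loading sketch with Hugging Face transformers is shown below. The model id is taken from this repo; whether the scaled RoPE parameters are already baked into the checkpoint config (or need to be patched in manually) is an assumption you should verify before relying on long contexts.

```python
# Hedged usage sketch: loads the merged base model and generates a short completion.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chargoddard/llama33b-s2a4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Base-model style prompting: plain text continuation, no instruct template.
prompt = "The history of the Roman Empire"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```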
