LLaMA 33B fine-tuned on wikitext_document_level
with combined linear and NTK-aware RoPE scaling (alpha=4, scale=2).
This model should remain coherent up to at least an 8k context length, and may work beyond that.
This is a merged version of llama33b-s2a4-qlora.
Note that this is not an instruct model; it is base LLaMA with an extended sequence length.
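
For reference, here is a minimal sketch of how combined linear and NTK-aware RoPE scaling can be implemented, assuming a LLaMA-style rotary embedding. The function name, signature, and default base are illustrative and not taken from this repo: NTK-aware scaling stretches the RoPE base by `alpha ** (dim / (dim - 2))`, while linear scaling divides position indices by the scale factor.

```python
import torch

def build_scaled_rope_cache(
    seq_len: int,
    head_dim: int,
    base: float = 10000.0,     # standard LLaMA RoPE base (assumption)
    ntk_alpha: float = 4.0,    # alpha=4, as used for this model
    linear_scale: float = 2.0, # scale=2, as used for this model
):
    # NTK-aware scaling: stretch the base so high-frequency dimensions
    # are barely changed while low-frequency ones are interpolated.
    base = base * ntk_alpha ** (head_dim / (head_dim - 2))
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

    # Linear scaling: compress position indices by the scale factor so
    # longer sequences map into the position range seen during training.
    t = torch.arange(seq_len).float() / linear_scale

    freqs = torch.outer(t, inv_freq)
    return freqs.cos(), freqs.sin()
```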