LLaMA 33B finetuned on wikitext_document_level
with combined linear and NTK-aware RoPE scaling (alpha=4, scale=2).
This model should remain coherent up to at least 8k context length, and may work beyond that.
This is a merged version of llama33b-s2a4-qlora.
Note that this is not an instruct model - this is base LLaMA with an extended sequence length.
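The combined scaling above can be sketched as follows. This is a minimal illustration, not the training code: it assumes the common NTK-aware formulation (raising the RoPE base by `alpha**(dim/(dim-2))`) combined with linear position-interpolation scaling (dividing the resulting frequencies by `scale`); the function name `rope_inv_freq` is hypothetical.

```python
import math

def rope_inv_freq(dim, base=10000.0, alpha=4.0, scale=2.0):
    """Inverse frequencies for RoPE with combined NTK-aware and linear scaling.

    dim: head dimension (must be even); frequencies are produced per pair.
    alpha: NTK-aware scaling factor (4 for this model).
    scale: linear interpolation factor (2 for this model).
    """
    # NTK-aware scaling: increase the base so high frequencies are
    # compressed less than low ones.
    ntk_base = base * alpha ** (dim / (dim - 2))
    inv_freq = [1.0 / (ntk_base ** (i / dim)) for i in range(0, dim, 2)]
    # Linear (position-interpolation) scaling: dividing positions by
    # `scale` is equivalent to dividing every frequency by `scale`.
    return [f / scale for f in inv_freq]

# Example: a 128-dim head yields 64 frequencies, highest first.
freqs = rope_inv_freq(128)
```

With `alpha=1` and `scale=1` this reduces to vanilla RoPE, so the two factors can be tuned independently against perplexity at the target context length.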