Awesome! 88K context on 24GB works.
#3 opened by Downtown-Case
However you finetuned this model, I just want to praise it for preserving long-context performance. I can run a 4.1 bpw quant at 88K context, with no offloading, on 24GB of VRAM... and it's as coherent as anything I've tried. It seems to work even better than the base model for raw continuation.
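For anyone wondering how that fits, here's the rough arithmetic I'd use to sanity-check it. This is only a back-of-the-envelope sketch and assumes things the post doesn't state: a Qwen2.5-32B-style architecture (64 layers, 8 KV heads, head dim 128, ~32.8B params) and a ~4-bit quantized KV cache (e.g. exllamav2's Q4 cache).

```python
# Back-of-the-envelope VRAM estimate for a ~32B model at 4.1 bpw with an 88K,
# ~4-bit KV cache. Architecture numbers are assumptions, not from the post.
GIB = 1024 ** 3

def weights_gib(n_params: float, bpw: float) -> float:
    """Memory for quantized weights, in GiB."""
    return n_params * bpw / 8 / GIB

def kv_cache_gib(tokens: int, layers: int, kv_heads: int,
                 head_dim: int, bits_per_elem: float) -> float:
    """KV cache memory: 2 (K and V) * layers * kv_heads * head_dim per token."""
    elems_per_token = 2 * layers * kv_heads * head_dim
    return tokens * elems_per_token * bits_per_elem / 8 / GIB

w = weights_gib(32.8e9, 4.1)                  # ~15.7 GiB of weights at 4.1 bpw
kv = kv_cache_gib(88_000, 64, 8, 128, 4.25)   # ~5.7 GiB of ~4-bit cache at 88K
print(f"weights ~ {w:.1f} GiB, KV cache ~ {kv:.1f} GiB, total ~ {w + kv:.1f} GiB")
```

Under those assumptions that's roughly 21 GiB, leaving a little headroom for activations and buffers on a 24GB card.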
And it's infinitely better than Qwen 32B Instruct with YaRN, which is terrible at 88K. Please keep finetuning on the base model if you update it!
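For reference, the Qwen 32B Instruct comparison above is with the usual YaRN setup: the Qwen2.5 model cards suggest adding a `rope_scaling` block to `config.json` to stretch the native 32K window. A minimal sketch of that edit is below; the local path is just an assumed example.

```python
import json

# rope_scaling values suggested in the Qwen2.5 model cards for >32K contexts
rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

# Path to a locally downloaded copy of the model (assumed, adjust as needed)
with open("Qwen2.5-32B-Instruct/config.json", "r+") as f:
    cfg = json.load(f)
    cfg["rope_scaling"] = rope_scaling
    f.seek(0)
    json.dump(cfg, f, indent=2)
    f.truncate()
```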
And the language is great! Everything is great.
Wonderful work, thanks.