What is the context length?
#2 · opened by softwareweaver
Does this model keep the original 8K context, or is it larger?
Thanks,
Ash
If it works like all the Llama 3 models I've tried, you can push the context further just by raising rope_theta in the config file or via the CLI/UI, without using any NTK or compress_pos solutions.
For 8B-Instruct it can retrieve a key at up to 50k context with "rope_theta": 8000000.0.
I'm still figuring out why, but it mostly works out of the box without further training (the possible drawbacks still need evaluation, but they feel almost non-existent).
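For anyone wanting to try this, here is a minimal sketch of overriding rope_theta at load time with the Hugging Face transformers API instead of editing the config file by hand. The 8,000,000 value comes from the post above; the model id is an assumption (this thread's repo isn't named here), so treat this as illustrative rather than a tested recipe.

```python
# Minimal sketch: raise rope_theta at load time rather than editing config.json.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed repo id, for illustration

config = AutoConfig.from_pretrained(model_id)
config.rope_theta = 8_000_000.0  # Llama 3 ships with 500000.0 by default

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

Equivalently, editing the "rope_theta" field in the checkpoint's config.json has the same effect, as described above.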
softwareweaver changed discussion status to closed