What is the context length?
#2 · opened by softwareweaver
Does this model keep the original 8K context, or is it larger?
Thanks,
Ash
If it works like all the Llama 3 models I've tried, you can push the context further just by raising rope_theta in the config file or via the CLI/UI, without using any NTK or compress_pos solutions.
For 8B-Instruct it can retrieve a key at up to 50k context with "rope_theta": 8000000.0.
I'm still figuring out why, but it mostly works out of the box without further training (the possible drawbacks still need evaluation, but they feel almost non-existent).
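For anyone wanting to try this, here is a minimal sketch of overriding rope_theta at load time with the Hugging Face transformers API instead of editing the config file by hand. The 8,000,000 value comes from the post above; the model id is an assumption (this thread's repo isn't named here), so treat this as illustrative rather than a tested recipe.

```python
# Minimal sketch: raise rope_theta at load time rather than editing config.json.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed repo id, for illustration

config = AutoConfig.from_pretrained(model_id)
config.rope_theta = 8_000_000.0  # Llama 3 ships with 500000.0 by default

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

Equivalently, editing the "rope_theta" field in the checkpoint's config.json has the same effect, as described above.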
softwareweaver changed discussion status to closed