Correct max_position_embeddings to 32768

FYI, 32764 is the value stored in the GGUFs: `llm_load_print_meta: n_ctx_train = 32764`
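
For reference, here's a minimal sketch of reading that value straight from a GGUF's metadata, assuming the `gguf` Python package's `GGUFReader` API and a hypothetical local file path:

```python
# Read the trained context length from a GGUF's metadata
# (the source of the n_ctx_train = 32764 figure above).
from gguf import GGUFReader

reader = GGUFReader("model.q5_K_M.gguf")  # hypothetical path
field = reader.fields["llama.context_length"]
# Scalar metadata values live in a single-element part; index into it.
print(int(field.parts[field.data[0]][0]))  # expected: 32764
```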

Hmmm, I see that now too. Wonder why that would be the case, instead of just going with 2^15 (32768) like Mistral 7B Instruct v0.2 or Mixtral.
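
As a quick sanity check (a sketch that needs network access to the Hugging Face Hub; the repo IDs are the public Mistral releases mentioned above):

```python
from transformers import AutoConfig

for repo in ("mistralai/Mistral-7B-Instruct-v0.2",
             "mistralai/Mixtral-8x7B-Instruct-v0.1"):
    cfg = AutoConfig.from_pretrained(repo)
    # Both report 32768, i.e. 2**15.
    print(repo, cfg.max_position_embeddings)
```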

Owner

I don't know either. I doubt it would do significant damage to bump the max to 32768. Having an odd context length like 2^15 - 4 could also plausibly be harmful if you do static KV-cache preallocation, due to uneven shapes.
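
To illustrate the uneven-shapes point, a small sketch with an illustrative block size (not taken from any particular inference engine):

```python
# With a static KV cache carved into fixed-size blocks, 32768 divides
# evenly while 32764 leaves a partial block that must be padded or
# special-cased.
BLOCK = 256  # hypothetical block/page size in tokens

for n_ctx in (32768, 32764):
    full_blocks, remainder = divmod(n_ctx, BLOCK)
    padded = (full_blocks + (remainder > 0)) * BLOCK
    print(f"n_ctx={n_ctx}: {full_blocks} full blocks, "
          f"remainder {remainder}, padded to {padded}")
```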

But the LLM community can discover that themselves, if it is true. This checkpoint is meant to be as close to the leak as possible.
