max_position_embeddings

#5 opened by the-hir0

Why is "max_position_embeddings": 4096 in config.json, and not 32768? Don't we need to change it when increasing the context length via RoPE? Thank you in advance!

Nope, I think that's only the model's original length (4096 for Llama-2).

There is a line in config.json that pertains to linear RoPE scaling ("rope_scaling": {"factor": 8.0, "type": "linear"}, which is normally missing or null), but not all clients pay attention to it, and many have their own GUI or command-line argument to override it. Each client also has its own name for it (e.g., Ooba calls it compress_pos_emb). This is usually what sets the actual final context. It's a mess.
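As a rough illustration, here is a minimal sketch of how linear RoPE scaling relates the two numbers, assuming the usual interpretation (position indices are divided by the factor before computing the rotary angles, so the effective window is max_position_embeddings times the factor); the helper function and values here are illustrative, not this model's actual code:

```python
# Sketch: linear RoPE ("compress_pos_emb") scaling.
# Positions are divided by the scaling factor, so a model trained on 4096
# positions can address 4096 * factor tokens.

def rope_angles(position: int, dim: int, base: float = 10000.0,
                linear_factor: float = 1.0) -> list[float]:
    """Rotary angles for one position; linear_factor > 1 compresses positions."""
    scaled_pos = position / linear_factor  # the linear scaling step
    return [scaled_pos / base ** (2 * i / dim) for i in range(dim // 2)]

config = {
    "max_position_embeddings": 4096,
    "rope_scaling": {"factor": 8.0, "type": "linear"},
}

effective_context = config["max_position_embeddings"] * config["rope_scaling"]["factor"]
print(effective_context)  # 32768.0 -- the 32k window the question asks about
```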

RoPE theta scaling (or adjusted base frequency), on the other hand, is specified in config.json and read automatically by most clients that read the Hugging Face format (i.e., not GGML), but I didn't use that method for this model.
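For contrast, a sketch of the base-frequency alternative mentioned above (which, per the post, was not used for this model). The assumption here is the common convention where a "rope_theta" value in config.json is used directly as the RoPE base; the numbers are hypothetical:

```python
# Sketch: adjusted base frequency ("rope_theta") scaling.
# A larger base makes the rotary frequencies lower, stretching how far apart
# positions can be before the angles wrap around.

def rope_angles_theta(position: int, dim: int, rope_theta: float) -> list[float]:
    """Rotary angles computed with an enlarged base instead of scaled positions."""
    return [position / rope_theta ** (2 * i / dim) for i in range(dim // 2)]

config_theta = {
    "max_position_embeddings": 4096,
    "rope_theta": 1_000_000.0,  # hypothetical enlarged base, not this model's value
}

angles = rope_angles_theta(4096, dim=128, rope_theta=config_theta["rope_theta"])
```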
