I have a question about increased context
I'm a noob and not sure what to do.
Sometimes I see models with a context longer than the 4 thousand tokens of standard LLaMa models. I am loading GGUF models via text-generation-webui. What settings do I need to change for the model to work with the increased context? I want the model to keep better track of what is happening in a role-play and not forget what happened 2-3 steps ago, as usually happens with a 120B Goliath or Venus once the role-play goes past 3 thousand tokens.
I'm not sure I described what I want correctly. I'm using Google Translate.
You don't need to do anything special with this model, at least up to 32K context. Just load it and use it. It will stay coherent out to 32K context without you having to touch the RoPE scaling settings.
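For contrast, here is a rough sketch of what you would have to work out for a model that does not support long context natively, for example a 4K model stretched to 8K with linear RoPE scaling. The function name `linear_rope_settings` is made up for illustration. The setting names `n_ctx`, `compress_pos_emb` and `rope_freq_scale` are the ones I recall from the llama.cpp loader in text-generation-webui, so treat them as an assumption and check them against your version.

```python
def linear_rope_settings(native_ctx: int, target_ctx: int) -> dict:
    """Sketch: linear RoPE scaling values for running past a model's native context.

    The key names mirror loader settings as I remember them (an assumption);
    the helper itself is hypothetical and not part of any library.
    """
    if native_ctx <= 0 or target_ctx <= 0:
        raise ValueError("context lengths must be positive")
    # Compression factor = target context / native context.
    # No scaling is needed when the target fits in the native window.
    factor = max(1.0, target_ctx / native_ctx)
    return {
        "n_ctx": target_ctx,
        "compress_pos_emb": factor,
        # llama.cpp expresses the same thing as the inverse of the factor.
        "rope_freq_scale": 1.0 / factor,
    }


# A 4K-native model stretched to 8K needs a factor of 2.
print(linear_rope_settings(4096, 8192))
# A model that is already 32K-native needs no scaling (factor 1).
print(linear_rope_settings(32768, 32768))
```

With a model that handles 32K natively, the factor stays at 1, which is why only the context length needs to be set and the RoPE settings can be left at their defaults.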