Difference between this and the non-8K-context version
So, immediately after Ooba's update dropped, I tried the old Nous-Hermes-13B-GPTQ model and managed to reach a 4096 context size without getting OOM'ed while loading or using it. This was tested on a 3060 with 12GB VRAM.
One thing I did notice with the non-SuperHOT model was that after reaching a large enough context size, around 3200 tokens (FYI, max_new_tokens was set to 800), the model tends to repeat its previous responses, especially if you try to reference any elements in its context memory or anything the user and AI have said.
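For reference, here is the token arithmetic behind those numbers. It assumes the front end reserves max_new_tokens out of the 4096-token window for the reply, so less than the full window is left for chat history. That is an assumption about how the loader budgets context, not something tested in this post, and the helper name prompt_budget is hypothetical.

```python
# Hypothetical helper: tokens left for the prompt/chat history,
# ASSUMING the front end reserves max_new_tokens out of the context window.
def prompt_budget(context_window: int, max_new_tokens: int) -> int:
    if max_new_tokens >= context_window:
        raise ValueError("max_new_tokens must be smaller than the context window")
    return context_window - max_new_tokens


# 4096-token window with max_new_tokens set to 800
print(prompt_budget(4096, 800))  # 3296
```

Under that assumption, roughly 3300 tokens of history fit before older turns would have to be cut, which is close to the ~3200 mark where the repetition showed up.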
Example context:
User: Create a story about a character named Chad.
AI: Chad is a good boy (...) and he does good things.
Once the context reaches 3000+ tokens, if you prompt the model with the same pattern as before or mention a specific character in its context, it tends to repeat a response it already gave.
User: Can you tell me about Sam?
AI: Sam is a good boy (...) and he does good things. (Repeats a response it already gave, only changing the subject)
or
User: Continue the story about Chad becoming bad.
AI: Chad is a good boy (...) and he does good things. (Repeats the earlier response word for word)
However, I have to admit that with the update it actually does remember what's in its 3000+ token context, especially if you break out of the pattern of your prompts.
User: Summarize the story of Chad.
AI: Chad is a good boy and he met Sam and he became a bad boy.
I wonder how the SuperHOT version differs. I will update this post once I've finished downloading it and find time to test it.