Context Length?
Noticed you removed the context length piece from the README, any reason?
Yeah, I just added a note to the README about this. I was getting feedback that the larger context sizes didn't work, but I hadn't had a chance to test it myself until now, so in the meantime I removed that claim until I could verify it.
To train, I updated the code to use a context size of 4096 and made sure to include training data exceeding 2048 tokens, and I even tested a few prompts over 2048 that produced coherent output. Unfortunately, I did not do enough due diligence in testing the larger ranges. My sincerest apologies for misleading you (and anyone else)!
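For clarity, that change amounted to raising the max sequence length used when tokenizing the training data. A minimal sketch of what I mean, assuming a Hugging Face-style tokenizer (the model id and field name below are placeholders, not the actual training script):

```python
# Sketch only: raise the tokenization cutoff so some examples exceed 2048 tokens.
# Note that raising this alone does not teach the base model to handle longer positions.
from transformers import AutoTokenizer

MAX_LEN = 4096  # the context size mentioned above

tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")  # placeholder model id

def tokenize(example):
    return tokenizer(
        example["text"],          # placeholder field name
        truncation=True,
        max_length=MAX_LEN,       # previously 2048
    )
```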
It seems that the whole foundation model would have to be overhauled to properly achieve this. I started testing an alternative, using landmark attention (landmark tokens for each context section), but it'll be a while before that is ready (if it even proves successful).
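For anyone curious, the landmark idea is roughly: insert a special landmark token after each fixed-size block of the context, so attention can later select whole blocks via their landmarks. A minimal sketch of just the token-insertion step (the token id and block size are made up for illustration, not from this repo; the attention-side grouping is the hard part and isn't shown):

```python
# Illustrative only: append a landmark token after every block of the input ids.
LANDMARK_ID = 32001   # hypothetical id of a special <landmark> token
BLOCK_SIZE = 64       # one landmark summarizes each 64-token block

def insert_landmarks(token_ids: list[int]) -> list[int]:
    out = []
    for i, tok in enumerate(token_ids, start=1):
        out.append(tok)
        if i % BLOCK_SIZE == 0:
            out.append(LANDMARK_ID)
    return out
```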
You might want to look into implementing xPos for increased context length.
https://github.com/kaiokendev/cutoff-len-is-context-len
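For reference, xPos is essentially rotary embedding with an extra per-dimension exponential scale, applied with opposite exponents to queries and keys so it reduces to a decay over relative distance. A rough sketch, not a drop-in for this repo (the 0.4/1.4 constants and `scale_base=512` follow the defaults commonly used with the paper, and common implementations also re-center positions for numerical stability, which is omitted here):

```python
import torch

def xpos_rotate(x, positions, scale_base=512, base=10000, is_key=False):
    """Apply rotary embedding with the xPos decay to q or k.

    x:         (..., seq_len, dim) with even dim
    positions: (seq_len,) integer positions
    Keys use the inverse scale so the decay depends only on relative distance.
    """
    dim = x.shape[-1]
    half = torch.arange(0, dim, 2, dtype=torch.float32)   # 0, 2, 4, ...
    inv_freq = 1.0 / (base ** (half / dim))                # standard RoPE frequencies
    zeta = (half + 0.4 * dim) / (1.4 * dim)                # xPos per-dimension scale

    pos = positions.float()[:, None]                       # (seq_len, 1)
    angles = pos * inv_freq                                # (seq_len, dim/2)
    scale = zeta ** (pos / scale_base)                     # decay grows with position
    if is_key:
        scale = 1.0 / scale                                # keys get the inverse scale

    cos, sin = torch.cos(angles), torch.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]                    # split into rotary pairs
    out1 = (x1 * cos - x2 * sin) * scale
    out2 = (x1 * sin + x2 * cos) * scale
    return torch.stack((out1, out2), dim=-1).flatten(-2)   # re-interleave pairs
```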