flashvenom committed on
Commit f988a72
Parent: bfa2180

Update README.md

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -1,5 +1,6 @@
 Model upload in 4-bit GPTQ version, converted using GPTQ-for-LLaMa; Source model from https://huggingface.co/Peeepy/Airoboros-13b-SuperHOT-8k.
 
 You will need a monkey-patch at inference to use the 8k context, please see patch file present, if you are using a different inference engine (like llama.cpp / exllama) you will need to add the monkey patch there.
+### Note: If you are using exllama the monkey-patch is built into the engine, please use -cpe to set the scaling factor, ie. if you are running it at 4k context, pass `-cpe 2 -l 4096`
 
 Patch file present in repo or can be accessed here: https://huggingface.co/kaiokendev/superhot-13b-8k-no-rlhf-test/raw/main/llama_rope_scaled_monkey_patch.py
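The added exllama note gives one data point: 4k context maps to `-cpe 2 -l 4096`. That is consistent with the scaling factor being the target context divided by LLaMA's native 2048-token context. This is an assumption inferred from that single example, not something the commit states. The helper below is hypothetical and not part of exllama. It only does that arithmetic to build the flag string.

```python
# Hypothetical helper (not an exllama API): derive the flags from the rule of
# thumb in the note above, ASSUMING scaling factor = target context / 2048.
BASE_CONTEXT = 2048  # assumed native context length of the LLaMA base model


def exllama_context_flags(target_ctx, base_ctx=BASE_CONTEXT):
    """Return the '-cpe N -l CTX' string for a desired context length."""
    if target_ctx <= 0:
        raise ValueError("target context must be positive")
    cpe = target_ctx / base_ctx  # e.g. 4096 / 2048 = 2.0
    return f"-cpe {cpe:g} -l {target_ctx}"  # ':g' prints 2.0 as "2"


for ctx in (4096, 8192):
    print(exllama_context_flags(ctx))
```

Under the same assumption, running the model at its full 8k context would correspond to a factor of 4.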
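The linked `llama_rope_scaled_monkey_patch.py` is, as far as can be told from its name and the SuperHOT context, a patch to LLaMA's rotary position embeddings (RoPE). The sketch below is not the contents of that file. It is a minimal pure-Python illustration of linear position interpolation, assuming the patch works by multiplying token positions by `base_context / target_context` (0.25 for 8k on a 2048-token base). Under that assumption, position 8000 is rotated exactly as position 2000 would be without the patch. The function names are hypothetical, and the "rotate half" pairing mirrors the common LLaMA convention.

```python
import math


def rope_angles(position, head_dim, scale=1.0, base=10000.0):
    """Rotation angle for each of the head_dim // 2 frequency bands.

    scale < 1 linearly interpolates positions (assumed behaviour of the patch).
    """
    if head_dim % 2:
        raise ValueError("head_dim must be even")
    inv_freq = [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    scaled_pos = position * scale  # the only change interpolation makes
    return [scaled_pos * f for f in inv_freq]


def apply_rope(x, position, scale=1.0, base=10000.0):
    """Rotate vector x, pairing x[i] with x[i + d/2] ('rotate half' layout)."""
    half = len(x) // 2
    angles = rope_angles(position, len(x), scale, base)
    x1, x2 = x[:half], x[half:]
    first = [a * math.cos(t) - b * math.sin(t) for a, b, t in zip(x1, x2, angles)]
    second = [b * math.cos(t) + a * math.sin(t) for a, b, t in zip(x1, x2, angles)]
    return first + second


# Hypothetical 8k setup: a 2048-token base context stretched by a factor of 4.
SCALE = 2048 / 8192  # 0.25

q = [0.1 * i for i in range(8)]
print(apply_rope(q, 8000, scale=SCALE) == apply_rope(q, 2000))
```

Because each pair is only rotated, vector norms are preserved. The model therefore sees familiar rotation angles, just spaced more finely, which is the intuition behind needing the patch (or exllama's built-in equivalent) to use the 8k context.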