TheBloke committed on
Commit 0ef6ee6
Parent: 4256aa6

Update README.md

Files changed (1)
  1. README.md +8 -2
README.md CHANGED
@@ -31,11 +31,17 @@ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com
 
 **This is an experimental new GPTQ which offers up to 8K context size**
 
-The increased context is currently only tested to work with [ExLlama](https://github.com/turboderp/exllama), via the latest release of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
+The increased context is tested to work with [ExLlama](https://github.com/turboderp/exllama), via the latest release of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
+
+It has also been tested from Python code using AutoGPTQ, and `trust_remote_code=True`.
+
+Code credits:
+- Original concept and code for increasing context length: [kaiokendev](https://huggingface.co/kaiokendev)
+- Updated Llama modelling code that includes this automatically via trust_remote_code: [emozilla](https://huggingface.co/emozilla).
 
 Please read carefully below to see how to use it.
 
-**NOTE**: Using the full 8K context will exceed 24GB VRAM.
+**NOTE**: Using the full 8K context on a 30B model will exceed 24GB VRAM.
 
 GGML versions are not yet provided, as there is not yet support for SuperHOT in llama.cpp. This is being investigated and will hopefully come soon.
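
For reference, the commit's note about loading from Python with AutoGPTQ and `trust_remote_code=True` corresponds to a flow like the minimal sketch below. The repo id, device, and generation settings are illustrative assumptions, not taken from this commit.

```python
# Minimal sketch of loading a GPTQ model with AutoGPTQ and trust_remote_code.
# The repo id below is a hypothetical placeholder, not from this commit.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/<model>-SuperHOT-8K-GPTQ"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# trust_remote_code=True allows the repo's patched Llama modelling code
# (which applies the extended-context changes automatically) to be executed.
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    use_safetensors=True,
    trust_remote_code=True,
    device="cuda:0",
)

prompt = "Tell me about AI"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(input_ids=input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Per the README's warning, pushing the context toward the full 8K on a 30B model will exceed 24GB of VRAM.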