---
license: mit
---

### SuperHOT Prototype 2 w/ 8K Context

This is the second prototype of SuperHOT, an NSFW-focused LoRA. This version is 30B with 8K context and no RLHF, and it uses the same technique described in [the github blog](https://kaiokendev.github.io/til#extending-context-to-8k).
Tests have shown that the model does leverage the extended context at 8K.

#### Looking for Merged & Quantized Models?
- 30B 4-bit CUDA: [tmpupload/superhot-30b-8k-4bit-safetensors](https://huggingface.co/tmpupload/superhot-30b-8k-4bit-safetensors)
- 30B 4-bit CUDA 128g: [tmpupload/superhot-30b-8k-4bit-128g-safetensors](https://huggingface.co/tmpupload/superhot-30b-8k-4bit-128g-safetensors)

#### Using the monkey-patch?
You will **NEED** to **apply the monkeypatch** or, if you are already using the monkeypatch, **change the scaling factor to 0.25 and the maximum sequence length to 8192**.
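
As a rough illustration of what the scaling factor does, here is a minimal sketch of rotary position angles computed on a scaled position index. It is not the monkeypatch itself: the function name `scaled_rotary_angles`, the tiny `dim=8`, and the `base=10000.0` default are assumptions chosen for readability. Only the 0.25 factor and the 8192 length come from the settings above.

```python
import math

MAX_SEQ_LEN = 8192  # maximum sequence length from the settings above
SCALE = 0.25        # scaling factor from the settings above


def scaled_rotary_angles(position, dim=8, base=10000.0, scale=SCALE):
    """Return rotary angles for one position after scaling the position index.

    Hypothetical helper for illustration; dim and base are assumed values.
    """
    scaled_pos = position * scale  # squeeze the position into the original range
    inv_freq = [1.0 / (base ** (2 * i / dim)) for i in range(dim // 2)]
    return [scaled_pos * f for f in inv_freq]


# The last position of the 8K window lands below 2048 after scaling.
last_scaled = (MAX_SEQ_LEN - 1) * SCALE
print(last_scaled)
```

With a factor of 0.25, position 8 in the extended window produces the same angles as position 2 would without scaling, which is why the model can reuse the position range it already knows.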

#### Using Oobabooga with Exllama?
- `python server.py --max_seq_len 8192 --compress_pos_emb 4 --loader exllama_hf`
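
The `--compress_pos_emb 4` value is the reciprocal of the 0.25 scaling factor used by the monkeypatch. A small sketch of that relationship follows, assuming the base model's original context is 2048 tokens (an assumption here, not stated in the command). The helper name `compress_pos_emb_for` is hypothetical.

```python
ORIGINAL_CONTEXT = 2048  # assumed context length of the base model


def compress_pos_emb_for(max_seq_len, original_context=ORIGINAL_CONTEXT):
    """Compression factor needed to fit max_seq_len into the original context."""
    return max_seq_len / original_context


factor = compress_pos_emb_for(8192)
scaling = 1 / factor  # the equivalent monkeypatch scaling factor
print(factor, scaling)
```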

#### Training Details
I trained the LoRA with the following configuration: 
- 1200 samples (~400 samples over 2048 sequence length)
- learning rate of 3e-4 
- 3 epochs
- The exported modules are:
    - q_proj
    - k_proj
    - v_proj
    - o_proj
    - no bias
- Rank = 4
- Alpha = 8
- no dropout
- weight decay of 0.1
- AdamW beta1 of 0.9, beta2 of 0.99, and epsilon of 1e-5
- Trained on 4-bit base model
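
For reference, the configuration above can be written down as a plain Python dictionary. This is only a sketch: the key names are illustrative and are not taken from the actual training script, which is not shown here. The two helpers use the standard LoRA definitions (scaling of `alpha / rank`, and `rank * (d_in + d_out)` trainable parameters per adapted weight matrix).

```python
# Hypothetical container; key names are illustrative, not the author's script.
lora_settings = {
    "rank": 4,
    "alpha": 8,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "bias": "none",
    "dropout": 0.0,
    "learning_rate": 3e-4,
    "epochs": 3,
    "weight_decay": 0.1,
    "adam_betas": (0.9, 0.99),
    "adam_epsilon": 1e-5,
}


def lora_scaling(rank, alpha):
    """Standard LoRA scaling applied to the low-rank update."""
    return alpha / rank


def lora_params_per_matrix(d_in, d_out, rank):
    """Trainable parameters added for one adapted weight matrix (A and B)."""
    return rank * (d_in + d_out)


print(lora_scaling(lora_settings["rank"], lora_settings["alpha"]))
```

With rank 4 and alpha 8, the low-rank update is scaled by 2. If you use the Hugging Face `peft` library, these values would roughly correspond to the `r`, `lora_alpha`, `target_modules`, `lora_dropout`, and `bias` fields of `LoraConfig`, though which trainer was used is not stated here.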