Update README.md
README.md CHANGED
@@ -4,6 +4,7 @@
This is [Jon Durbin's Airoboros 33B GPT4 1.4](https://huggingface.co/jondurbin/airoboros-33b-gpt4-1.4) (with GPTQ Quantization) with several key modifications:
- Context length extended to 8192 by RoPE Scaled Embeddings, but NOT via the superHOT LoRA.
- Training sequences beyond 2048 have the target truncated to equal 2048.
+- Used airoboros-gpt4-1.4.1 dataset instead of airoboros-gpt4-1.4

Otherwise, I emulated the training process as closely as possible. It was trained on 1x RTX 6000 Ada for ~43 hours.

@@ -18,7 +19,8 @@ Recent advancements in extending context by RoPE scaling ([kaiokendev](https://k
| **bhenrym14/airoboros-33b-gpt4-1.4.1-PI-8192-GPTQ** | **2048** | **4.32** |
| **bhenrym14/airoboros-33b-gpt4-1.4.1-PI-8192-GPTQ** | **3072** | **4.26** |

-How does this reduction in perplexity translate into actual performance lift on downstream tasks? I'm not sure yet.
+- How does this reduction in perplexity translate into actual performance lift on downstream tasks? I'm not sure yet.
+- This comparison isn't perfect. I did use the 1.4.1 dataset and the quantization method is slightly different.

## Quantization:
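A note on the "RoPE Scaled Embeddings" bullet for readers who haven't followed the recent context-extension work: the "PI" in the model name presumably stands for position interpolation, i.e. compressing the extended range of positions into the angular range the base model saw during its original 2048-token training. The sketch below is only an illustration of that idea under stated assumptions. The scale factor 2048/8192 = 0.25 follows from the numbers in this README, but the function names, head count, and tensor shapes are mine, not the author's training code.

```python
import torch

def rope_cos_sin(seq_len, head_dim, base=10000.0, scale=2048 / 8192):
    """RoPE angles with linear position interpolation (illustrative sketch)."""
    # Standard rotary inverse frequencies, one per pair of head dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Position interpolation: scale positions by 0.25 so 8192 positions
    # span the same angular range the base model saw over 2048 positions.
    positions = torch.arange(seq_len).float() * scale
    angles = torch.outer(positions, inv_freq)      # (seq_len, head_dim // 2)
    angles = torch.cat([angles, angles], dim=-1)   # (seq_len, head_dim)
    return angles.cos(), angles.sin()

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat([-x2, x1], dim=-1)

def apply_rope(x, cos, sin):
    # x: (batch, seq_len, n_heads, head_dim); cos/sin broadcast over batch and heads.
    return x * cos[None, :, None, :] + rotate_half(x) * sin[None, :, None, :]

# Hypothetical query tensor; 52 heads x 128 dims matches a 33B LLaMA layout.
q = torch.randn(1, 8192, 52, 128)
cos, sin = rope_cos_sin(seq_len=8192, head_dim=128)
q_rot = apply_rope(q, cos, sin)
```

With scale=1.0 this reduces to the ordinary rotary embedding, which is why the change lives in the embedding math itself rather than in a separate superHOT-style LoRA.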
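The "target truncated to equal 2048" bullet is terse. One reading is that the loss is computed over at most 2048 supervised tokens per sequence even though the inputs can span up to 8192 tokens. A minimal sketch of that reading, using the standard -100 ignore index of PyTorch cross-entropy; the helper name and the choice of which positions are kept are assumptions, not the author's code.

```python
import torch

IGNORE_INDEX = -100  # positions with this label are skipped by cross-entropy loss

def cap_target_length(labels: torch.Tensor, max_target_len: int = 2048) -> torch.Tensor:
    """Keep at most `max_target_len` supervised positions per sequence (assumed reading)."""
    labels = labels.clone()
    # Count supervised (non-ignored) positions left to right and mask the overflow.
    supervised = (labels != IGNORE_INDEX).long().cumsum(dim=-1)
    labels[supervised > max_target_len] = IGNORE_INDEX
    return labels

# Example: an 8192-token sequence where every token starts out supervised.
labels = torch.randint(0, 32000, (1, 8192))
capped = cap_target_length(labels)
print((capped != IGNORE_INDEX).sum().item())  # 2048
```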
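The perplexity rows above come from the model card's comparison table; the evaluation script itself is not reproduced in this diff. If you want to sanity-check numbers like these, a rough recipe is to chunk a held-out text into fixed-length windows and exponentiate the token-weighted mean loss. The sketch below uses the Hugging Face transformers API with a placeholder model id and a hypothetical eval file; it is not the author's evaluation code, and the GPTQ checkpoint itself would need a GPTQ-aware loader, which is omitted here.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity_at_context(model, tokenizer, text: str, ctx_len: int) -> float:
    """Exponentiated mean token loss over non-overlapping windows of ctx_len."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    assert ids.size(0) >= ctx_len, "need at least one full window of text"
    total_loss, total_tokens = 0.0, 0
    with torch.no_grad():
        for start in range(0, ids.size(0) - ctx_len + 1, ctx_len):
            window = ids[start : start + ctx_len].unsqueeze(0).to(model.device)
            out = model(input_ids=window, labels=window)
            # out.loss is the mean loss over the (ctx_len - 1) predicted tokens.
            total_loss += out.loss.item() * (ctx_len - 1)
            total_tokens += ctx_len - 1
    return math.exp(total_loss / total_tokens)

# Placeholder model id and eval file; any long held-out text works for the text argument.
model_id = "some/causal-lm"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()
print(perplexity_at_context(model, tok, open("eval.txt").read(), ctx_len=2048))
```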