Update README.md
README.md CHANGED
@@ -4,6 +4,7 @@
This is [Jon Durbin's Airoboros 33B GPT4 1.4](https://huggingface.co/jondurbin/airoboros-33b-gpt4-1.4) (with GPTQ Quantization) with several key modifications:
- Context length extended to 8192 by RoPE Scaled Embeddings, but NOT via the superHOT LoRA.
- Training sequences beyond 2048 have the target truncated to equal 2048.
+- Used airoboros-gpt4-1.4.1 dataset instead of airoboros-gpt4-1.4

Otherwise, I emulated the training process as closely as possible. It was trained on 1x RTX 6000 Ada for ~43 hours.

@@ -18,7 +19,8 @@ Recent advancements in extending context by RoPE scaling ([kaiokendev](https://k
| **bhenrym14/airoboros-33b-gpt4-1.4.1-PI-8192-GPTQ** | **2048** | **4.32** |
| **bhenrym14/airoboros-33b-gpt4-1.4.1-PI-8192-GPTQ** | **3072** | **4.26** |

-How does this reduction in perplexity translate into actual performance lift on downstream tasks? I'm not sure yet.
+- How does this reduction in perplexity translate into actual performance lift on downstream tasks? I'm not sure yet.
+- This comparison isn't perfect. I did use the 1.4.1 dataset and the quantization method is slightly different.

## Quantization:
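A note on the "RoPE Scaled Embeddings" bullet for readers who haven't followed the recent context-extension work: the "PI" in the model name presumably stands for position interpolation, i.e. compressing the extended range of positions into the angular range the base model saw during its original 2048-token training. The sketch below is only an illustration of that idea under stated assumptions. The scale factor 2048/8192 = 0.25 follows from the numbers in this README, but the function names, head count, and tensor shapes are mine, not the author's training code.

```python
import torch

def rope_cos_sin(seq_len, head_dim, base=10000.0, scale=2048 / 8192):
    """RoPE angles with linear position interpolation (illustrative sketch)."""
    # Standard rotary inverse frequencies, one per pair of head dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Position interpolation: scale positions by 0.25 so 8192 positions
    # span the same angular range the base model saw over 2048 positions.
    positions = torch.arange(seq_len).float() * scale
    angles = torch.outer(positions, inv_freq)      # (seq_len, head_dim // 2)
    angles = torch.cat([angles, angles], dim=-1)   # (seq_len, head_dim)
    return angles.cos(), angles.sin()

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat([-x2, x1], dim=-1)

def apply_rope(x, cos, sin):
    # x: (batch, seq_len, n_heads, head_dim); cos/sin broadcast over batch and heads.
    return x * cos[None, :, None, :] + rotate_half(x) * sin[None, :, None, :]

# Hypothetical query tensor; 52 heads x 128 dims matches a 33B LLaMA layout.
q = torch.randn(1, 8192, 52, 128)
cos, sin = rope_cos_sin(seq_len=8192, head_dim=128)
q_rot = apply_rope(q, cos, sin)
```

With scale=1.0 this reduces to the ordinary rotary embedding, which is why the change lives in the embedding math itself rather than in a separate superHOT-style LoRA.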
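The "target truncated to equal 2048" bullet is terse. One reading is that the loss is computed over at most 2048 supervised tokens per sequence even though the inputs can span up to 8192 tokens. A minimal sketch of that reading, using the standard -100 ignore index of PyTorch cross-entropy; the helper name and the choice of which positions are kept are assumptions, not the author's code.

```python
import torch

IGNORE_INDEX = -100  # positions with this label are skipped by cross-entropy loss

def cap_target_length(labels: torch.Tensor, max_target_len: int = 2048) -> torch.Tensor:
    """Keep at most `max_target_len` supervised positions per sequence (assumed reading)."""
    labels = labels.clone()
    # Count supervised (non-ignored) positions left to right and mask the overflow.
    supervised = (labels != IGNORE_INDEX).long().cumsum(dim=-1)
    labels[supervised > max_target_len] = IGNORE_INDEX
    return labels

# Example: an 8192-token sequence where every token starts out supervised.
labels = torch.randint(0, 32000, (1, 8192))
capped = cap_target_length(labels)
print((capped != IGNORE_INDEX).sum().item())  # 2048
```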
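The perplexity rows above come from the model card's comparison table; the evaluation script itself is not reproduced in this diff. If you want to sanity-check numbers like these, a rough recipe is to chunk a held-out text into fixed-length windows and exponentiate the token-weighted mean loss. The sketch below uses the Hugging Face transformers API with a placeholder model id and a hypothetical eval file; it is not the author's evaluation code, and the GPTQ checkpoint itself would need a GPTQ-aware loader, which is omitted here.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity_at_context(model, tokenizer, text: str, ctx_len: int) -> float:
    """Exponentiated mean token loss over non-overlapping windows of ctx_len."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    assert ids.size(0) >= ctx_len, "need at least one full window of text"
    total_loss, total_tokens = 0.0, 0
    with torch.no_grad():
        for start in range(0, ids.size(0) - ctx_len + 1, ctx_len):
            window = ids[start : start + ctx_len].unsqueeze(0).to(model.device)
            out = model(input_ids=window, labels=window)
            # out.loss is the mean loss over the (ctx_len - 1) predicted tokens.
            total_loss += out.loss.item() * (ctx_len - 1)
            total_tokens += ctx_len - 1
    return math.exp(total_loss / total_tokens)

# Placeholder model id and eval file; any long held-out text works for the text argument.
model_id = "some/causal-lm"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()
print(perplexity_at_context(model, tok, open("eval.txt").read(), ctx_len=2048))
```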