---
license: llama2
---

Quants for Sao10K's WinterGoddess 1.4x 70B model: https://huggingface.co/Sao10K/WinterGoddess-1.4x-70B-L2

With a twist: the model I used comes from a third party, and has been tweaked with limarpv3 and a Linear Rope 8 training to reach 32k context (with even better results at rope 4 and rope 2, and possibly other lower ropes as well).

I don't know who did the work, only that I found this Q4_K_S quant of it hanging around without an FP16: https://huggingface.co/mishima/WinterGoddess-1.4x-limarpv3-70B-L2-32k.GGUF

So I made a Q8_0 out of it (the best base to requantize from afterwards), then requantized it to Q3_K_S and Q2_K for my needs.
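
For reference, here is a minimal sketch of that requantization chain driven from Python, assuming a llama.cpp build of that period with its `quantize` binary in the working directory (file names, flags, and paths are illustrative, not the exact commands used):

```python
import subprocess

SRC = "WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf"        # third-party Q4_K_S quant
Q8 = "WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant.Q8_0.gguf"   # intermediate Q8_0

# Re-inflate to Q8_0 first; --allow-requantize is needed because the source
# GGUF is already quantized.
subprocess.run(["./quantize", "--allow-requantize", SRC, Q8, "Q8_0"], check=True)

# Then produce the smaller quants from the Q8_0 intermediate.
for qtype in ("Q3_K_S", "Q2_K"):
    out = Q8.replace("Q8_0", qtype)
    subprocess.run(["./quantize", "--allow-requantize", Q8, out, qtype], check=True)
```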

Lower quants (SOTA 2-bit) to come if I'm able to make an iMatrix on my config (64 GB RAM).

And as a bonus to play with it, my KoboldCPP_-_v1.55.1.b1933_-_Frankenstein build from 21/01/2024: https://github.com/Nexesenex/kobold.cpp/releases/tag/v1.55.1_b1933

-----

Edit: due to a CPU that is weak for AI purposes (i7-6700K) and only 36 GB of VRAM, I remade Q3_K_S and Q2_K with a small iMatrix of ctx 32 over 25 chunks (so, 640 tokens).
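
As an illustration, a sketch of how such a small iMatrix can be generated and applied with llama.cpp's `imatrix` and `quantize` tools (ctx 32 × 25 chunks = 640 tokens); the calibration file and output names are placeholders, and exact flags may differ between builds:

```python
import subprocess

MODEL = "WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant.Q8_0.gguf"
IMATRIX = "imatrix-c32_ch25.dat"

# Importance matrix over 25 chunks of 32 tokens each (640 tokens total).
subprocess.run([
    "./imatrix", "-m", MODEL, "-f", "calibration.txt",
    "-o", IMATRIX, "-c", "32", "--chunks", "25",
], check=True)

# Requantize using that matrix.
for qtype in ("Q3_K_S", "Q2_K"):
    out = f"WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-iMat-c32_ch25.{qtype}.gguf"
    subprocess.run([
        "./quantize", "--allow-requantize", "--imatrix", IMATRIX, MODEL, out, qtype,
    ], check=True)
```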

And good news: it lowers the perplexity by:

More than 3% at Rope 8 on Q2_K:

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,6.2489,512

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.0482,512

More than 2% at Rope 4 on Q2_K:

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.8859,512

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,4.7739,512

More than 1.5% at Rope 2 on Q2_K:

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.5030,512

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,4.42,512

More than 1% at Rope 8 on Q3_K_S:

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q3_K_S.gguf,-,wikitext,5.6127,512

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf,-,wikitext,5.5461,512
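
The percentages above follow directly from the quoted wikitext,512 perplexities; a quick check in Python:

```python
# Relative perplexity reduction of the iMatrix requants, from the wikitext,512
# values listed above.
pairs = {
    "Q2_K, Rope 8": (6.2489, 6.0482),
    "Q2_K, Rope 4": (4.8859, 4.7739),
    "Q2_K, Rope 2": (4.5030, 4.42),
    "Q3_K_S, Rope 8": (5.6127, 5.5461),
}
for name, (plain, imat) in pairs.items():
    drop = (plain - imat) / plain * 100
    print(f"{name}: {plain} -> {imat} ({drop:.2f}% lower)")
# -> 3.21%, 2.29%, 1.84% and 1.19% respectively
```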

A Q3_K_M with iMatrix has been added as well.

-----

Interestingly, Rope 2.5 comes at almost no loss compared to Rope 2, while 3 and 3.2 are still quite good. Here are the values with the normal Q2_K:

Rope 2.5 (max context 10240): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.5246,512

Rope 3 (max context 12288): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.6203,512

Rope 3.2 (max context 13107): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.6679,512

And for the adventurous, Rope 10 (max context 40960): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,7.1577,512

- Minus 3% with my Q2_K with the c32_ch25 iMatrix: WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.9405,512

So the linear rope, at least on this model, is flexible, and you can lower it to get the best perplexity for your target max context.
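
The "max context" figures used throughout simply scale the native Llama 2 context of 4096 tokens by the linear rope factor (in llama.cpp terms, `--rope-freq-scale` = 1 / factor). A small helper illustrating the relationship:

```python
# Max usable context for a linear rope factor, assuming the native Llama 2
# context of 4096 tokens (llama.cpp: --rope-freq-scale = 1 / factor).
def max_context(linear_rope: float, native_ctx: int = 4096) -> int:
    return int(linear_rope * native_ctx)

for rope in (2, 2.5, 3, 3.2, 4, 8, 10):
    print(f"linear rope {rope:>4} -> ~{max_context(rope)} tokens "
          f"(freq scale {1 / rope:.4f})")
# 2 -> 8192, 2.5 -> 10240, 3 -> 12288, 3.2 -> 13107, 8 -> 32768, 10 -> 40960
```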

Then, I wonder about applying an NTK rope on top of it to extend the context further, even if it screws with the integrity of numbers in chat.

Multiply a linear rope (2, 4, 8, whatever) by 5888 (Alpha 1.6, or RBF 16119.8), 6144 (Alpha 1.8, or RBF 18168.7), or even 7424 (Alpha 2.2, or RBF 22277).

This would give a further boost in max context size. Example with Linear 8 and Alpha 2.2/RBF 22277: 8*7424 = 59392.

It's only theoretical of course, but worth testing.
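
For reference, the Alpha/RBF pairs quoted above match the usual NTK-aware conversion `base = 10000 * alpha^(dim / (dim - 2))` with a head dimension of 128 (Llama 2 70B); a quick check:

```python
# NTK-aware rope base from a KoboldCPP-style Alpha value:
# base = 10000 * alpha ** (head_dim / (head_dim - 2)), with head_dim = 128
# for Llama 2 70B.
def alpha_to_rope_base(alpha: float, head_dim: int = 128) -> float:
    return 10000.0 * alpha ** (head_dim / (head_dim - 2))

for alpha in (1.6, 1.8, 2.2):
    print(f"Alpha {alpha} -> rope base ~{alpha_to_rope_base(alpha):.1f}")
# Alpha 1.6 -> ~16119.8, Alpha 1.8 -> ~18168.7, Alpha 2.2 -> ~22277
```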

-----

Benchmarks of the original Q4_K_S quant I found:

Rope 8 (base 10000)

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.2177,4096

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1324,6144

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.3923,2048

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.4945,1536

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.6700,1024

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,5.2577,512

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,84.5,,400

Rope 4 (base 10000)

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.5762,2048

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1235,512

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,87.25,,400

Rope 2 (base 10000)

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.3394 *327,2048

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.8254,512

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,88,,400

Rope 1 (base 10000)

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,85,,400
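
For completeness, a sketch of the kind of run that produces such a row, assuming llama.cpp's `perplexity` binary (linear Rope 8 at base 10000 corresponds to `--rope-freq-scale 0.125`); model and dataset paths are placeholders:

```python
import subprocess

# Wikitext perplexity at ctx 512, linear Rope 8 (freq scale 1/8) at base 10000.
subprocess.run([
    "./perplexity",
    "-m", "WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf",
    "-f", "wiki.test.raw",           # wikitext-2 test set
    "-c", "512",                     # evaluation context size
    "--rope-freq-base", "10000",
    "--rope-freq-scale", "0.125",    # linear rope 8 -> scale 1/8
], check=True)
```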