Update README.md
README.md
CHANGED
@@ -1,12 +1,21 @@
|
|
1 |
Requantization of a Q5_K_M quant of a trending 70b model for which no better quant or FP16 is available, done through a Q8_0 intermediary step.
|
2 |
|
3 |
-
|
|
|
4 |
|
5 |
-
So, no Alpha or Rope Base Frequency up to its base 32k context, if it works as intended.
|
|
|
6 |
|
7 |
-
|
8 |
|
9 |
-
|
10 |
|
11 |
- miqu-1-70b.q2_K.gguf,-,Hellaswag,87.75,,400,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
|
12 |
- miqu-1-70b.q2_K.gguf,-,Hellaswag,86.5,,1000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
|
@@ -43,7 +52,7 @@ Benchs I made with the Q3_K_M I quantized from Miqudev's Q5_K_M with an intermed
|
|
43 |
- miqu-1-70b.Q3_K_M.gguf,-,wikitext,4.2957,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,81
|
44 |
- miqu-1-70b.Q3_K_M.gguf,-,wikitext,3.8380,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,655
|
45 |
|
46 |
-
And now, the IQ3_XXS, new SOTA 3 bits quant from LlamaCPP, made in the same way :
|
47 |
|
48 |
- miqu-1-70b.IQ3_XXS.gguf,-,Hellaswag,89,,400,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
|
49 |
- miqu-1-70b.IQ3_XXS.gguf,-,Hellaswag,88.3,,1000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
|
@@ -58,4 +67,20 @@ And now, the IQ3_XXS, new SOTA 3 bits quant from LlamaCPP, made in the same way
|
|
58 |
- miqu-1-70b.IQ3_XXS.gguf,-,wikitext,4.0309,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,655
|
59 |
- miqu-1-70b.IQ3_XXS.gguf,-,wikitext,3.5141,4096,4096,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
|
60 |
|
61 |
-
|
1 |
Requantization of a Q5_K_M quant of a trending 70b model for which no better quant or FP16 is available, done through a Q8_0 intermediary step.
|
2 |
|
3 |
+
Miqu 70b has a RoPE theta of 1,000,000, like CodeLlama, rather than the 10,000 that Llama 2 models usually have.
|
4 |
+
To my knowledge, that feature sets it apart from ALL Llama 2 models other than the CodeLlamas.
|
5 |
|
6 |
+
So, no Alpha or Rope Base Frequency change is needed up to its base 32k context, if it works as intended.
|
7 |
+
And if it does, no linear or YaRN rope scaling is necessary either to reach the base 32k context.
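To make the theta difference concrete, here is a minimal Python sketch of the standard RoPE formulation, in which rotary pair `i` rotates with a wavelength of `2π · theta^(2i/d)` tokens. The function name `rope_wavelengths` is made up for illustration, and `head_dim=128` is assumed as the head size of a Llama 2 70b; nothing here is taken from llama.cpp's code.

```python
import math


def rope_wavelengths(theta: float, head_dim: int = 128) -> list[float]:
    """Wavelength, in tokens, of each rotary pair for a given RoPE base (theta).

    Standard RoPE uses inv_freq_i = theta ** (-2i / head_dim), so the
    wavelength of pair i is 2 * pi / inv_freq_i.
    head_dim=128 is an assumption (Llama 2 70b head size).
    """
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]


# Compare the slowest-rotating pair for the two bases discussed above.
for theta in (10_000, 1_000_000):
    longest = rope_wavelengths(theta)[-1]
    print(f"theta={theta:>9,}: longest wavelength ~ {longest:,.0f} tokens")
```

A larger theta stretches the slowest-rotating pairs over many more tokens, which is the usual explanation for why a base of 1,000,000 can cover a long context without extra rope scaling.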
|
8 |
|
9 |
+
But Miqu is NOT a CodeLlama 70b (released only a few days after Miqu 70b), because :
|
10 |
|
11 |
+
- While the theta of CodeLlama 70b is claimed to be 1,000,000, its base rope frequency actually seems to be 10,000 (see the benchmarks below).
|
12 |
+
- Which means that CodeLlama 70b might be context-limited as Llama 2 is, instead of having a baseline of 100,000 ctx max.
|
13 |
+
- Meanwhile, Miqu's perplexity is close to that of a 70b Llama 2 (less than 4 at 512 ctx), while CodeLlama 70b is at 5.5 or more.
|
14 |
+
- The benchmarks less sensitive to quantization (Hellaswag, Winogrande, and others) confirm this as well.
|
15 |
+
|
16 |
+
So, CodeLlama 70b is nerfed like the other CodeLlamas in terms of general benchmarks, while Miqu matches the expectations for a FINETUNED Llama 2.
|
17 |
+
|
18 |
+
Benchmarks I ran on the original Q2_K quant of Miqu 70b, made from the FP16 and published by Miqudev:
|
19 |
|
20 |
- miqu-1-70b.q2_K.gguf,-,Hellaswag,87.75,,400,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
|
21 |
- miqu-1-70b.q2_K.gguf,-,Hellaswag,86.5,,1000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
|
|
|
52 |
- miqu-1-70b.Q3_K_M.gguf,-,wikitext,4.2957,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,81
|
53 |
- miqu-1-70b.Q3_K_M.gguf,-,wikitext,3.8380,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,655
|
54 |
|
55 |
+
And now the IQ3_XXS, the new SOTA 3-bit quant from llama.cpp, which I made in the same way:
|
56 |
|
57 |
- miqu-1-70b.IQ3_XXS.gguf,-,Hellaswag,89,,400,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
|
58 |
- miqu-1-70b.IQ3_XXS.gguf,-,Hellaswag,88.3,,1000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
|
|
|
67 |
- miqu-1-70b.IQ3_XXS.gguf,-,wikitext,4.0309,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,655
|
68 |
- miqu-1-70b.IQ3_XXS.gguf,-,wikitext,3.5141,4096,4096,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
|
69 |
|
70 |
+
Meanwhile, CodeLlama 70b Q2_K benches as follows, for comparison with the Miqu 70b Q2_K originally quantized from FP16 by Miqudev:
|
71 |
+
|
72 |
+
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Hellaswag,76.5,,400,2024-01-30 01:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
|
73 |
+
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Hellaswag,76.2,,1000,2024-01-30 01:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
|
74 |
+
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Hellaswag_Bin,69.75,,400,2024-01-30 01:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
|
75 |
+
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Hellaswag_Bin,72.5,,1000,2024-01-30 01:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
|
76 |
+
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Arc-Challenge,35.11705686,,299,2024-01-30 05:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
|
77 |
+
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Arc-Easy,58.77192982,,570,2024-01-30 05:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
|
78 |
+
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,MMLU,36.10223642,,313,2024-01-30 05:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
|
79 |
+
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Thruthful-QA,31.08935129,,817,2024-01-30 05:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
|
80 |
+
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Winogrande,70.3236,,1267,2024-01-30 05:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
|
81 |
+
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,,512,512,2024-01-30 01:40:00,RBF1000000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,655
|
82 |
+
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,9.7866,512,512,2024-01-30 01:40:00,RBF1000000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
|
83 |
+
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,8.5822,512,512,2024-01-30 01:40:00,RBF500000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
|
84 |
+
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,7.1098,512,512,2024-01-30 01:40:00,RBF100000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
|
85 |
+
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,6.8224,512,512,2024-01-30 01:40:00,RBF50000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
|
86 |
+
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,6.5705,512,512,2024-01-30 01:40:00,RBF10000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
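The rope base sweep in the wikitext rows above can also be read programmatically. This is a small sketch, assuming the column layout seen in these rows (benchmark name in the 3rd field, score in the 4th, an `RBF<value>` tag in the 8th); the helper name `perplexity_by_rope_base` is hypothetical.

```python
import csv
import io

# Rows copied from the CodeLlama 70b Q2_K wikitext results above
# (the first one has no score and is skipped by the parser).
ROWS = """\
CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,,512,512,2024-01-30 01:40:00,RBF1000000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,655
CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,9.7866,512,512,2024-01-30 01:40:00,RBF1000000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,8.5822,512,512,2024-01-30 01:40:00,RBF500000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,7.1098,512,512,2024-01-30 01:40:00,RBF100000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,6.8224,512,512,2024-01-30 01:40:00,RBF50000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,6.5705,512,512,2024-01-30 01:40:00,RBF10000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
"""


def perplexity_by_rope_base(text: str) -> dict[int, float]:
    """Map rope base frequency -> wikitext perplexity.

    Column positions are an assumption based on the rows shown here.
    """
    result: dict[int, float] = {}
    for row in csv.reader(io.StringIO(text)):
        # Skip short rows, non-wikitext rows, and rows without a score.
        if len(row) < 8 or row[2] != "wikitext" or not row[3]:
            continue
        tag = row[7]
        if tag.startswith("RBF"):
            result[int(tag[3:])] = float(row[3])
    return result


ppl = perplexity_by_rope_base(ROWS)
best = min(ppl, key=ppl.get)  # rope base with the lowest perplexity
for base in sorted(ppl):
    print(f"RBF {base:>9,}: perplexity {ppl[base]}")
print("lowest perplexity at rope base", best)
```

On these rows the perplexity falls steadily as the rope base is lowered from 1,000,000 to 10,000, which is the pattern the CodeLlama 70b remark above relies on.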
|