Benchmarks!

#2, opened by ChuckMcSneed

If anyone else has benchmarked this model and normal lzlv, please post the results; it would be interesting to see how much the performance really degrades.

I ran my meme benchmark on it (results in the attached image), and it seems the performance is degraded by only ~30% (SP column).

Nice, thanks for testing.

More benchmarks!

Model: lzlv-longLORA-70b-rope8-32k-Q3_K_M.gguf (70b, Llama_2, GGUF, ctx 4096; Grimulkan / NanoByte; 2024-02-06)

Benchmark scores, all run at PEC4 (linear rope 4):

| Benchmark     | Score       | Samples |
|---------------|-------------|---------|
| Hellaswag     | 83.5        | 400     |
| Hellaswag     | 82.3        | 1000    |
| Arc-Challenge | 49.83277592 | 299     |
| Arc-Easy      | 70          | 570     |
| MMLU          | 44.72843450 | 313     |
| Truthful-QA   | 34.88372093 | 817     |
| Winogrande    | 77.9795     | 1267    |

Wikitext perplexity (ctx 512), by linear rope factor (PEC) and number of 512-token chunks:

| PEC | 81 chunks | 655 chunks |
|-----|-----------|------------|
| 2   | 52.7892   | n/a        |
| 2.5 | 6.5927    | 5.7020     |
| 4   | 4.3087    | 3.9153     |
| 8   | 4.5223    | 4.1258     |

Linear rope 4 wins over rope 8, for those who only need 16k context!
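
For anyone who wants to reproduce numbers like the wikitext rows above, here is a rough sketch using llama-cpp-python. It is only an assumption about the setup (the tool actually used above isn't stated): the file paths and chunk counts are placeholders, and it assumes linear rope N maps to rope_freq_scale = 1/N and that Llama.scores exposes per-position logits when logits_all=True.

```python
# Rough sketch: wikitext perplexity at a given linear rope factor via llama-cpp-python.
# Paths, chunk counts and the rope mapping are assumptions, not the exact setup above.
import math
import numpy as np
from llama_cpp import Llama

MODEL_PATH = "lzlv-longLORA-70b-rope8-32k-Q3_K_M.gguf"  # placeholder path
WIKITEXT = "wiki.test.raw"                              # placeholder wikitext-2 test file

def wikitext_ppl(rope_factor: float, n_ctx: int = 512, max_chunks: int = 81) -> float:
    llm = Llama(
        model_path=MODEL_PATH,
        n_ctx=n_ctx,
        rope_freq_scale=1.0 / rope_factor,  # linear rope 4 -> 0.25, rope 8 -> 0.125 (assumed mapping)
        logits_all=True,                    # keep logits for every position
        verbose=False,
    )
    tokens = llm.tokenize(open(WIKITEXT, "rb").read())
    nll, count = 0.0, 0
    for start in range(0, min(len(tokens), max_chunks * n_ctx), n_ctx):
        chunk = tokens[start:start + n_ctx]
        if len(chunk) < 2:
            break
        llm.reset()
        llm.eval(chunk)
        # assumes Llama.scores holds per-position logits, shape (n_ctx, n_vocab)
        logits = np.array(llm.scores[:len(chunk)])
        # log-softmax, then pick the log-prob of each next token
        logprobs = logits - np.logaddexp.reduce(logits, axis=-1, keepdims=True)
        nll -= float(logprobs[np.arange(len(chunk) - 1), chunk[1:]].sum())
        count += len(chunk) - 1
    return math.exp(nll / count)

if __name__ == "__main__":
    for factor in (4.0, 8.0):
        print(f"linear rope {factor}: ppl ~= {wikitext_ppl(factor):.4f}")
```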

Interesting; I'd have thought the 1 billion tokens of training at rope 8 (in the adapter) would have made it preferable to rope 4, even at 16K context. But I guess not everything carries over cleanly when you merge the adapter into a different model.
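
For context on the merging step: a LoRA adapter is a set of low-rank weight deltas (LongLoRA typically also trains the embedding and norm layers), and merging it into lzlv rather than its original base just means adding those deltas to different weights. A hypothetical PEFT-style sketch, with placeholder ids/paths and ignoring any extra non-LoRA weights:

```python
# Hypothetical sketch of folding a LongLoRA-style adapter into a different base model
# (here lzlv instead of the model the adapter was trained on). Repo ids and the adapter
# path are placeholders; this shows plain PEFT merging only.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "lizpreciatior/lzlv_70b_fp16_hf",   # target model, not the adapter's original base (id assumed)
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "path/to/longlora-rope8-32k-adapter")  # placeholder
model = model.merge_and_unload()        # add the low-rank deltas into the base weights
model.save_pretrained("lzlv-longlora-rope8-32k-merged")
```

Since the low-rank deltas were optimized against the original base's weights, adding them to lzlv's different weights is only an approximation, which presumably accounts for the quality drop discussed above.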
