How about a quantized version that fits in 16 GB of memory, like WizardLM?
3
#19 opened 7 months ago
by
Zibri
Will you redo the quants after your BPE PR gets merged?
2
#18 opened 8 months ago
by
ggnoy
I'm generating an imatrix using `groups_merged.txt`; do you want me to run any tests?
19
#15 opened 8 months ago
by
jukofyork
Can we get a Q4 without the IMat?
2
#14 opened 8 months ago
by
yehiaserag
Fails on 104b-iq2_xxs.gguf with llama.cpp
4
#12 opened 8 months ago
by
telehan
Invalid split files?
3
#11 opened 9 months ago
by
SabinStargem
Unable to load in ollama built from PR branch
3
#10 opened 9 months ago
by
gigq
Is IQ1_S broken? If so, why list it here?
1
#9 opened 9 months ago
by
stduhpf
Fast work by the people on the llama.cpp team
3
#8 opened 9 months ago
by
qaraleza
For a context of at least 32K tokens, which version should I use on a 2x16GB GPU config?
1
#3 opened 9 months ago
by
Kalemnor
What does iMat mean?
15
#2 opened 9 months ago
by
AS1200