Mamba GGUF
These are the Mamba base models, converted to GGUF for use with llama.cpp, in a variety of precisions (2, 3, 4, 5, 6, 8, 16, and 32-bit).
Please click "Files and versions" at the top of the page to choose your desired model size, and then click the "📦LFS ↓" button next to your desired quantization.
Here is a table adapted from TheBloke explaining the various precisions:
Quant method | Use case |
---|---|
Q2_K | significant quality loss - not recommended for most purposes |
Q3_K_S | very small, high quality loss |
Q3_K_M | very small, high quality loss |
Q3_K_L | small, substantial quality loss |
Q4_0 | legacy; small, very high quality loss - prefer using Q3_K_M |
Q4_K_S | small, greater quality loss |
Q4_K_M | medium, balanced quality - recommended |
Q5_0 | legacy; medium, balanced quality - prefer using Q4_K_M |
Q5_K_S | large, low quality loss - recommended |
Q5_K_M | large, very low quality loss - recommended |
Q6_K | very large, extremely low quality loss |
Q8_0 | very large, extremely low quality loss - not recommended |
F16 | half precision - almost identical to the original |
F32 | original precision - recommended by the Mamba authors |
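
To make the size trade-off in the table concrete, here is a minimal sketch that estimates file size from the nominal bit width in each quantization name (2, 3, 4, 5, 6, 8, 16, or 32 bits per weight). The helper names `nominal_size_bytes` and `quants_within_budget` and the 1-billion parameter count are hypothetical and chosen for illustration only. Treat the result as a rough lower bound: actual GGUF files are somewhat larger because quantized blocks also store scales and the file carries metadata.

```python
# Nominal bits per weight, read off the quantization names in the table.
# Assumption: these are NOT the exact bits per weight llama.cpp stores;
# real files are larger because of per-block scales and metadata.
NOMINAL_BITS = {
    "Q2_K": 2,
    "Q3_K_S": 3, "Q3_K_M": 3, "Q3_K_L": 3,
    "Q4_0": 4, "Q4_K_S": 4, "Q4_K_M": 4,
    "Q5_0": 5, "Q5_K_S": 5, "Q5_K_M": 5,
    "Q6_K": 6,
    "Q8_0": 8,
    "F16": 16,
    "F32": 32,
}


def nominal_size_bytes(n_params: int, quant: str) -> int:
    """Rough lower-bound size: parameters times nominal bits per weight, in bytes."""
    return n_params * NOMINAL_BITS[quant] // 8


def quants_within_budget(n_params: int, budget_bytes: int) -> list[str]:
    """Quantizations whose nominal size fits the budget, highest precision first."""
    fitting = [
        q for q in NOMINAL_BITS
        if nominal_size_bytes(n_params, q) <= budget_bytes
    ]
    return sorted(fitting, key=lambda q: NOMINAL_BITS[q], reverse=True)


if __name__ == "__main__":
    n_params = 1_000_000_000  # hypothetical parameter count, not a specific model
    for quant in ("Q4_K_M", "Q8_0", "F16", "F32"):
        size_gb = nominal_size_bytes(n_params, quant) / 1e9
        print(f"{quant}: ~{size_gb:.1f} GB")
```

For the hypothetical 1-billion-parameter model, this prints roughly 0.5 GB for Q4_K_M, 1.0 GB for Q8_0, 2.0 GB for F16, and 4.0 GB for F32. The file sizes listed under "Files and versions" are the authoritative figures.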