Update README.md
README.md
CHANGED
@@ -4,4 +4,12 @@ license: mit
 
 Author of the model: Microsoft
 
-Link to the original card: https://huggingface.co/microsoft/rho-math-7b-interpreter-v0.1
+Link to the original card: https://huggingface.co/microsoft/rho-math-7b-interpreter-v0.1
+
+I created the imatrix with:
+```
+./imatrix --mlock --verbosity 2 -m /tmp/rho-math-7b-interpreter-v0.1.f32.gguf -f ~/Downloads/groups_merged_forkOfArzeth.txt -c 32768 -o rho-math-7b-interpreter-v0.1.f32.ctx32768imatrix.dat
+```
+This took 1665 seconds (~28 minutes) on my GTX 1660 Super and used only one thread of my Ryzen 2600 downclocked to 3000 MHz. `imatrix` consumed 35685 MiB of RAM (3200 MHz) and 3158 MiB of VRAM.
+
+Quantized with llama.cpp b2661 (2024-04-12), compiled with `LLAMA_CUDA_FORCE_MMQ=1` (full command: `make -j6 LLAMA_CUDA_FORCE_MMQ=1 LLAMA_CUDA=1 LLAMA_FAST=1 LLAMA_OPENBLAS=1 LLAMA_BLAS_VENDOR=OpenBLAS`) for a big speedup: the GTX 1660 Super has no tensor cores, so MMQ is faster than the generic fallback.
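For context, the `.dat` file produced by the `imatrix` run above is then passed to llama.cpp's `quantize` tool via its `--imatrix` flag to guide the quantization. A sketch of that step (the output filename and `Q4_K_M` quant type here are illustrative assumptions, not from this commit):

```
# Sketch only: quant type and output name are assumptions.
./quantize --imatrix rho-math-7b-interpreter-v0.1.f32.ctx32768imatrix.dat \
    /tmp/rho-math-7b-interpreter-v0.1.f32.gguf \
    rho-math-7b-interpreter-v0.1.Q4_K_M.gguf \
    Q4_K_M
```

The importance matrix tells the quantizer which weights most affect the output on the calibration text, which mainly helps the smaller quant types retain quality.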