---
license: mit
---

Author of the model: Microsoft

Link to the original card: https://huggingface.co/microsoft/rho-math-7b-interpreter-v0.1

I created the imatrix with:

./imatrix --mlock --verbosity 2 -m /tmp/rho-math-7b-interpreter-v0.1.f32.gguf -f ~/Downloads/groups_merged_forkOfArzeth.txt -c 32768 -o rho-math-7b-interpreter-v0.1.f32.ctx32768imatrix.dat

which took 1665 seconds (~28 minutes) on my GTX 1660 Super and used only one thread of a Ryzen 2600 downclocked to 3000 MHz. imatrix consumed 35685 MiB of RAM (3200 MHz) and 3158 MiB of VRAM.

Quantized with llama.cpp b2661 (2024-04-12), compiled with LLAMA_CUDA_FORCE_MMQ=1 for a big speed-up: the GTX 1660 Super has no tensor cores, so using MMQ is better than nothing. Full build command: make -j6 LLAMA_CUDA_FORCE_MMQ=1 LLAMA_CUDA=1 LLAMA_FAST=1 LLAMA_OPENBLAS=1 LLAMA_BLAS_VENDOR=OpenBLAS
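
For reference, the imatrix file is then passed to llama.cpp's quantize tool. A minimal sketch of such an invocation is shown below; the exact command used for these files isn't recorded here, and the output filename and the Q4_K_M quant type are only illustrative assumptions:

./quantize --imatrix rho-math-7b-interpreter-v0.1.f32.ctx32768imatrix.dat /tmp/rho-math-7b-interpreter-v0.1.f32.gguf rho-math-7b-interpreter-v0.1.Q4_K_M.gguf Q4_K_M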