Update README.md
README.md
CHANGED
@@ -4,4 +4,12 @@ license: mit
 
 Author of the model: Microsoft
 
-Link to the original card: https://huggingface.co/microsoft/rho-math-7b-interpreter-v0.1
+Link to the original card: https://huggingface.co/microsoft/rho-math-7b-interpreter-v0.1
+
+I created the imatrix with:
+```
+./imatrix --mlock --verbosity 2 -m /tmp/rho-math-7b-interpreter-v0.1.f32.gguf -f ~/Downloads/groups_merged_forkOfArzeth.txt -c 32768 -o rho-math-7b-interpreter-v0.1.f32.ctx32768imatrix.dat
+```
+This took 1665 seconds (~28 minutes) on my GTX 1660 Super and used only one thread of my Ryzen 2600 downclocked to 3000 MHz. `imatrix` consumed 35685 MiB of RAM (3200 MHz) and 3158 MiB of VRAM.
+
+Quantized with llama.cpp b2661 (2024-04-12), compiled with `LLAMA_CUDA_FORCE_MMQ=1` (full command: `make -j6 LLAMA_CUDA_FORCE_MMQ=1 LLAMA_CUDA=1 LLAMA_FAST=1 LLAMA_OPENBLAS=1 LLAMA_BLAS_VENDOR=OpenBLAS`) for a big speedup: the GTX 1660 Super has no tensor cores, so MMQ is faster than the generic fallback.
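For context, the `.dat` file produced by the `imatrix` run above is then passed to llama.cpp's `quantize` tool via its `--imatrix` flag to guide the quantization. A sketch of that step (the output filename and `Q4_K_M` quant type here are illustrative assumptions, not from this commit):

```
# Sketch only: quant type and output name are assumptions.
./quantize --imatrix rho-math-7b-interpreter-v0.1.f32.ctx32768imatrix.dat \
    /tmp/rho-math-7b-interpreter-v0.1.f32.gguf \
    rho-math-7b-interpreter-v0.1.Q4_K_M.gguf \
    Q4_K_M
```

The importance matrix tells the quantizer which weights most affect the output on the calibration text, which mainly helps the smaller quant types retain quality.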