arzeth committed on
Commit 4e45121
1 Parent(s): ce8ffd7

Update README.md

Files changed (1)
  1. README.md +9 -1
README.md CHANGED
@@ -4,4 +4,12 @@ license: mit
 
 Author of the model: Microsoft
 
-Link to the original card: https://huggingface.co/microsoft/rho-math-7b-interpreter-v0.1
+Link to the original card: https://huggingface.co/microsoft/rho-math-7b-interpreter-v0.1
+
+I created the imatrix with
+```
+./imatrix --mlock --verbosity 2 -m /tmp/rho-math-7b-interpreter-v0.1.f32.gguf -f ~/Downloads/groups_merged_forkOfArzeth.txt -c 32768 -o rho-math-7b-interpreter-v0.1.f32.ctx32768imatrix.dat
+```
+which took 1665 seconds (about 28 minutes) on my GTX 1660 Super and used only 1 thread on a Ryzen 2600 downclocked to 3000 MHz. `imatrix` consumed 35685 MiB of RAM (3200 MHz) and 3158 MiB of VRAM.
+
+Quantized with llama.cpp b2661 (2024-04-12), compiled with `LLAMA_CUDA_FORCE_MMQ=1` (full command: `make -j6 LLAMA_CUDA_FORCE_MMQ=1 LLAMA_CUDA=1 LLAMA_FAST=1 LLAMA_OPENBLAS=1 LLAMA_BLAS_VENDOR=OpenBLAS`) for a big speedup (the GTX 1660 Super has no tensor cores, so MMQ is the better choice).
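
For context, an imatrix file like the one produced above is consumed by llama.cpp's `quantize` tool via its `--imatrix` flag. A sketch of that follow-up step, assuming the filenames from the commands above; the quant type `IQ4_XS` and the output filename are illustrative, not taken from the commit:

```shell
# Hypothetical quantization step using the imatrix created above.
# IQ4_XS and the output name are assumptions; any imatrix-aware
# quant type supported by this llama.cpp build would work.
./quantize \
    --imatrix rho-math-7b-interpreter-v0.1.f32.ctx32768imatrix.dat \
    /tmp/rho-math-7b-interpreter-v0.1.f32.gguf \
    rho-math-7b-interpreter-v0.1.IQ4_XS.gguf \
    IQ4_XS
```

This is a CLI fragment, not a runnable script: it requires the compiled llama.cpp binaries and the multi-gigabyte f32 GGUF referenced above.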