arzeth committed on
Commit a4e1cdb
1 Parent(s): 73a7002

Update README.md

Files changed (1): README.md (+4 −4)
README.md CHANGED
@@ -29,7 +29,7 @@ with settings `{
 temperature: 0.8
 }`
 
-outputs:
+outputs (using unquantized gguf):
 
 <pre>
 ```python
@@ -51,6 +51,8 @@ print(result)
 The area of the circle is $\boxed{\frac{27\pi}{4}}$ square cm.
 </pre>
 
+??? It should have been `9*pi/4`. Am I using this model wrong? Same result with temperature=0.0, top_k=1.
+
 According to their [paper on arXiv](https://arxiv.org/abs/2404.07965), rho-math-7b-v0.1 is continued pretraining of Mistral-7B, while their 1B model is continued pretraining of TinyLlama-1.1B.
 
 # imatrix
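A quick check of the arithmetic questioned in the hunk above (the prompt itself is not visible in this diff, so the diameter is inferred from the expected answer): $A = \pi r^2 = \frac{9\pi}{4}$ implies $r = \frac{3}{2}$ cm, i.e. a circle of diameter 3 cm. The model's $\frac{27\pi}{4}$ would instead require $r = \frac{3\sqrt{3}}{2} \approx 2.6$ cm, while the radius = diameter mistake mentioned in the last hunk would give $\pi \cdot 3^2 = 9\pi$.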
@@ -63,6 +65,4 @@ which took 1665 seconds (28 minutes) on my GTX 1660 Super and used only 1 thread
 
 # quantize
 
 Quantized with llama.cpp b2661 (2024-04-12), compiled with `LLAMA_CUDA_FORCE_MMQ=1` (full cmd: `make -j6 LLAMA_CUDA_FORCE_MMQ=1 LLAMA_CUDA=1 LLAMA_FAST=1 LLAMA_OPENBLAS=1 LLAMA_BLAS_VENDOR=OpenBLAS`) for a big speed up (the GTX 1660 Super has no tensor cores, so MMQ is the better choice).
-
-IQ3_XS (3 018 815 264 bytes) is stupid: it thinks radius = diameter, so I didn't upload it or lower quants.
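The imatrix command itself falls outside the hunks shown above; as a sketch only, a b2661-era invocation consistent with that description could look like this (the model and calibration file names are placeholders, not taken from the repo):

```bash
# Sketch, not the author's exact command: compute an importance matrix
# from a calibration text file, offloading a few layers to the GPU
# (a 6 GB GTX 1660 Super cannot hold all of a 7B f16 model).
./imatrix -m rho-math-7b-v0.1-f16.gguf -f calibration.txt -o imatrix.dat -ngl 10
```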
 
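Likewise, a sketch of the quantize step the last hunk describes (file names and the quant type are illustrative; the diff does not show the exact command):

```bash
# Sketch, not the author's exact command: quantize the f16 GGUF using the
# importance matrix from the previous step. In b2661 the tool is named
# `quantize`; later llama.cpp releases renamed it `llama-quantize`.
./quantize --imatrix imatrix.dat rho-math-7b-v0.1-f16.gguf rho-math-7b-v0.1-Q4_K_M.gguf Q4_K_M
```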