Update README.md

2. To transform to `FP16`:

```shell
python ./llama.cpp/convert_hf_to_gguf.py $MODEL_DIR --outfile rigochat-7b-v2-F16.gguf --outtype f16
```

Alternatively, you can download these weights [here](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF/blob/main/rigochat-7b-v2-F16.gguf).
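
If you prefer to skip the conversion entirely, a minimal sketch of that download with the `huggingface_hub` command-line tool (assuming it is installed via `pip install huggingface_hub`):

```shell
# Sketch: fetch the prebuilt F16 weights instead of converting locally
huggingface-cli download IIC/RigoChat-7b-v2-GGUF rigochat-7b-v2-F16.gguf --local-dir .
```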

To quantize `rigochat-7b-v2-F16.gguf` into different sizes, we first calculate an importance matrix as follows:

```shell
./llama.cpp/llama-imatrix -m ./rigochat-7b-v2-F16.gguf -f train_data.txt -c 1024
```

where `train_data.txt` is a Spanish raw-text dataset used for calibration; one way to assemble it is sketched below.
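
As a sketch, any representative Spanish plain text can be concatenated into the calibration file (the `corpus_es/` directory is a placeholder, not something the original specifies):

```shell
# Sketch: assemble the calibration file from local Spanish text documents
# (corpus_es/ is a placeholder path; any representative raw Spanish text works)
cat corpus_es/*.txt > train_data.txt
```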

Running `llama-imatrix` generates an `imatrix.dat` file that we can use to quantize the original model. For example, to get the `Q4_K_M` precision with this configuration, run:

```shell
./llama.cpp/llama-quantize --imatrix imatrix.dat ./rigochat-7b-v2-F16.gguf ./quantize_models/rigochat-7b-v2-Q4_K_M.gguf Q4_K_M
```
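
To sanity-check a quantized file, one option is llama.cpp's `llama-perplexity` tool; a sketch on the same calibration text (this check is an addition, not part of the original recipe):

```shell
# Sketch: rough quality check of the quantized model on the calibration text
./llama.cpp/llama-perplexity -m ./quantize_models/rigochat-7b-v2-Q4_K_M.gguf -f train_data.txt -c 1024
```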

The same command, with a different type argument, produces the other precisions, and so on. You can run:

```shell
./llama.cpp/llama-quantize --help
```

to see all the quantization options. To check how imatrix works, [this example](https://github.com/ggerganov/llama.cpp/blob/master/examples/imatrix/README.md) can be useful. For more information on the quantization types, see [this link](https://huggingface.co/docs/hub/gguf#quantization-types).
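
As a further sketch, the whole sweep of sizes can be scripted with a shell loop (the chosen types are illustrative, and `./quantize_models/` is assumed to exist):

```shell
# Sketch: reuse the same importance matrix for several quantization types
for q in Q4_K_M Q5_K_M Q6_K Q8_0; do
  ./llama.cpp/llama-quantize --imatrix imatrix.dat \
    ./rigochat-7b-v2-F16.gguf ./quantize_models/rigochat-7b-v2-${q}.gguf ${q}
done
```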

The `train_data.txt` dataset is optional for most quantizations.

You can run, for example:

```shell
./llama.cpp/llama-cli -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "Your system." -fa -ngl -1 -n 512
```

or

```shell
./llama.cpp/llama-server -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "Your system." -fa -ngl -1 -n 512
```
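
The server variant exposes an OpenAI-compatible HTTP API. A minimal sketch of a request against it, assuming the default `localhost:8080` address (the prompt content is illustrative):

```shell
# Sketch: chat with the running llama-server over its OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [
    {"role": "system", "content": "Your system."},
    {"role": "user", "content": "¿Quién eres?"}
  ],
  "max_tokens": 512
}'
```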
## Evaluation