gonzalo-santamaria-iic committed
Commit 98371b3 (verified)
1 Parent(s): ac27130

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -38,7 +38,7 @@ os.environ["MODEL_DIR"] = snapshot_download(
  2. To transform to `FP16`:

  ```shell
- python ../llama.cpp/convert_hf_to_gguf.py $MODEL_DIR --outfile rigochat-7b-v2-F16.gguf --outtype f16
+ python ./llama.cpp/convert_hf_to_gguf.py $MODEL_DIR --outfile rigochat-7b-v2-F16.gguf --outtype f16
  ```

  Nevertheless, you can download these weights [here](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF/blob/main/rigochat-7b-v2-F16.gguf).
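Aside: since this hunk only repaths the conversion command, a quicker route is to skip local conversion and pull the prebuilt F16 file linked above. A minimal sketch using the `huggingface_hub` CLI (assuming `huggingface-cli` is installed; this command is not part of the commit):

```shell
# Fetch only the prebuilt F16 GGUF from the Hub into the current directory
huggingface-cli download IIC/RigoChat-7b-v2-GGUF rigochat-7b-v2-F16.gguf --local-dir .
```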
@@ -46,19 +46,19 @@ Nevertheless, you can download this weights [here](https://huggingface.co/IIC/Ri
  To quantize `rigochat-7b-v2-F16.gguf` into different sizes, first we calculate an importance matrix as follows:

  ```shell
- llama-imatrix -m ./rigochat-7b-v2-fp16.gguf -f train_data.txt -c 1024
+ ./llama.cpp/llama-imatrix -m ./rigochat-7b-v2-fp16.gguf -f train_data.txt -c 1024
  ```

  where `train_data.txt` is a Spanish raw-text dataset for calibration. This generates an `imatrix.dat` file that we can use to quantize the original model. For example, to get the `Q4_K_M` precision with this config, do:

  ```shell
- llama-quantize --imatrix imatrix.dat ./rigochat-7b-v2-fp16.gguf ./quantize_models/rigochat-7b-v2-Q4_K_M.gguf Q4_K_M
+ ./llama.cpp/llama-quantize --imatrix imatrix.dat ./rigochat-7b-v2-fp16.gguf ./quantize_models/rigochat-7b-v2-Q4_K_M.gguf Q4_K_M
  ```

  and so on. You can do:

  ```shell
- llama-quantize --help
+ ./llama.cpp/llama-quantize --help
  ```

  to see all the quantization options. To check how imatrix works, [this example](https://github.com/ggerganov/llama.cpp/blob/master/examples/imatrix/README.md) can be useful. For more information on the quantization types, see [this link](https://huggingface.co/docs/hub/gguf#quantization-types).
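To make the "and so on" concrete, here is a sketch of producing several quantizations from the same F16 file and importance matrix in one loop (the list of target types and the output directory are illustrative, not taken from the README):

```shell
# Illustrative loop over a few common quantization types, reusing the same imatrix.dat
mkdir -p ./quantize_models
for q in Q2_K Q3_K_M Q4_K_M Q5_K_M Q6_K Q8_0; do
  ./llama.cpp/llama-quantize --imatrix imatrix.dat \
    ./rigochat-7b-v2-fp16.gguf ./quantize_models/rigochat-7b-v2-${q}.gguf ${q}
done
```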
@@ -73,13 +73,13 @@ The `train_data.txt` dataset is optional for most quantizations. We have used an
  You can do, for example

  ```shell
- llama-cli -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "Your system." -fa -ngl -1 -n 512
+ ./llama.cpp/llama-cli -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "Your system." -fa -ngl -1 -n 512
  ```

  or

  ```shell
- llama-cli -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "Your system." -fa -ngl -1 -n 512
+ ./llama.cpp/llama-server -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "Your system." -fa -ngl -1 -n 512
  ```

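The `llama-server` variant above exposes an HTTP API. As a quick smoke test once it is running, something like the following should work (a sketch assuming llama.cpp's default bind address `127.0.0.1:8080` and its OpenAI-compatible chat endpoint; this request is not part of the commit):

```shell
# Query the running llama-server via its OpenAI-compatible chat completions endpoint
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "system", "content": "Your system."},
          {"role": "user", "content": "Hola, ¿quién eres?"}
        ],
        "temperature": 0.7
      }'
```
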
  ## Evaluation
 