Update README.md

2. To transform to `FP16`:

```shell
python ./llama.cpp/convert_hf_to_gguf.py $MODEL_DIR --outfile rigochat-7b-v2-F16.gguf --outtype f16
```

Alternatively, you can download these weights [here](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF/blob/main/rigochat-7b-v2-F16.gguf).
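
If you prefer to skip the conversion entirely, a minimal sketch of that download with the `huggingface_hub` command-line tool (assuming it is installed via `pip install huggingface_hub`):

```shell
# Sketch: fetch the prebuilt F16 weights instead of converting locally
huggingface-cli download IIC/RigoChat-7b-v2-GGUF rigochat-7b-v2-F16.gguf --local-dir .
```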

To quantize `rigochat-7b-v2-F16.gguf` into different sizes, we first calculate an importance matrix as follows:

```shell
./llama.cpp/llama-imatrix -m ./rigochat-7b-v2-F16.gguf -f train_data.txt -c 1024
```

where `train_data.txt` is a Spanish raw-text dataset used for calibration; one way to assemble it is sketched below.
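
As a sketch, any representative Spanish plain text can be concatenated into the calibration file (the `corpus_es/` directory is a placeholder, not something the original specifies):

```shell
# Sketch: assemble the calibration file from local Spanish text documents
# (corpus_es/ is a placeholder path; any representative raw Spanish text works)
cat corpus_es/*.txt > train_data.txt
```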

Running `llama-imatrix` generates an `imatrix.dat` file that we can use to quantize the original model. For example, to get the `Q4_K_M` precision with this configuration, run:

```shell
./llama.cpp/llama-quantize --imatrix imatrix.dat ./rigochat-7b-v2-F16.gguf ./quantize_models/rigochat-7b-v2-Q4_K_M.gguf Q4_K_M
```
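
To sanity-check a quantized file, one option is llama.cpp's `llama-perplexity` tool; a sketch on the same calibration text (this check is an addition, not part of the original recipe):

```shell
# Sketch: rough quality check of the quantized model on the calibration text
./llama.cpp/llama-perplexity -m ./quantize_models/rigochat-7b-v2-Q4_K_M.gguf -f train_data.txt -c 1024
```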

The same command, with a different type argument, produces the other precisions, and so on. You can run:

```shell
./llama.cpp/llama-quantize --help
```

to see all the quantization options. To check how imatrix works, [this example](https://github.com/ggerganov/llama.cpp/blob/master/examples/imatrix/README.md) can be useful. For more information on the quantization types, see [this link](https://huggingface.co/docs/hub/gguf#quantization-types).
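
As a further sketch, the whole sweep of sizes can be scripted with a shell loop (the chosen types are illustrative, and `./quantize_models/` is assumed to exist):

```shell
# Sketch: reuse the same importance matrix for several quantization types
for q in Q4_K_M Q5_K_M Q6_K Q8_0; do
  ./llama.cpp/llama-quantize --imatrix imatrix.dat \
    ./rigochat-7b-v2-F16.gguf ./quantize_models/rigochat-7b-v2-${q}.gguf ${q}
done
```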

The `train_data.txt` dataset is optional for most quantizations.

You can run, for example:

```shell
./llama.cpp/llama-cli -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "Your system." -fa -ngl -1 -n 512
```

or

```shell
./llama.cpp/llama-server -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "Your system." -fa -ngl -1 -n 512
```
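
The server variant exposes an OpenAI-compatible HTTP API. A minimal sketch of a request against it, assuming the default `localhost:8080` address (the prompt content is illustrative):

```shell
# Sketch: chat with the running llama-server over its OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [
    {"role": "system", "content": "Your system."},
    {"role": "user", "content": "¿Quién eres?"}
  ],
  "max_tokens": 512
}'
```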
## Evaluation