---
library_name: transformers
language:
- es
base_model:
- IIC/RigoChat-7b-v2
pipeline_tag: text-generation
license: other
license_name: rigochat-nc
license_link: https://huggingface.co/IIC/RigoChat-7b-v2-GGUF/blob/main/LICENSE
tags:
- chat
---

# Model Card for RigoChat-7b-v2-GGUF

## Introduction

This repository contains the [IIC/RigoChat-7b-v2](https://huggingface.co/IIC/RigoChat-7b-v2) model in GGUF format, both with the original weights and quantized to several precisions.

The [llama.cpp](https://github.com/ggerganov/llama.cpp) library was used to convert the parameters to GGUF format and to perform the quantizations. Specifically, the following steps were used to obtain the model in full precision:

1. To download the weights:

```python
from huggingface_hub import snapshot_download
import os

model_id = "IIC/RigoChat-7b-v2"

# snapshot_download returns the local path of the downloaded repository.
# Note: setting MODEL_DIR here only affects this Python process; export the
# path in your shell (or substitute it directly) before running step 2.
os.environ["MODEL_DIR"] = snapshot_download(
    repo_id=model_id,
    local_dir="model",
    local_dir_use_symlinks=False,
    revision="main",
)
```

2. To transform to `FP16`:

```shell
python ./llama.cpp/convert_hf_to_gguf.py $MODEL_DIR --outfile rigochat-7b-v2-F16.gguf --outtype f16
```

Alternatively, you can download these weights directly [here](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF/blob/main/rigochat-7b-v2-F16.gguf).
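For instance, a minimal sketch using `huggingface_hub` to fetch only the full-precision file (the filename is taken from the link above, and the local directory is just an example):

```python
from huggingface_hub import hf_hub_download

# Download only the full-precision GGUF file from this repository.
gguf_path = hf_hub_download(
    repo_id="IIC/RigoChat-7b-v2-GGUF",
    filename="rigochat-7b-v2-F16.gguf",
    local_dir="gguf_models",
)
print(gguf_path)
```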

To quantize `rigochat-7b-v2-F16.gguf` to different sizes, we first calculate an importance matrix as follows:

```shell
./llama.cpp/llama-imatrix -m ./rigochat-7b-v2-F16.gguf -f train_data.txt -c 1024
```

where `train_data.txt` is a Spanish raw-text dataset used for calibration. This generates an `imatrix.dat` file that we can use to quantize the original model. For example, to obtain the `Q4_K_M` precision with this configuration, run:

```shell
./llama.cpp/llama-quantize --imatrix imatrix.dat ./rigochat-7b-v2-F16.gguf ./quantize_models/rigochat-7b-v2-Q4_K_M.gguf Q4_K_M
```

and so on for other precisions. You can run:

```shell
./llama.cpp/llama-quantize --help
```

to see all the quantization options. To see how the importance matrix works, [this example](https://github.com/ggerganov/llama.cpp/blob/master/examples/imatrix/README.md) can be useful. For more information on the quantization types, see [this link](https://huggingface.co/docs/hub/gguf#quantization-types).
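If you want to produce several quantizations in one go, a simple sketch looping over the same `llama-quantize` command could look like this (the list of target types is only an example; check `llama-quantize --help` for the full set):

```python
import subprocess

# Example target precisions; adjust to your needs.
quant_types = ["Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"]

for qt in quant_types:
    # Same command as above, parameterized by the quantization type.
    subprocess.run(
        [
            "./llama.cpp/llama-quantize",
            "--imatrix", "imatrix.dat",
            "./rigochat-7b-v2-F16.gguf",
            f"./quantize_models/rigochat-7b-v2-{qt}.gguf",
            qt,
        ],
        check=True,
    )
```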

#### Disclaimer

The `train_data.txt` dataset is optional for most quantizations. We used an experimental dataset to produce all of the available quantizations. However, we highly recommend downloading the weights in full precision (`rigochat-7b-v2-F16.gguf`) and quantizing the model yourself with your own datasets, adapted to your intended use case.
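As a minimal sketch of how you might assemble such a calibration file, the snippet below dumps Spanish raw text into `train_data.txt`. The dataset used here (Spanish Wikipedia) is only an illustrative placeholder, not the dataset we used; replace it with text representative of your use case.

```python
from datasets import load_dataset

# Illustrative placeholder: stream some Spanish raw text and write it to a
# plain-text file usable as calibration data for llama-imatrix.
ds = load_dataset("wikimedia/wikipedia", "20231101.es", split="train", streaming=True)

with open("train_data.txt", "w", encoding="utf-8") as f:
    for i, row in enumerate(ds):
        f.write(row["text"].strip() + "\n\n")
        if i >= 1000:  # a small sample is usually enough for calibration
            break
```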


## How to Get Started with the Model

You can run, for example:

```shell
./llama.cpp/llama-cli -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "Your system." -fa -ngl -1 -n 512
```

or

```shell
./llama.cpp/llama-server -m ./rigochat-7b-v2-Q8_0.gguf -co -cnv -p "Your system." -fa -ngl -1 -n 512
```
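You can also load the GGUF files directly from Python. Below is a minimal sketch using the `llama-cpp-python` bindings; the chosen quantization file, context size, and sampling parameters are just examples:

```python
from llama_cpp import Llama

# Load a quantized GGUF file; n_gpu_layers=-1 offloads all layers to the GPU if available.
llm = Llama(
    model_path="./rigochat-7b-v2-Q8_0.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Eres un asistente útil."},
        {"role": "user", "content": "¿Qué es el formato GGUF?"},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```

Note that `llama-server` also exposes an OpenAI-compatible HTTP API, so standard OpenAI-style clients can be pointed at it as well.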

## Evaluation

## Citation

```bibtex
@misc {instituto_de_ingeniería_del_conocimiento_2025,
	author       = { {Instituto de Ingeniería del Conocimiento} },
	title        = { RigoChat-7b-v2-GGUF },
	year         = 2025,
	url          = { https://huggingface.co/IIC/RigoChat-7b-v2-GGUF },
	doi          = { 10.57967/hf/4159 },
	publisher    = { Hugging Face }
}
```

## Model Card Contact

- [contacto.iic@iic.uam.es](mailto:contacto.iic@iic.uam.es)