---
license: apache-2.0
---

* Quantization of Qwen2.5 14B for edge devices, with a 7.3 GB footprint.
* One of the best models I have tried in Spanish.
* Original model: https://huggingface.co/djuna/Q2.5-Veltha-14B-0.5
* Models Merged:
    * huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2
    * allura-org/TQ2.5-14B-Aletheia-v1
    * EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
    * v000000/Qwen2.5-Lumen-14B
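
As a rough sanity check on the footprint quoted above, the file size can be turned into an average number of bits per weight. This is only a back-of-the-envelope sketch: the parameter count of 14.7 billion is an assumption (it is not stated in this card), and the 7.3 figure is read as decimal gigabytes.

```python
def bits_per_weight(file_size_gb: float, n_params: float) -> float:
    """Average number of bits stored per parameter for a file of the given size."""
    return file_size_gb * 1e9 * 8 / n_params


FOOTPRINT_GB = 7.3   # from this card, interpreted as decimal gigabytes (assumption)
N_PARAMS = 14.7e9    # assumed parameter count for Qwen2.5 14B, not stated in this card

bpw = bits_per_weight(FOOTPRINT_GB, N_PARAMS)
print(f"{bpw:.2f} bits per weight")
```

Under those assumptions the result is a little under 4 bits per weight, compared with 16 bits per weight for an unquantized FP16 file.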

* All quants were made using the imatrix option with the dataset from here
* Using llama.cpp compiled with CUDA support for quantization and inference:

```
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
  Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 3982 (cc2983d3)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
```
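
For readers who want to reproduce a similar workflow, the two llama.cpp steps (computing an importance matrix, then quantizing with it) might look roughly like the sketch below. It only builds and prints the command lines; it does not run them. All file names and the quant type `Q3_K_M` are placeholders, since this card does not state the exact files, calibration dataset, or quant types that were used.

```python
import shlex


def imatrix_cmd(model: str, calibration: str, out: str, gpu_layers: int = 99) -> list[str]:
    """Command line for llama.cpp's llama-imatrix tool (computes the importance matrix)."""
    return ["llama-imatrix", "-m", model, "-f", calibration,
            "-o", out, "-ngl", str(gpu_layers)]


def quantize_cmd(imatrix: str, src: str, dst: str, qtype: str) -> list[str]:
    """Command line for llama.cpp's llama-quantize tool, using a precomputed imatrix."""
    return ["llama-quantize", "--imatrix", imatrix, src, dst, qtype]


# Hypothetical file names and quant type, for illustration only.
steps = [
    imatrix_cmd("veltha-14b-f16.gguf", "calibration.txt", "imatrix.dat"),
    quantize_cmd("imatrix.dat", "veltha-14b-f16.gguf", "veltha-14b-Q3_K_M.gguf", "Q3_K_M"),
]

for step in steps:
    print(shlex.join(step))
```

Each printed line can be pasted into a shell where the llama.cpp binaries are on the `PATH`, or passed to `subprocess.run(step, check=True)` instead of being printed.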