Update README.md
README.md
@@ -18,6 +18,23 @@ Original Model : [agentica-org/DeepScaleR-1.5B-Preview](https://huggingface.co/a
 
 All quants are made using the imatrix option.
 
+| | CPU (AVX2) | Metal | cuBLAS | rocBLAS | SYCL | CLBlast | Vulkan | Kompute |
+| :------------ | :---------: | :---: | :----: | :-----: | :---: | :------: | :----: | :------: |
+| K-quants | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ 🐢⁵ | ✅ 🐢⁵ | ❌ |
+| I-quants | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ | ✅ | Partial¹ | ❌ | ❌ | ❌ |
+
+```
+✅: feature works.
+🚫: feature does not work
+❓: unknown, please contribute if you can test it yourself
+🐢: feature is slow
+¹: IQ3_S and IQ1_S, see #5886
+²: Only with -ngl 0
+³: Inference is 50% slower
+⁴: Slower than K-quants of comparable size
+⁵: Slower than cuBLAS/rocBLAS on similar cards
+⁶: Only q8_0 and iq4_nl
+```
 
 | Model | Size (GB) |
 |:-------------------------------------------------|:-------------:|
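The backend table added to the README can be read as a lookup from (quant family, backend) to a support status. The sketch below encodes its two rows in Python as a reading aid. The names `SUPPORT`, `support`, and `backends_for`, along with the status vocabulary (`"yes"`, `"slow"`, `"partial"`, `"no"`), are hypothetical and do not come from llama.cpp or this repository. Treating "Partial¹" as not fully usable is also an assumption.

```python
from __future__ import annotations

# Hypothetical encoding of the quant/backend table from the README diff.
# Each entry maps backend -> (status, note); the status vocabulary is assumed:
#   "yes" = ✅, "slow" = ✅ 🐢, "partial" = Partial¹, "no" = ❌
SLOW_K = "slower than cuBLAS/rocBLAS on similar cards"  # legend ⁵
SLOW_I = "slower than K-quants of comparable size"      # legend ⁴

SUPPORT = {
    "K-quants": {
        "CPU (AVX2)": ("yes", None),
        "Metal": ("yes", None),
        "cuBLAS": ("yes", None),
        "rocBLAS": ("yes", None),
        "SYCL": ("yes", None),
        "CLBlast": ("slow", SLOW_K),
        "Vulkan": ("slow", SLOW_K),
        "Kompute": ("no", None),
    },
    "I-quants": {
        "CPU (AVX2)": ("slow", SLOW_I),
        "Metal": ("slow", SLOW_I),
        "cuBLAS": ("yes", None),
        "rocBLAS": ("yes", None),
        "SYCL": ("partial", "IQ3_S and IQ1_S, see #5886"),  # legend ¹
        "CLBlast": ("no", None),
        "Vulkan": ("no", None),
        "Kompute": ("no", None),
    },
}


def support(family: str, backend: str) -> tuple[str, str | None]:
    """Return (status, note) for a quant family on a backend; KeyError if absent."""
    return SUPPORT[family][backend]


def backends_for(family: str) -> list[str]:
    """Backends marked ✅ for the family, including the slow (🐢) ones."""
    return [name for name, (status, _) in SUPPORT[family].items()
            if status in ("yes", "slow")]


if __name__ == "__main__":
    print(backends_for("I-quants"))  # ['CPU (AVX2)', 'Metal', 'cuBLAS', 'rocBLAS']
    print(support("K-quants", "Vulkan"))
```

Keeping the footnote text next to each status preserves the table's caveats, such as slowness and partial support. It does not collapse each cell to a yes/no flag.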