Melvin56 committed
Commit a3a47b8 · verified · 1 Parent(s): 347f5a2

Update README.md

  1. README.md +17 -0
README.md CHANGED
@@ -18,6 +18,23 @@ Original Model : [agentica-org/DeepScaleR-1.5B-Preview](https://huggingface.co/a
  
  All quants are made using the imatrix option.
  
+ | | CPU (AVX2) | Metal | cuBLAS | rocBLAS | SYCL | CLBlast | Vulkan | Kompute |
+ | :------------ | :---------: | :---: | :----: | :-----: | :---: | :------: | :----: | :------: |
+ | K-quants | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ 🐢⁵ | ✅ 🐢⁵ | ❌ |
+ | I-quants | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ | ✅ | Partial¹ | ❌ | ❌ | ❌ |
+
+ ```
+ ✅: feature works
+ ❌: feature does not work
+ ❓: unknown, please contribute if you can test it yourself
+ 🐢: feature is slow
+ ¹: IQ3_S and IQ1_S, see #5886
+ ²: Only with -ngl 0
+ ³: Inference is 50% slower
+ ⁴: Slower than K-quants of comparable size
+ ⁵: Slower than cuBLAS/rocBLAS on similar cards
+ ⁶: Only q8_0 and iq4_nl
+ ```
  
  | Model | Size (GB) |
  |:-------------------------------------------------|:-------------:|