Perplexity for f16 gguf is 6.646565 ± 0.040986.
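If the first ± column in the table below is read as delta-perplexity against this f16 baseline (that reading is an assumption, not stated explicitly here), the absolute perplexity of each quant can be recovered by simple addition. A minimal sketch:

```python
# Assumption: the first "±" column is delta-perplexity measured against the
# f16 baseline above (6.646565). Under that reading, the absolute perplexity
# of a quant is simply baseline + delta.
F16_PPL = 6.646565

delta_ppl = {
    "Q6_K":   -0.002436,
    "Q5_K_M":  0.020310,
    "Q4_K_M":  0.055444,
    "IQ4_XS":  0.095486,
    "Q4_0":    0.543042,
}

for quant, delta in delta_ppl.items():
    print(f"{quant:8s} ~ {F16_PPL + delta:.6f}")
```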
| [Q6_K](https://huggingface.co/ymcki/Llama-3_1-Nemotron-51B-Instruct-GGUF/blob/main/Llama-3_1-Nemotron-51B-Instruct.imatrix.Q6_K.gguf) | [calibration_datav3](https://gist.githubusercontent.com/bartowski1182/eb213dccb3571f863da82e99418f81e8/raw/b2869d80f5c16fd7082594248e80144677736635/calibration_datav3.txt) | 42.26GB | -0.002436 ± 0.001565 | 0.003332 ± 0.000014 | Good for Nvidia cards or Apple Silicon with 48GB RAM. Should perform very close to the original. |
| [Q5_K_M](https://huggingface.co/ymcki/Llama-3_1-Nemotron-51B-Instruct-GGUF/blob/main/Llama-3_1-Nemotron-51B-Instruct.imatrix.Q5_K_M.gguf) | [calibration_datav3](https://gist.githubusercontent.com/bartowski1182/eb213dccb3571f863da82e99418f81e8/raw/b2869d80f5c16fd7082594248e80144677736635/calibration_datav3.txt) | 36.47GB | 0.020310 ± 0.002052 | 0.005642 ± 0.000024 | Good for A100 40GB or dual 3090. Better than Q4_K_M but larger and slower. |
| [Q4_K_M](https://huggingface.co/ymcki/Llama-3_1-Nemotron-51B-Instruct-GGUF/blob/main/Llama-3_1-Nemotron-51B-Instruct.imatrix.Q4_K_M.gguf) | [calibration_datav3](https://gist.githubusercontent.com/bartowski1182/eb213dccb3571f863da82e99418f81e8/raw/b2869d80f5c16fd7082594248e80144677736635/calibration_datav3.txt) | 31.04GB | 0.055444 ± 0.002982 | 0.012021 ± 0.000052 | Good for A100 40GB or dual 3090. Higher cost-performance ratio than Q5_K_M. |
| IQ4_NL | calibration_datav3 | 29.30GB | 0.088279 ± 0.003944 | 0.020314 ± 0.000093 | For 32GB cards, e.g. 5090. Its minor performance gain doesn't justify its use over IQ4_XS. |
| [IQ4_XS](https://huggingface.co/ymcki/Llama-3_1-Nemotron-51B-Instruct-GGUF/blob/main/Llama-3_1-Nemotron-51B-Instruct.imatrix.IQ4_XS.gguf) | [calibration_datav3](https://gist.githubusercontent.com/bartowski1182/eb213dccb3571f863da82e99418f81e8/raw/b2869d80f5c16fd7082594248e80144677736635/calibration_datav3.txt) | 27.74GB | 0.095486 ± 0.004039 | 0.020962 ± 0.000097 | For 32GB cards, e.g. 5090. Too slow for CPU and Apple. Recommended. |
| Q4_0 | calibration_datav3 | 29.34GB | 0.543042 ± 0.009290 | 0.077602 ± 0.000389 | For 32GB cards, e.g. 5090. Too slow for CPU and Apple. |
| [Q4_0_4_8](https://huggingface.co/ymcki/Llama-3_1-Nemotron-51B-Instruct-GGUF/blob/main/Llama-3_1-Nemotron-51B-Instruct.imatrix.Q4_0_4_8.gguf) | [calibration_datav3](https://gist.githubusercontent.com/bartowski1182/eb213dccb3571f863da82e99418f81e8/raw/b2869d80f5c16fd7082594248e80144677736635/calibration_datav3.txt) | 29.25GB | Same as Q4_0 (assumed) | Same as Q4_0 (assumed) | For Apple Silicon. |
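For reference, here is a minimal sketch of pulling one of these files with the `huggingface_hub` Python client (the Q4_K_M filename is taken from the table above; `pip install huggingface_hub` is assumed):

```python
from huggingface_hub import hf_hub_download

# Download the Q4_K_M GGUF from this repo (filename taken from the table above),
# then point llama.cpp at the returned local path, e.g. llama-cli -m <path>.
model_path = hf_hub_download(
    repo_id="ymcki/Llama-3_1-Nemotron-51B-Instruct-GGUF",
    filename="Llama-3_1-Nemotron-51B-Instruct.imatrix.Q4_K_M.gguf",
)
print(model_path)
```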