dranger003/c4ai-command-r-plus-iMat.GGUF

dranger003 committed on Apr 8, 2024

Commit 85e40cc · verified · 1 Parent(s): efe1ee6

Update README.md

Files changed (1)

README.md +2 -1

@@ -11,7 +11,8 @@ The PR has been approved, we should expect it to be merged shortly into the main
 * The importance matrix is trained for ~100K tokens (200 batches of 512 tokens) using [wiki.train.raw](https://huggingface.co/datasets/wikitext).
 * [Which GGUF is right for me? (from Artefact2)](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9) - X axis is file size and Y axis is perplexity (lower perplexity is better quality). Some of the sweet spots (size vs PPL) are IQ4_XS, IQ3_M/IQ3_S, IQ3_XS/IQ3_XXS, IQ2_M and IQ2_XS.
 * The [imatrix is being used on the K-quants](https://github.com/ggerganov/llama.cpp/pull/4930) as well (only for < Q6_K).
-* You can merge GGUFs with `gguf-split --merge <first-chunk> <output-file>` although this is not required since [f482bb2e](https://github.com/ggerganov/llama.cpp/commit/f482bb2e4920e544651fb832f2e0bcb4d2ff69ab).
+* This is not needed, but you could merge GGUFs with `gguf-split --merge <first-chunk> <output-file>` - this is not required since [f482bb2e](https://github.com/ggerganov/llama.cpp/commit/f482bb2e4920e544651fb832f2e0bcb4d2ff69ab).
+* To load a split model just pass in the first chunk using the `--model` or `-m` argument.
 * What is importance matrix (imatrix)? You can [read more about it from the author here](https://github.com/ggerganov/llama.cpp/pull/4861). Some other info [here](https://huggingface.co/dranger003/c4ai-command-r-plus-iMat.GGUF/discussions/2#6612840b8377af8668066682).
 * How do I use imatrix quants? Just like any other GGUF, the `.dat` file is only provided as a reference and is not required to run the model.
 * If your last resort is to use an IQ1 quant then go for IQ1_M.
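
The bullet added in this commit says a split model is loaded by passing only the first chunk to `--model` or `-m`. The sketch below shows one way to pick that first chunk out of a list of downloaded files and build the argument list. It is an illustration only, under some assumptions: the chunks are assumed to follow the `<prefix>-00001-of-0000N.gguf` naming that `gguf-split` writes, the helper name `split_model_command` is hypothetical, and the default binary `./main` stands in for whichever llama.cpp executable you actually run.

```python
import re
from pathlib import PurePath

# Assumption: chunks are named "<prefix>-00001-of-0000N.gguf", as gguf-split writes them.
FIRST_CHUNK = re.compile(r"^(?P<prefix>.+)-00001-of-(?P<total>\d{5})\.gguf$")


def split_model_command(filenames, binary="./main"):
    """Return [binary, "-m", first_chunk] for a split GGUF model.

    `binary` is a placeholder for your llama.cpp executable (hypothetical default).
    Raises FileNotFoundError if a chunk of the set is missing and ValueError
    if no first chunk is present at all.
    """
    # Map bare file names to the paths that were passed in.
    names = {PurePath(f).name: f for f in filenames}
    for name in sorted(names):
        match = FIRST_CHUNK.match(name)
        if match is None:
            continue
        total = int(match.group("total"))
        # All chunks must sit next to the first one, so check the whole set.
        expected = [
            f"{match.group('prefix')}-{i:05d}-of-{total:05d}.gguf"
            for i in range(1, total + 1)
        ]
        missing = [n for n in expected if n not in names]
        if missing:
            raise FileNotFoundError(f"missing chunks: {missing}")
        # Only the first chunk goes on the command line.
        return [binary, "-m", names[name]]
    raise ValueError("no first chunk (*-00001-of-N.gguf) found")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = [
        "model-iq4_xs-00002-of-00002.gguf",
        "model-iq4_xs-00001-of-00002.gguf",
    ]
    print(split_model_command(files))
```

The returned list can be handed to `subprocess.run` as is. Checking for missing chunks up front is a design choice of this sketch, not something the README requires; it just surfaces an incomplete download before the model load fails.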