pmysl
/

c4ai-command-r-plus-GGUF

Text Generation

Inference Endpoints

Model card Files Files and versions Community

c4ai-command-r-plus-GGUF / README.md

pmysl's picture

Update README.md

8d3d329 7 months ago

|

1.73 kB

	---
	license: cc-by-nc-4.0
	pipeline_tag: text-generation
	base_model: CohereForAI/c4ai-command-r-plus
	---

	# Command R+ GGUF

	## Description
	This repository contains GGUF weights for `llama.cpp`. Support for them was added in release [`b2636`](https://github.com/ggerganov/llama.cpp/releases/tag/b2636). Since commit `dd2d53a`, all weights in this repo have chat templates.

	In the folder `imatrix`, you can find imatrix quants. The importance matrix was trained using [kalomaze's](https://github.com/kalomaze) [`groups_merged.txt`](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384).


	## Quickstart
	1. Ensure that you have release [`b2636`](https://github.com/ggerganov/llama.cpp/releases/tag/b2636) or newer.
	2. Start with the command below:
	```bash
	./main -p "<\|START_OF_TURN_TOKEN\|><\|USER_TOKEN\|>Who are you?<\|END_OF_TURN_TOKEN\|><\|START_OF_TURN_TOKEN\|><\|CHATBOT_TOKEN\|>" --color -m /path/to/command-r-plus-Q3_K_L-00001-of-00002.gguf
	```

	## Perplexity on `wikitext-2-raw` [WIP]
	\| Variant \| PPL Value \| Standard Deviation \|
	\|----------\|-----------\|--------------------\|
	\| Q2_K \| 5.7178 \| +/- 0.03418 \|
	\| Q3_K_L \| 4.6214 \| +/- 0.02629 \|
	\| Q4_K_M \| 4.4625 \| +/- 0.02522 \|
	\| f16 \| 4.3845 \| +/- 0.02468 \|

	## Merging Weights
	After commit `8a28d12`, weights are split with `gguf-split`, which means that you don't have to merge weights. Simply pass the first split, as in the example above, and `llama.cpp` will automatically load all splits. If, for some reason, you want to merge splits, you can use the following command:
	```bash
	./gguf-split --merge /path/to/command-r-plus-f16-00001-of-00005.gguf /path/to/command-r-plus-f16-combined.gguf
	```