Upload README.md with huggingface_hub
README.md
CHANGED
@@ -16,6 +16,8 @@ language:

# Model Card for C4AI Command R+

+🚨 **This model is the non-quantized version of C4AI Command R+. You can find the quantized version of C4AI Command R+ using bitsandbytes [here](https://huggingface.co/CohereForAI/c4ai-command-r-plus-4bit)**.
+

## Model Summary

@@ -92,32 +94,7 @@ print(gen_text)

**Quantized model through bitsandbytes, 4-bit precision**

-```python
-# pip install 'git+https://github.com/huggingface/transformers.git' bitsandbytes accelerate
-from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
-
-bnb_config = BitsAndBytesConfig(load_in_4bit=True)
-
-model_id = "CohereForAI/c4ai-command-r-plus"
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
-
-# Format message with the command-r-plus chat template
-messages = [{"role": "user", "content": "Hello, how are you?"}]
-input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
-## <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello, how are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
-
-gen_tokens = model.generate(
-    input_ids,
-    max_new_tokens=100,
-    do_sample=True,
-    temperature=0.3,
-    )
-
-gen_text = tokenizer.decode(gen_tokens[0])
-print(gen_text)
-```
-
+This model is the non-quantized version of C4AI Command R+. You can find the quantized version of C4AI Command R+ using bitsandbytes [here](https://huggingface.co/CohereForAI/c4ai-command-r-plus-4bit).

## Model Details

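The 4-bit loading snippet removed above now lives in the linked `CohereForAI/c4ai-command-r-plus-4bit` repository's card. For convenience, here is a minimal sketch of loading that pre-quantized checkpoint; it assumes the repository stores its bitsandbytes quantization config alongside the weights, so transformers applies it automatically and no explicit `BitsAndBytesConfig` is needed.

```python
# Minimal sketch: load the pre-quantized 4-bit checkpoint referenced above.
# Assumes the repository ships its bitsandbytes quantization config with the
# weights, so transformers picks it up automatically on load.
# pip install 'git+https://github.com/huggingface/transformers.git' bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/c4ai-command-r-plus-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Format a message with the command-r-plus chat template, as in the card's other examples
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

gen_tokens = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.3)
print(tokenizer.decode(gen_tokens[0]))
```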
@@ -133,6 +110,25 @@ Pre-training data additionally included the following 13 languages: Russian, Pol

**Context length**: Command R+ supports a context length of 128K.

+## Evaluations
+
+Command R+ has been submitted to the [Open LLM leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). We include the results below, along with a direct comparison to the strongest state-of-the-art open-weights models currently available on Hugging Face. We note that these results are only meaningful to compare when evaluations are implemented for all models in a [standardized way](https://github.com/EleutherAI/lm-evaluation-harness) using publicly available code, and hence shouldn't be used for comparison against models outside the leaderboard or against self-reported numbers which can't be replicated in the same way.
+
+| Model | Average | Arc (Challenge) | Hella Swag | MMLU | Truthful QA | Winogrande | GSM8k |
+|:--------------------------------|----------:|------------------:|-------------:|-------:|--------------:|-------------:|--------:|
+| **CohereForAI/c4ai-command-r-plus** | 74.6 | 70.99 | 88.6 | 75.7 | 56.3 | 85.4 | 70.7 |
+| [DBRX Instruct](https://huggingface.co/databricks/dbrx-instruct) | 74.5 | 68.9 | 89 | 73.7 | 66.9 | 81.8 | 66.9 |
+| [Mixtral 8x7B-Instruct](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) | 72.7 | 70.1 | 87.6 | 71.4 | 65 | 81.1 | 61.1 |
+| [Mixtral 8x7B Chat](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) | 72.6 | 70.2 | 87.6 | 71.2 | 64.6 | 81.4 | 60.7 |
+| [CohereForAI/c4ai-command-r-v01](https://huggingface.co/CohereForAI/c4ai-command-r-v01) | 68.5 | 65.5 | 87 | 68.2 | 52.3 | 81.5 | 56.6 |
+| [Llama 2 70B](https://huggingface.co/meta-llama/Llama-2-70b-hf) | 67.9 | 67.3 | 87.3 | 69.8 | 44.9 | 83.7 | 54.1 |
+| [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) | 65.3 | 65.4 | 84.2 | 74.9 | 55.4 | 80.1 | 31.9 |
+| [Gemma-7B](https://huggingface.co/google/gemma-7b) | 63.8 | 61.1 | 82.2 | 64.6 | 44.8 | 79 | 50.9 |
+| [Llama 2 70B Chat](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | 62.4 | 64.6 | 85.9 | 63.9 | 52.8 | 80.5 | 26.7 |
+| [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 61 | 60 | 83.3 | 64.2 | 42.2 | 78.4 | 37.8 |
+
+We include these metrics here because they are frequently requested, but note that they do not capture RAG, multilingual, or tool-use performance, or the evaluation of open-ended generation, at which we believe Command R+ to be state-of-the-art. For evaluations of RAG, multilingual capabilities, and tool use, read more [here](https://txt.cohere.com/command-r-plus-microsoft-azure/). For evaluation of open-ended generation, Command R+ is currently being evaluated on the [chatbot arena](https://chat.lmsys.org/).
+
### Tool use & multihop capabilities:

Command R+ has been specifically trained with conversational tool use capabilities. These have been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template will likely reduce performance, but we encourage experimentation.
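The Evaluations section added above points to EleutherAI's lm-evaluation-harness as the standardized way these numbers are produced. For readers who want to run a comparable evaluation locally, here is a minimal sketch; it assumes a recent harness release (v0.4+) that exposes the `simple_evaluate` entry point, and the task list and settings are illustrative rather than the leaderboard's exact configuration.

```python
# Sketch: standardized evaluation with lm-evaluation-harness (assumes lm-eval >= 0.4).
# The task list mirrors the leaderboard columns above; batch size and other settings
# are illustrative, not the leaderboard's exact configuration.
# pip install lm-eval accelerate
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=CohereForAI/c4ai-command-r-plus,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2", "winogrande", "gsm8k"],
    batch_size=1,
)

# Print the per-task metrics returned by the harness
for task, metrics in results["results"].items():
    print(task, metrics)
```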
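The tool-use paragraph above notes that Command R+ was trained with a specific tool-use prompt template. As a rough, non-authoritative illustration of passing tool definitions through the tokenizer, the sketch below uses `apply_chat_template` with a `tools=` argument; it assumes a recent transformers release that supports `tools=` and that the bundled chat template renders the tool-use format the model was trained with. Follow the prompt-format documentation in this card for the exact template.

```python
# Rough sketch of prompting for tool use via the tokenizer's chat template.
# Assumes a recent transformers release where apply_chat_template accepts tools=,
# and that the bundled chat template implements the tool-use prompt format the
# model was trained with (see the card's prompt documentation for the exact format).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-plus")

def internet_search(query: str) -> list:
    """Return snippets from a web search for the given query.

    Args:
        query: The search query string.
    """
    ...

messages = [{"role": "user", "content": "What is the weather in Toronto today?"}]

prompt = tokenizer.apply_chat_template(
    messages,
    tools=[internet_search],  # the tool schema is inferred from the signature and docstring
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```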