victormiller committed on
Commit 887ace8
1 Parent(s): 8f9abff

Update README.md

Files changed (1): README.md (+33 -5)
README.md CHANGED
@@ -11,7 +11,24 @@ tags:

# CrystalChat

- We present CrystalChat, an instruction following model finetuned from [LLM360/CrystalCoder](https://huggingface.co/LLM360/CrystalCoder). Here's a comparison table for some popular chat models.
+ We present CrystalChat, an instruction-following model finetuned from [LLM360/CrystalCoder](https://huggingface.co/LLM360/CrystalCoder). Following the release of [LLM360/AmberChat](https://huggingface.co/LLM360/AmberChat) and [LLM360/AmberSafe](https://huggingface.co/LLM360/AmberSafe) in December 2023, CrystalChat is the next and most performant chat model released under LLM360. CrystalChat is trained on a carefully selected mix of publicly available language and code datasets.
+
+ As always, the training data, training code, and metrics are publicly available.
+
+ ## About LLM360
+ LLM360 is an initiative for comprehensive and fully open-sourced LLMs,
+ where all training details, model checkpoints, intermediate results, and
+ additional analyses are made available to the community. Our goal is to advance
+ the field by inviting the community to deepen the understanding of LLMs
+ together. As the first step of the LLM360 project, we release all intermediate
+ model checkpoints, our fully prepared pre-training dataset, all source code and
+ configurations, and training details. We are
+ committed to continually pushing the boundaries of LLMs through this open-source
+ effort.
+
+ Get access now at the [LLM360 site](https://www.llm360.ai/).
+
+ # CrystalChat Performance

| Model | Trained Tokens | Avg. of Avg. | Language Avg. | Coding Avg. | ARC | HellaSwag | MMLU (5-shot) | GSM8K | Winogrande (5-shot) | TruthfulQA | HumanEval (pass@1) | MBPP (pass@1) |
|:------------------------:|:--------------:|:------------:|:-------------:|:-----------:|:-----:|:---------:|:-------------:|:-----:|:------------------:|:----------:|:------------------:|:-------------:|
@@ -21,11 +38,19 @@ We present CrystalChat, an instruction following model finetuned from [LLM360/Cr
| Llama-2-7b-Chat | 2T | 34.11 | 52.86 | 15.35 | 53.07 | 78.39 | 48.42 | 18.88 | 73.09 | 45.30 | 13.26 | 17.43 |
| AmberChat 7B | 1.25T | - | 44.76 | - | 42.83 | 74.03 | 38.88 | 5.31 | 66.77 | 40.72 | - | - |

- |||
- <img src="CC-Compare.png" alt="arc" width="400"/>
- |:--|:--|
- |<img src="cc-eval-std-benchmarks.png" alt="arc" width="400"/> |<img src="cc-eval-lang-compare.png" alt="arc" width="400"/>
+
+ | Combined Language and Coding Ability |
+ |------------------------------------------------|
+ | <img src="CC-Compare.jpg" alt="combined language and coding ability" width="800"/> |
+
+ | Performance on Standard Benchmarks |
+ |------------------------------------------------|
+ | <img src="cc-eval-std-benchmarks.png" alt="standard benchmarks" width="600"/> |
+
+ | Performance on Language Benchmarks |
+ |---------------------------------------------------------|
+ | <img src="cc-eval-lang-compare.png" alt="language benchmarks" width="600"/> |

## Model Description
 
@@ -57,6 +82,9 @@ print("-"*20 + "Output for model" + 20 * '-')
print(tokenizer.batch_decode(gen_tokens)[0])
```

+ # Bias, Risks, and Limitations
+ CrystalChat has not been aligned to human preferences for safety through an RLHF phase, nor deployed with in-the-loop filtering of responses as ChatGPT is, so the model can produce problematic outputs (especially when prompted to do so). The training data is known and made available [here](https://huggingface.co/datasets/LLM360/CrystalCoderDatasets). It primarily consists of the SlimPajama, StarCoder, and WebCrawl datasets.
+
# Citation

**BibTeX:**
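A note on the comparison table: "Avg. of Avg." appears to be the simple mean of the Language Avg. and Coding Avg. columns. For Llama-2-7b-Chat, (52.86 + 15.35) / 2 = 34.105, which matches the reported 34.11 after rounding.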
 
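The last hunk shows only the tail of the README's usage snippet. Below is a minimal sketch of the kind of loading-and-generation example those two lines plausibly belong to, assuming the repo id `LLM360/CrystalChat`, the standard `transformers` auto classes, and `trust_remote_code=True` (the repo ships custom model code); the prompt text and generation settings here are illustrative, not the model card's.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; only the two print lines below appear in the diff above.
MODEL_ID = "LLM360/CrystalChat"

# trust_remote_code is assumed because the repo carries custom modeling code.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

# Illustrative prompt; the model card's actual prompt format may differ.
prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding with a modest token budget; adjust as needed.
gen_tokens = model.generate(**inputs, max_new_tokens=128)

# These two lines match the context shown in the diff.
print("-" * 20 + "Output for model" + 20 * "-")
print(tokenizer.batch_decode(gen_tokens)[0])
```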