lewtun HF staff commited on
Commit
55c3d3e
1 Parent(s): bf5bc06

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +10 -0
README.md CHANGED
@@ -35,6 +35,16 @@ StarChat is a series of language models that are trained to act as helpful codin
35
  - **Repository:** https://github.com/huggingface/alignment-handbook
36
  - **Demo:** https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground
37
 
 
 
 
 
 
 
 
 
 
 
38
 
39
  ## Intended uses & limitations
40
 
 
35
  - **Repository:** https://github.com/huggingface/alignment-handbook
36
  - **Demo:** https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground
37
 
38
+ ## Performance
39
+
40
+ StarChat2 15B was trained to balance chat and programming capabilities. It achieves strong performance on chat benchmarks like [MT Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [IFEval](https://arxiv.org/abs/2311.07911), as well as the canonical HumanEval benchmark for Python code completion. The scores reported below were obtained using the [LightEval](https://github.com/huggingface/lighteval) evaluation suite (commit `988959cb905df4baa050f82b4d499d46e8b537f2`) and each prompt has been formatted with the model's corresponding chat template to simulate real-world usage. This is why some scores may differ from those reported in technical reports or on the Open LLM Leaderboard.
41
+
42
+ | Model | MT Bench | IFEval | HumanEval |
43
+ |-------------------------------------------------------------------------------------------------|---------:|-------:|----------:|
44
+ | [starchat2-15b-v0.1](https://huggingface.co/HuggingFaceH4/starchat2-15b-v0.1) | 7.66 | 35.12 | 71.34 |
45
+ | [deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) | 4.17 | 14.23 | 80.48 |
46
+ | [CodeLlama-13b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf) | 6.80 | 43.44 | 50.60 |
47
+
48
 
49
  ## Intended uses & limitations
50