lgaalves/llama-2-13b-chat-platypus

lgaalves committed on Sep 12, 2023

Commit 6170e7f • 1 Parent(s): 828aa10

Update README.md

Files changed (1)

README.md +6 -6

@@ -19,13 +19,13 @@ language:
 | Metric                | llama-2-13b-chat-platypus | garage-bAInd/Platypus2-13B| llama-2-13b-chat-hf  (base) |
 |-----------------------|-------|-------|-------|
-| Avg.                  | -|61.35| 59.93 |
-| ARC (25-shot)         | -|61.26| 59.04 |
-| HellaSwag (10-shot)   | -|82.56| 81.94 |
-| MMLU (5-shot)         | -|56.7| 54.64 |
-| TruthfulQA (0-shot)   | -|44.86| 44.12 |
+| Avg.                  | 58.8 |**61.35**| 59.93 |
+| ARC (25-shot)         | 53.84|**61.26**| 59.04 |
+| HellaSwag (10-shot)   | 80.67|**82.56**| 81.94 |
+| MMLU (5-shot)         | 54.44|**56.7**| 54.64 |
+| TruthfulQA (0-shot)   | **46.23**|44.86| 44.12 |
 We use state-of-the-art [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard. Please see below for detailed instructions on reproducing benchmark results.
 ### Model Details
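
A quick sanity check on the rows added in this commit: the `Avg.` row appears to be the unweighted mean of the four benchmark scores, and the bold entries appear to mark the best score in each row. Both readings are assumptions drawn from the table itself, not something the README states. The sketch below recomputes the averages from the scores in the `+` lines; the dictionary keys and the helper names `mean_score` and `best_model` are hypothetical and exist only for this illustration.

```python
# Scores copied from the added (+) rows of the table in this commit.
# Short task keys ("arc", "hellaswag", ...) are illustrative labels only.
scores = {
    "llama-2-13b-chat-platypus": {
        "arc": 53.84, "hellaswag": 80.67, "mmlu": 54.44, "truthfulqa": 46.23,
    },
    "garage-bAInd/Platypus2-13B": {
        "arc": 61.26, "hellaswag": 82.56, "mmlu": 56.7, "truthfulqa": 44.86,
    },
    "llama-2-13b-chat-hf (base)": {
        "arc": 59.04, "hellaswag": 81.94, "mmlu": 54.64, "truthfulqa": 44.12,
    },
}

# Values from the "Avg." row of the same table.
reported_avg = {
    "llama-2-13b-chat-platypus": 58.8,
    "garage-bAInd/Platypus2-13B": 61.35,
    "llama-2-13b-chat-hf (base)": 59.93,
}


def mean_score(per_task):
    """Unweighted mean over the benchmark scores (assumed definition of Avg.)."""
    return sum(per_task.values()) / len(per_task)


def best_model(task):
    """Model with the highest score on one task (assumed meaning of bold)."""
    return max(scores, key=lambda model: scores[model][task])


if __name__ == "__main__":
    for model, per_task in scores.items():
        computed = mean_score(per_task)
        print(f"{model}: computed {computed:.3f}, reported {reported_avg[model]}")
    for task in ("arc", "hellaswag", "mmlu", "truthfulqa"):
        print(f"best on {task}: {best_model(task)}")
```

Under that assumption, each computed mean should agree with the reported `Avg.` value to within rounding at two decimal places.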