lgaalves/mistral-7b-platypus1k


lgaalves committed on Oct 11, 2023

Commit 8ed4904 • 1 Parent(s): c34c4a2

Update README.md

Files changed (1)

README.md +6 -6


@@ -17,13 +17,13 @@ language:
 ### Benchmark Metrics
-| Metric                | mistral-7b-v0.1-platypus1k | garage-bAInd/Platypus2-7B| meta-llama/Llama-2-7b-hf  (base) |
 |-----------------------|-------|-------|-------|
-| Avg.                  | - |**56.13** | 54.32 |
-| ARC (25-shot)         | - |**55.2**| 53.07 |
-| HellaSwag (10-shot)   | - |**78.84**| 78.59 |
-| MMLU (5-shot)         | - |**49.83**| 46.87 |
-| TruthfulQA (0-shot)   | - |40.64| 38.76 |
 We use state-of-the-art [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard. Please see below for detailed instructions on reproducing benchmark results.

 ### Benchmark Metrics
+| Metric                | mistral-7b-v0.1-platypus1k | mistralai/Mistral-7B-v0.1 |garage-bAInd/Platypus2-7B|
 |-----------------------|-------|-------|-------|
+| Avg.                  | **63.66** | 62.4 |56.13|
+| ARC (25-shot)         | **61.60** | 59.98|55.20|
+| HellaSwag (10-shot)   | 82.93 |**83.31** |78.84|
+| MMLU (5-shot)         | 63.16 |**64.16** |49.83|
+| TruthfulQA (0-shot)   | **46.96** | 42.15 |40.64|
 We use state-of-the-art [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard. Please see below for detailed instructions on reproducing benchmark results.
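
The `Avg.` row in the updated table appears to be the unweighted mean of the four benchmark scores. The README does not state this, so treat it as an assumption. A minimal Python sketch that checks the new table under that assumption follows; the `scores` dictionary and the `average_score` helper are illustrative names, not part of the repository.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the updated table, in the order
# ARC (25-shot), HellaSwag (10-shot), MMLU (5-shot), TruthfulQA (0-shot).
scores = {
    "mistral-7b-v0.1-platypus1k": ["61.60", "82.93", "63.16", "46.96"],
    "mistralai/Mistral-7B-v0.1": ["59.98", "83.31", "64.16", "42.15"],
    "garage-bAInd/Platypus2-7B": ["55.20", "78.84", "49.83", "40.64"],
}


def average_score(values):
    """Unweighted mean, rounded half-up to two decimals.

    Assumption: this is how the Avg. row is derived; the source does not say.
    Decimal is used so the rounding is not affected by binary float error.
    """
    total = sum(Decimal(v) for v in values)
    mean = total / len(values)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


averages = {model: average_score(vals) for model, vals in scores.items()}

for model, avg in averages.items():
    print(f"{model}: {avg}")
```

Under this assumption the script reproduces the three values in the `Avg.` row (63.66, 62.40, and 56.13), which suggests the table is internally consistent.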