chargoddard committed
Commit 388f3eb
Parent(s): 075d67c
Update README.md

README.md CHANGED
@@ -15,4 +15,19 @@ layer_slices:
   end: 40
 ```
 
-No fine tuning was done on this model. Yes, it's still coherent somehow.
+No fine tuning was done on this model. Yes, it's still coherent somehow.
+
+Benchmark results:
+| Benchmark | Llama2-13b | Llama2-26b-tcs | Percent Change |
+| --- | --- | --- | --- |
+| ARC | 59.3 | 55.03 | -7.2% |
+| HellaSwag | 82.15 | 79.9 | -2.74% |
+| MMLU | 55.67 | 53.73 | -3.48% |
+| TruthfulQA | 37.39 | 40.48 | +5.59% |
+| Average | 58.63 | 57.29 | -2.29% |
+| Average Minus TQA | 65.70 | 62.85 | -4.34% |
+
+
+This tells us two very important things:
+1. TruthfulQA is a perfect benchmark in every way.
+2. Llama models are amazingly robust to being fed their own output.