---
license: apache-2.0
datasets:
  - argilla/distilabel-intel-orca-dpo-pairs
language:
  - en
tags:
  - distilabel
  - dpo
  - rlaif
  - rlhf
---

⚗️ distilabeled Marcoro14 7B Slerp

Built with Distilabel

Benchmark results

For benchmarking we used the well-known "Nous" (or "Teknium") benchmark. The overview below also includes our first experiment, which used a less ambitious dataset filtering (removing ties and score>5).
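As a rough illustration of what that kind of filtering looks like, here is a minimal sketch on a few made-up preference records. It reads "removing ties and score>5" as "drop tied pairs and keep pairs whose chosen response scored above 5", which is only one possible reading. The field names `status` and `chosen_score`, the sample records, and the `keep` helper are assumptions for illustration, not the exact pipeline used for this model.

```python
# Hypothetical preference records; the field names "status" and
# "chosen_score" are assumed for this sketch.
records = [
    {"id": 1, "status": "tie", "chosen_score": 8.0},
    {"id": 2, "status": "swapped", "chosen_score": 9.0},
    {"id": 3, "status": "unchanged", "chosen_score": 4.0},
    {"id": 4, "status": "unchanged", "chosen_score": 7.0},
]


def keep(record, min_score=5):
    """Keep a pair if it is not a tie and its chosen score is above min_score."""
    return record["status"] != "tie" and record["chosen_score"] > min_score


filtered = [r for r in records if keep(r)]
print([r["id"] for r in filtered])
```

Raising `min_score` makes the filter stricter, trading dataset size for higher-rated chosen responses.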

To run the benchmark we used another awesome contribution from Maxime: LLM AutoEval. Check it out!

| Model | AGIEval | GPT4ALL | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| argilla/distilabeled-Marcoro14-7B-slerp | 45.4 | 76.47 | 65.46 | 47.19 | 58.63 |
| Marcoro14-7B-slerp | 44.66 | 76.24 | 64.15 | 45.64 | 57.67 |
| argilla/distilabeled-Hermes-2.5-Mistral-7B | 44.64 | 73.35 | 55.96 | 42.21 | 54.04 |
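
The Average column is consistent with an unweighted mean of the four task scores. The short sketch below recomputes it from the numbers in the table; the `scores` dictionary and the `average` helper are names introduced here for illustration only.

```python
# Scores copied from the table above, in the order:
# AGIEval, GPT4ALL, TruthfulQA, Bigbench.
scores = {
    "argilla/distilabeled-Marcoro14-7B-slerp": [45.4, 76.47, 65.46, 47.19],
    "Marcoro14-7B-slerp": [44.66, 76.24, 64.15, 45.64],
    "argilla/distilabeled-Hermes-2.5-Mistral-7B": [44.64, 73.35, 55.96, 42.21],
}


def average(task_scores):
    """Unweighted mean of the task scores, rounded to two decimals."""
    return round(sum(task_scores) / len(task_scores), 2)


averages = {model: average(s) for model, s in scores.items()}
for model, avg in averages.items():
    print(f"{model}: {avg}")
```

Because the mean is unweighted, each of the four suites contributes equally to the final figure regardless of how many tasks it contains.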