InferenceIllusionist committed on
Commit: aa03379 (1 parent: 7cc57ac)

Adding previous model scores for comparison


Also slight clean-up and clarification in methodology language

Files changed (1): README.md +7 -2
README.md CHANGED
```diff
@@ -125,9 +125,14 @@ An initial foray into the world of fine-tuning. The goal of this release was to
 
 ## Notes & Methodology
 * [Excalibur-7b](https://huggingface.co/InferenceIllusionist/Excalibur-7b) fine-tuned with Direct Preference Optimization (DPO) using Intel/orca_dpo_pairs
-* This is a quick experiment to determine the impact of DPO finetuning on the original base model
+* This is a quick experiment to determine the impact of DPO finetuning on the Excelsior-7b base model
 * Ran for a little over an hour on a single A100
-* Internal benchmarks showed improvement over base model, awaiting final results
+* Fine-tuning succeeded in making model conversational and more well-rounded
+* Benchmark scores increased in the following categories versus base Excelsior-7b:
+  * ARC: 69.71 -> <b>70.9</b>
+  * HellaSwag: 87.56 -> <b>87.93</b>
+  * TruthfulQA: 67.24 -> <b>70.82</b>
+  * Average: 73.6 -> <b>73.84</b>
 * Precision: bfloat16
 
 
```
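The per-benchmark improvements in the diff can be double-checked with a quick sketch; the scores below are copied verbatim from the added lines (note the reported Average covers the full benchmark suite, not just the three categories listed):

```python
# Base -> DPO-tuned scores as reported in the README diff above.
scores = {
    "ARC": (69.71, 70.9),
    "HellaSwag": (87.56, 87.93),
    "TruthfulQA": (67.24, 70.82),
    "Average": (73.6, 73.84),
}

# Print each category with its delta, rounded to two decimals.
for name, (base, tuned) in scores.items():
    delta = round(tuned - base, 2)
    print(f"{name}: {base} -> {tuned} (+{delta})")
```

Running this shows TruthfulQA as the largest single-category gain (+3.58), consistent with the diff's claim that the DPO pass made the model more well-rounded.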