[Magic-Dolphin-7b](https://huggingface.co/InferenceIllusionist/Magic-Dolphin-7b) was an unexpected surprise, and I was profoundly satisfied with it as a first attempt. For this follow-up I wanted to target the MMLU benchmark specifically.

The challenge this time was placing more weight on Merlinite-7b, an unknown quantity that hasn't been in the spotlight despite its novel LAB tuning method.

<b>Excalibur-7b</b> builds on past success and is the culmination of several learnings:

* Measuring KL-divergences for new quantization types brought a deeper understanding of benchmarking and assessing model performance (a sketch of this measurement follows the list)
* Using MMLU as a baseline significantly sped up the testing process, narrowing over 10 candidate linear merges down to one: merliniteX-blockB1 (see the linear merge sketch below)
* Reaching the limitations of linear merging necessitated a pivot to reviewing the viability of SLERP, DARE-TIES, and Passthrough methods (see the SLERP sketch below)
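The KL measurement boils down to comparing the token distributions of a full-precision reference model against a quantized build over the same evaluation text. Below is a minimal sketch of that comparison, assuming logits have already been collected from both models; the `mean_kl_divergence` helper and the toy tensors are illustrative stand-ins, not the actual evaluation harness used here:

```python
import torch
import torch.nn.functional as F

def mean_kl_divergence(ref_logits: torch.Tensor, test_logits: torch.Tensor) -> float:
    """Mean per-token KL(P_ref || P_test) over a set of positions.

    Both tensors are [num_tokens, vocab_size] raw logits produced by the
    reference (e.g. fp16) and quantized models on the same text.
    """
    ref_logprobs = F.log_softmax(ref_logits.float(), dim=-1)
    test_logprobs = F.log_softmax(test_logits.float(), dim=-1)
    # KL(P || Q) = sum_i P_i * (log P_i - log Q_i), averaged over positions
    kl = (ref_logprobs.exp() * (ref_logprobs - test_logprobs)).sum(dim=-1)
    return kl.mean().item()

# Toy demonstration: random logits stand in for real model outputs
torch.manual_seed(0)
ref = torch.randn(512, 32000)               # full-precision reference
quant = ref + 0.05 * torch.randn_like(ref)  # quantized model drifting slightly
print(f"mean KL: {mean_kl_divergence(ref, quant):.6f}")
```

A lower mean KL means the quantized model's next-token predictions track the reference more closely, which makes it a finer-grained signal than perplexity alone.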
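The candidate merges themselves were linear merges, i.e. a per-tensor weighted average of parent checkpoints. A minimal sketch of that computation, assuming architecture-identical state dicts; the 0.6/0.4 split is purely illustrative and not the actual recipe behind merliniteX-blockB1:

```python
import torch

def linear_merge(state_dicts: list[dict], weights: list[float]) -> dict:
    """Per-tensor weighted average of architecture-identical checkpoints."""
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1.0"
    merged = {}
    for key in state_dicts[0]:
        # cast to float for the accumulation; a real pipeline would cast back
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Illustrative only: weight Merlinite-7b more heavily than a second parent
# merged_sd = linear_merge([merlinite_sd, dolphin_sd], weights=[0.6, 0.4])
```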
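Of the follow-up methods, SLERP is the simplest to sketch: rather than averaging two weight tensors along a straight line, it interpolates along the arc between them on a hypersphere, which better preserves each parent's geometry. A hypothetical, minimal implementation for a single tensor, not the exact merge tooling used here:

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two same-shaped weight tensors."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    # angle between the two tensors, treated as points on a hypersphere
    dot = torch.clamp(
        (a_flat / (a_flat.norm() + eps)) @ (b_flat / (b_flat.norm() + eps)),
        -1.0, 1.0,
    )
    omega = torch.arccos(dot)
    if omega.abs() < 1e-4:  # near-colinear tensors: plain lerp is numerically safer
        out = (1 - t) * a_flat + t * b_flat
    else:
        so = torch.sin(omega)
        out = (torch.sin((1 - t) * omega) / so) * a_flat \
            + (torch.sin(t * omega) / so) * b_flat
    return out.reshape(a.shape).to(a.dtype)

# interpolate one layer's weights halfway between two parent models
# merged_layer = slerp(0.5, model_a["layer.weight"], model_b["layer.weight"])
```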