---
base_model: [ibm/merlinite-7b]
library_name: transformers
tags:
- mergekit
- merge
- GGUF
license: apache-2.0
---

# Excalibur-7b GGUF

<img src="https://i.imgur.com/viIO4WT.png" width="550"/>

<i>Image generated with Envoid's [Model9](https://huggingface.co/Envoid/model9) SDXL model</i>

FP16 weights can be found [here](https://huggingface.co/InferenceIllusionist/Excalibur-7b).
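
To try one of the quants in this repo locally, a minimal llama-cpp-python sketch follows; the quant filename below is a placeholder for whichever GGUF file you download.

```python
# Minimal sketch: running a GGUF quant with llama-cpp-python.
# "Excalibur-7b-Q4_K_M.gguf" is a placeholder filename, not a guaranteed quant name.
from llama_cpp import Llama

llm = Llama(model_path="Excalibur-7b-Q4_K_M.gguf", n_ctx=4096)
out = llm(
    "Q: What does the MMLU benchmark measure? A:",
    max_tokens=128,
    stop=["Q:"],  # stop before the model invents a follow-up question
)
print(out["choices"][0]["text"])
```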

[Magic-Dolphin-7b](https://huggingface.co/InferenceIllusionist/Magic-Dolphin-7b) was an unexpected surprise, and I was profoundly satisfied with it as a first attempt. For this follow-up I wanted to target the MMLU benchmark specifically. The challenge this time was placing more weight on Merlinite-7b, an unknown quantity that hasn't been in the spotlight despite its novel LAB tuning method.

<b>Excalibur-7b</b> builds on past success and is the culmination of several learnings:
* Measuring KL-divergences for new quantization types brought a deeper understanding of benchmarking and assessing model performance (see the sketch after this list)
* Using MMLU as a baseline significantly sped up the testing process, narrowing over 10 candidate linear merges down to 1: merliniteX-blockB1
* Reaching the limitations of linear merging necessitated a pivot to reviewing the viability of the SLERP, DARE-TIES, and Passthrough methods
* A competing pool of candidate merges was then tested across these merge algorithms. Once more the list was narrowed from 10 candidates to 1: merliniteX-blockF2
* merliniteX-blockF2 (a SLERP of Magic-Dolphin-7b and jaskier-7b-dpo in unorthodox proportions) was originally planned for release earlier this week
* Instead, -blockB1 and -blockF2 were merged and the results placed head to head in a final round of tests. Ultimately a more conventional execution of SLERP showed the best results for the final step.
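
To make the first point concrete, here is a minimal sketch of the kind of KL-divergence check described above, written against the transformers API rather than the actual test harness used; the candidate path is a hypothetical placeholder. A low divergence means the candidate's next-token distribution closely tracks the reference.

```python
# Hedged sketch: KL divergence between a reference model's and a candidate's
# next-token distributions. Both paths/IDs are illustrative placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

reference_id = "InferenceIllusionist/Excalibur-7b"  # FP16 reference
candidate_id = "path/to/candidate-merge"            # hypothetical candidate

tokenizer = AutoTokenizer.from_pretrained(reference_id)
reference = AutoModelForCausalLM.from_pretrained(reference_id, torch_dtype=torch.float16)
candidate = AutoModelForCausalLM.from_pretrained(candidate_id, torch_dtype=torch.float16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    ref_logits = reference(**inputs).logits[0, -1]   # last-position logits
    cand_logits = candidate(**inputs).logits[0, -1]

# KL(reference || candidate) over the vocabulary, computed in float32 for stability.
kl = F.kl_div(
    F.log_softmax(cand_logits.float(), dim=-1),
    F.log_softmax(ref_logits.float(), dim=-1),
    log_target=True,
    reduction="sum",
)
print(f"next-token KL divergence: {kl.item():.6f}")
```

A real evaluation would average this over many positions of a held-out corpus rather than a single prompt.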

# Sample Question

<img src="https://i.imgur.com/fdFYIhv.jpeg" width="550"/>

# Bonus Question - Vision Capabilities

<b>Requires additional [mistral-7b-mmproj-v1.5-Q4_1.gguf](https://huggingface.co/koboldcpp/mmproj/tree/main) file for vision functionality</b>

<img src="https://i.imgur.com/4wbUrjf.jpeg" width="550"/>
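
One way to pair the quant with the mmproj file outside of a GUI frontend is llama-cpp-python's LLaVA-style chat handler, sketched below; the GGUF filename and image URL are placeholders, and this is only one possible setup, not the method used for the screenshot above.

```python
# Hedged sketch: vision inference by pairing a GGUF quant with the mmproj file.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

chat_handler = Llava15ChatHandler(clip_model_path="mistral-7b-mmproj-v1.5-Q4_1.gguf")
llm = Llama(
    model_path="Excalibur-7b-Q4_K_M.gguf",  # placeholder quant filename
    chat_handler=chat_handler,
    n_ctx=2048,  # leave room for image embedding tokens
)
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
                {"type": "text", "text": "What is shown in this image?"},
            ],
        }
    ]
)
print(response["choices"][0]["message"]["content"])
```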

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the SLERP merge method.
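
For intuition: SLERP interpolates along the great-circle arc between two weight tensors rather than the straight line a linear merge takes, which preserves the geometry of the weights better at intermediate `t` values. A minimal PyTorch sketch of the formula follows; mergekit's actual implementation differs in details such as edge-case handling and the per-filter `t` schedules visible in the config below.

```python
# Illustrative sketch of spherical linear interpolation (SLERP) between
# two weight tensors. Not mergekit's implementation.
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Interpolate between tensors a and b along the great circle joining them."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_norm = a_flat / (a_flat.norm() + eps)
    b_norm = b_flat / (b_flat.norm() + eps)
    # Angle between the two tensors, clamped for numerical safety.
    omega = torch.arccos(torch.clamp(a_norm @ b_norm, -1.0, 1.0))
    if omega.abs() < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        return ((1 - t) * a_flat + t * b_flat).reshape(a.shape).to(a.dtype)
    sin_omega = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) / sin_omega) * a_flat \
        + (torch.sin(t * omega) / sin_omega) * b_flat
    return out.reshape(a.shape).to(a.dtype)

# t=0 returns a, t=1 returns b, t=0.5 sits midway along the arc.
merged = slerp(0.5, torch.randn(4, 4), torch.randn(4, 4))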

### Models Merged

The following models were included in the merge:
* models/merliniteX-blockB1
* models/merliniteX-blockF2

### Configuration

The following YAML configuration was used to produce this model:

```yaml
slices:
  - sources:
      - model: models/merliniteX-blockF2
        layer_range: [0, 32]
      - model: models/merliniteX-blockB1
        layer_range: [0, 32]
merge_method: slerp
base_model: models/merliniteX-blockF2
parameters:
  t:
    - filter: self_attn
      value: [1, 0.7, 0.3, 0.5, 0]
    - filter: mlp
      value: [0, 0.3, 0.7, 0.5, 1]
    - value: 0.5 # fallback for rest of tensors
dtype: float16
```
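
A config like this is consumed by mergekit's `mergekit-yaml` entry point (e.g. `mergekit-yaml config.yml ./output-model-directory`). For completeness, here is a minimal sketch of loading the resulting FP16 merge with transformers; the GGUF files in this repo are instead meant for llama.cpp-based runners.

```python
# Minimal sketch: loading the FP16 merge with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "InferenceIllusionist/Excalibur-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",  # requires the accelerate package
)

inputs = tokenizer("What does the MMLU benchmark measure?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```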