SuperNova-Lite-Hermes-3-Llama-3.1-8B_TIES_with_base_Embeddings_Pre-Initialized
This merge is successful. I'm not adding to or editorializing the model card right now; I need sleep. But the resultant model works great! This experiment revealed two things.

One: distilled instruct models work best for TIES merging with the base and other models. The experiment suggests this is due to the way distilled models are trained compared with non-distilled models: when merged with other models, the distilled models seem to retain more of their attributes (the way they talk, think, reason, etc.). This makes them very appealing for model merges because you keep more of the model's inherent capabilities and behaviors.

Two: I can successfully TIES merge different instruct models with a base whose special-token embeddings (for the prompt/chat template) have been pre-initialized.

The model is coherent and capable. Please download and try it if you're interested. GGUF Custom OQ8_0-F32_EF32 IQuants will be up by the middle of the week, most probably sooner.
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged using the TIES merge method, with /Users/jsarnecki/opt/mergekit/merges/Llama-3.1-8B-InitializedEmbeddings_with_Hermes-3 as the base.
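
For readers unfamiliar with TIES, here is a minimal toy sketch of the idea on flat lists of floats. It is an illustration only, not mergekit's implementation: the function name `ties_merge` and the example values are hypothetical, and it assumes positive weights. The steps are: compute each model's delta from the base, trim small deltas according to `density`, elect a sign per parameter, then merge only the deltas that agree with that sign.

```python
def ties_merge(base, models, weights, density=1.0, normalize=True):
    """Toy TIES merge over flat lists of floats (assumes positive weights)."""
    n = len(base)
    # 1. Task vectors: what each fine-tune changed relative to the base.
    deltas = [[m[i] - base[i] for i in range(n)] for m in models]

    # 2. Trim: keep only the top `density` fraction of each delta by magnitude.
    #    With density=1 (as in this card's config) nothing is dropped.
    k = int(round(density * n))
    trimmed = []
    for d in deltas:
        order = sorted(range(n), key=lambda i: abs(d[i]), reverse=True)
        keep = set(order[:k])
        trimmed.append([d[i] if i in keep else 0.0 for i in range(n)])

    merged = []
    for i in range(n):
        weighted = [w * t[i] for w, t in zip(weights, trimmed)]
        # 3. Elect a sign per parameter from the summed weighted deltas.
        sign = 1.0 if sum(weighted) >= 0 else -1.0
        # 4. Merge only the deltas that agree with the elected sign.
        agree = [(w, v) for w, v in zip(weights, weighted) if v * sign > 0]
        delta = sum(v for _, v in agree)
        if normalize and agree:
            delta /= sum(w for w, _ in agree)
        merged.append(base[i] + delta)
    return merged


# Hypothetical toy values, chosen to be exactly representable as floats.
base = [0.0, 1.0, -1.0]
model_a = [0.5, 2.0, -1.0]
model_b = [0.25, 0.5, -0.5]
print(ties_merge(base, [model_a, model_b], weights=[1.0, 1.0]))
# prints [0.375, 2.0, -0.5]
```

In the second position the two models disagree on direction, so only the delta matching the elected sign survives; that is the interference-resolution step that distinguishes TIES from plain averaging.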
Models Merged
The following models were included in the merge:
- /Users/jsarnecki/opt/Workspace/arcee-ai/Llama-3.1-SuperNova-Lite
- /Users/jsarnecki/opt/Workspace/NousResearch/Hermes-3-Llama-3.1-8B
Configuration
The following YAML configuration was used to produce this model:
models:
  - model: "/Users/jsarnecki/opt/Workspace/arcee-ai/Llama-3.1-SuperNova-Lite"
    parameters:
      weight: 1
      density: 1
  - model: "/Users/jsarnecki/opt/Workspace/NousResearch/Hermes-3-Llama-3.1-8B"
    parameters:
      weight: 1
      density: 1
  - model: "/Users/jsarnecki/opt/Workspace/arcee-ai/Llama-3.1-SuperNova-Lite"
    parameters:
      weight: 1
      density: 1
  - model: "/Users/jsarnecki/opt/Workspace/NousResearch/Hermes-3-Llama-3.1-8B"
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: "/Users/jsarnecki/opt/mergekit/merges/Llama-3.1-8B-InitializedEmbeddings_with_Hermes-3"
parameters:
  density: 1
  normalize: true
  int8_mask: true
tokenizer_source: "/Users/jsarnecki/opt/Workspace/NousResearch/Hermes-3-Llama-3.1-8B"
dtype: float32
out_dtype: bfloat16
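
As I understand the config, the last two settings mean the merge arithmetic runs in float32 and the result is cast to bfloat16 only when it is written out. The sketch below illustrates what that final cast does to a single value, using only the standard library. The helper `to_bfloat16` is hypothetical (it is not part of mergekit) and handles ordinary finite values only.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round a finite value to the nearest bfloat16 (round-to-nearest-even).

    bfloat16 keeps the top 16 bits of the float32 bit pattern, so it has the
    same exponent range as float32 but only 7 explicit mantissa bits.
    NaN, infinity, and overflow are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bias = 0x7FFF + ((bits >> 16) & 1)                   # ties go to even
    bits = (bits + bias) & 0xFFFF0000                    # drop the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(to_bfloat16(1.0))  # prints 1.0
print(to_bfloat16(0.1))  # prints 0.10009765625
```

Doing the merge in float32 and rounding once at the end avoids compounding this rounding error across the intermediate steps.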