SuperNova-Lite-Hermes-3-Llama-3.1-8B_TIES_with_base_Embeddings_Pre-Initialized

This merge is successful. I'm not expanding or editorializing the model card right now because I need sleep, but the resulting model works great!

This experiment revealed two things. One: distilled instruct models seem to work best for TIES merging with the base and other models. This appears to come down to how distilled models are trained compared with non-distilled models: when merged with other models, the distilled models seem to retain more of their attributes (the way they talk, think, reason, etc.). That makes them very appealing for model merges, because you keep more of the model's inherent capabilities and behaviors. Two: I can successfully TIES merge different instruct models with their base pre-initialized with the special-token embeddings (for the prompt/chat template).

The model is coherent and capable. Please download and try it if you're interested. GGUF Custom OQ8_0-F32_EF32 IQuants will be up by the middle of the week, most probably sooner.

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the TIES merge method using /Users/jsarnecki/opt/mergekit/merges/Llama-3.1-8B-InitializedEmbeddings_with_Hermes-3 as a base.
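
For readers unfamiliar with TIES, here is a minimal sketch of the idea on flat lists of floats. It is not mergekit's implementation. The function name `ties_merge` and its arguments are hypothetical, chosen to mirror the `weight`, `density`, and `normalize` settings used in this card, and the sign election by summed weighted deltas is an assumption about the variant in use.

```python
def ties_merge(base, finetuned, weights, density=1.0, normalize=True):
    """Toy TIES merge over flat lists of floats (one list per model)."""
    n = len(base)
    # 1. Task vectors: what each fine-tuned model changed relative to the base.
    deltas = [[ft[i] - base[i] for i in range(n)] for ft in finetuned]

    # 2. Trim: keep only the top `density` fraction of each task vector
    #    by magnitude. With density=1.0 nothing is dropped.
    keep = round(density * n)
    trimmed = []
    for delta in deltas:
        ranked = sorted(range(n), key=lambda i: abs(delta[i]), reverse=True)
        kept = set(ranked[:keep])
        trimmed.append([delta[i] if i in kept else 0.0 for i in range(n)])

    merged = []
    for i in range(n):
        contribs = [w * t[i] for w, t in zip(weights, trimmed)]
        # 3. Elect one sign per parameter from the summed weighted deltas.
        sign = 1.0 if sum(contribs) >= 0 else -1.0
        # 4. Merge only the contributions that agree with the elected sign.
        agreeing = [(c, w) for c, w in zip(contribs, weights) if c * sign > 0]
        mixed = sum(c for c, _ in agreeing)
        if normalize and agreeing:
            # Divide by the total weight of the models that agreed.
            mixed /= sum(w for _, w in agreeing)
        merged.append(base[i] + mixed)
    return merged


base = [0.0, 0.0, 0.0]
model_a = [1.0, -2.0, 0.5]
model_b = [3.0, 1.0, 0.5]
print(ties_merge(base, [model_a, model_b], weights=[1, 1]))
```

In the second parameter the two models disagree on direction, so only the contribution matching the elected sign survives. In this toy version, with `normalize=True` and equal weights, listing the same models more than once does not change the result, because the repeated contributions and their weights cancel in the division.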

Models Merged

The following models were included in the merge:

/Users/jsarnecki/opt/Workspace/arcee-ai/Llama-3.1-SuperNova-Lite
/Users/jsarnecki/opt/Workspace/NousResearch/Hermes-3-Llama-3.1-8B

Configuration

The following YAML configuration was used to produce this model:

models:

  - model: "/Users/jsarnecki/opt/Workspace/arcee-ai/Llama-3.1-SuperNova-Lite"
    parameters:
      weight: 1
      density: 1

  - model: "/Users/jsarnecki/opt/Workspace/NousResearch/Hermes-3-Llama-3.1-8B"
    parameters:
      weight: 1
      density: 1
  
  - model: "/Users/jsarnecki/opt/Workspace/arcee-ai/Llama-3.1-SuperNova-Lite"
    parameters:
      weight: 1
      density: 1

  - model: "/Users/jsarnecki/opt/Workspace/NousResearch/Hermes-3-Llama-3.1-8B"
    parameters:
      weight: 1
      density: 1
  
merge_method: ties
base_model: "/Users/jsarnecki/opt/mergekit/merges/Llama-3.1-8B-InitializedEmbeddings_with_Hermes-3"
parameters:
  density: 1
  normalize: true
  int8_mask: true
tokenizer_source: "/Users/jsarnecki/opt/Workspace/NousResearch/Hermes-3-Llama-3.1-8B"
dtype: float32
out_dtype: bfloat16
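
The last two settings mean, as far as I understand mergekit's options, that the merge arithmetic runs in float32 and the result is stored as bfloat16. The sketch below illustrates what that final rounding step does to a single value. The helper `to_bfloat16` is hypothetical, uses only the standard library, assumes round-to-nearest-even, and ignores NaN and infinity.

```python
import struct


def to_bfloat16(x):
    """Round a float to the nearest bfloat16 value, returned as a Python float.

    Assumes round-to-nearest-even; NaN and infinity are not handled.
    """
    # Reinterpret the value as its 32-bit float32 bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of float32. Add a rounding bias
    # (ties go to the even neighbour) before dropping the low 16 bits.
    bias = 0x7FFF + ((bits >> 16) & 1)
    rounded = ((bits + bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", rounded & 0xFFFFFFFF))[0]


print(to_bfloat16(1.0))    # exactly representable
print(to_bfloat16(1.001))  # below bfloat16 resolution near 1.0
print(to_bfloat16(1.01))
```

Near 1.0, bfloat16 values are spaced 1/128 apart, so small differences between merged weights can disappear if the arithmetic itself is done at that precision. Doing the arithmetic in float32 and rounding once at the end avoids compounding that loss.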