Lamarck-14B version 0.3 is strongly based on [arcee-ai/Virtuoso-Small](https://huggingface.co/arcee-ai/Virtuoso-Small), which serves as a diffuse influence on its prose and reasoning. Arcee's pioneering use of distillation and innovative merge techniques creates a diverse knowledge pool for its models.

### Overview:

- Two model_stock merges seed specialized branches for reasoning and prose quality; a sketch of such a seed merge follows this list.
- For refinement on Virtuoso as a base model, DELLA and SLERP merges include the model_stocks while re-emphasizing selected ancestors.
- For integration, a SLERP merge of Virtuoso with the converged branches.
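
The model_stock seed recipes themselves are not reproduced in this card. As a minimal sketch of what one could look like, assuming mergekit's `model_stock` method and an illustrative ancestor list (the actual `lamarck-14b-reason-model_stock` composition is not shown here):

```yaml
# Hypothetical sketch only: the real lamarck-14b-reason-model_stock recipe is
# not published in this README, and this ancestor list is illustrative.
name: lamarck-14b-reason-model_stock
merge_method: model_stock
base_model: arcee-ai/Virtuoso-Small
models:                                   # candidates averaged toward the base
  - model: CultriX/Qwen2.5-14B-Wernicke
  - model: CultriX/SeQwence-14B-EvolMerge
dtype: bfloat16
```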

- **[CultriX/Qwen2.5-14B-Wernicke](http://huggingface.co/CultriX/Qwen2.5-14B-Wernicke)** - A top performer on ARC and GPQA, Wernicke is re-emphasized in small but highly-ranked portions of the model.

![graph.png](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.3-experimental/resolve/main/graph.png)

### Merge Strategy:
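
The full recipe is a chain of five mergekit configurations separated by `---`: two DELLA merges build the reasoning and prose branches, a third DELLA merge re-balances those branches against Virtuoso, a layer-wise SLERP blends that convergence back into Virtuoso, and a final TIES pass rebases the result onto Qwen/Qwen2.5-14B with the Qwen2.5-14B-Instruct tokenizer.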

```yaml
name: lamarck-14b-reason-della # This contributes the knowledge and reasoning pool, later to be merged
merge_method: della            # with the dominant instruction-following model
base_model: arcee-ai/Virtuoso-Small
tokenizer_source: arcee-ai/Virtuoso-Small
parameters:
  int8_mask: false
  normalize: true
  rescale: false
  density: 0.30
  weight: 0.50
  epsilon: 0.08
  lambda: 1.00
models:
  - model: CultriX/SeQwence-14B-EvolMerge
    parameters:
      density: 0.70
      weight: 0.90
  - model: sometimesanotion/lamarck-14b-reason-model_stock
    parameters:
      density: 0.90
      weight: 0.60
  - model: CultriX/Qwen2.5-14B-Wernicke
    parameters:
      density: 0.20
      weight: 0.30
dtype: bfloat16
out_dtype: bfloat16
---
name: lamarck-14b-prose-della # This contributes the prose, later to be merged
merge_method: della           # with the dominant instruction-following model
base_model: arcee-ai/Virtuoso-Small
tokenizer_source: arcee-ai/Virtuoso-Small
parameters:
  int8_mask: false
  normalize: true
  rescale: false
  density: 0.30
  weight: 0.50
  epsilon: 0.08
  lambda: 0.95
models:
  - model: sthenno-com/miscii-14b-1028
    parameters:
      density: 0.40
      weight: 0.90
  - model: sometimesanotion/lamarck-14b-prose-model_stock
    parameters:
      density: 0.60
      weight: 0.70
  - model: underwoods/medius-erebus-magnum-14b
dtype: bfloat16
out_dtype: bfloat16
---
name: lamarck-14b-converge-della # This is the strongest control point to quickly
merge_method: della              # re-balance reasoning vs. prose
base_model: arcee-ai/Virtuoso-Small
tokenizer_source: arcee-ai/Virtuoso-Small
parameters:
  int8_mask: false
  normalize: true
  rescale: false
  density: 0.30
  weight: 0.50
  epsilon: 0.08
  lambda: 1.00
models:
  - model: sometimesanotion/lamarck-14b-reason-della
    parameters:
      density: 0.80
      weight: 1.00
  - model: arcee-ai/Virtuoso-Small
    parameters:
      density: 0.40
      weight: 0.50
  - model: sometimesanotion/lamarck-14b-prose-della
    parameters:
      density: 0.10
      weight: 0.40
dtype: bfloat16
out_dtype: bfloat16
---
name: lamarck-14b-converge          # Virtuoso has good capabilities all-around; it is 100% of the first
merge_method: slerp                 # two layers, and blends into the reasoning+prose convergence
base_model: arcee-ai/Virtuoso-Small # for some interesting boosts
tokenizer_source: base
parameters:
  t: [ 0.00, 0.60, 0.80, 0.80, 0.80, 0.70, 0.40 ]
slices:
  - sources:
      - layer_range: [ 0, 2 ]
        model: arcee-ai/Virtuoso-Small
      - layer_range: [ 0, 2 ]
        model: merges/lamarck-14b-converge-della
    t: [ 0.00, 0.00 ]
  - sources:
      - layer_range: [ 2, 8 ]
        model: arcee-ai/Virtuoso-Small
      - layer_range: [ 2, 8 ]
        model: merges/lamarck-14b-converge-della
    t: [ 0.00, 0.60 ]
  - sources:
      - layer_range: [ 8, 16 ]
        model: arcee-ai/Virtuoso-Small
      - layer_range: [ 8, 16 ]
        model: merges/lamarck-14b-converge-della
    t: [ 0.60, 0.70 ]
  - sources:
      - layer_range: [ 16, 24 ]
        model: arcee-ai/Virtuoso-Small
      - layer_range: [ 16, 24 ]
        model: merges/lamarck-14b-converge-della
    t: [ 0.70, 0.70 ]
  - sources:
      - layer_range: [ 24, 32 ]
        model: arcee-ai/Virtuoso-Small
      - layer_range: [ 24, 32 ]
        model: merges/lamarck-14b-converge-della
    t: [ 0.70, 0.70 ]
  - sources:
      - layer_range: [ 32, 40 ]
        model: arcee-ai/Virtuoso-Small
      - layer_range: [ 32, 40 ]
        model: merges/lamarck-14b-converge-della
    t: [ 0.70, 0.60 ]
  - sources:
      - layer_range: [ 40, 48 ]
        model: arcee-ai/Virtuoso-Small
      - layer_range: [ 40, 48 ]
        model: merges/lamarck-14b-converge-della
    t: [ 0.60, 0.40 ]
dtype: bfloat16
out_dtype: bfloat16
---
name: lamarck-14b-finalize # With density and weight at 1.00, this pass keeps the converged
merge_method: ties         # weights while rebasing onto Qwen2.5-14B and its Instruct tokenizer
base_model: Qwen/Qwen2.5-14B
tokenizer_source: Qwen/Qwen2.5-14B-Instruct
parameters:
  int8_mask: false
  normalize: true
  rescale: false
  density: 1.00
  weight: 1.00
models:
  - model: merges/lamarck-14b-converge
dtype: bfloat16
out_dtype: bfloat16
```
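
Note that later stages reference earlier outputs by local path (`merges/...`) or repo name (`sometimesanotion/...`), so intermediate results must be saved where the next stage expects them. With a standard mergekit install, one way to run the chain (an assumption, not a workflow documented here) is to split each `---`-separated document into its own file and execute them in order with `mergekit-yaml <config.yaml> <output-dir>`.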