Usage

Metharme format (Mistral works too but untested)

Upscaled Tuning Experiment Write Up Thingy

What is the 39B Upscale?

https://huggingface.co/TheSkullery/BA-Zephyria-39b

merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 41]
    model: unsloth/Mistral-Small-Instruct-2409
- sources:
  - layer_range: [19, 41]
    model: unsloth/Mistral-Small-Instruct-2409
    parameters:
      scale:
      - filter: o_proj
        value: 0.0
      - filter: down_proj
        value: 0.0
      - value: 1.0
- sources:
  - layer_range: [19, 41]
    model: unsloth/Mistral-Small-Instruct-2409
    parameters:
      scale:
      - filter: o_proj
        value: 0.0
      - filter: down_proj
        value: 0.0
      - value: 1.0
- sources:
  - layer_range: [41, 55]
    model: unsloth/Mistral-Small-Instruct-2409

Layers 0 to 18 are original
Layers 19 to 41 are duplicated, zero'd out, and put in the middle twice
Layers 42 to 54 are original
down_proj and o_proj layers for the duplicated part have been nulled and will require healing to 'unignore' the added layers

[    Unique    ][    Duplicated    ][    Unique    ]
0 ----------- 18 19 ------------ 41 42 ---------- 54
     34.5%           41.8%            23.7%

Weight Difference Visualization

Nemo x Rocinante
Small x Cydonia
39B Upscale x Tunguska 1 Epoch
39B Upscale x Tunguska 2 Epochs
Tunguska 1 Epoch x Tunguska 2 Epochs

Control Sample A (Nemo & Rocinante, similar training)

Also note the layer sequence and other labels since it will be unreadable for the 39B

TheDrummer
/

Tunguska-39B-v1-GGUF

Usage

Upscaled Tuning Experiment Write Up Thingy

What is the 39B Upscale?

Weight Difference Visualization

Control Sample A (Nemo & Rocinante, similar training)

Control Sample B (Small & Cydonia, similar training)

Tunguska 39B 1 Epoch vs. its base

Tunguska 39B 2 Epochs vs. its base

Tunguska 39B 1 Epoch vs 2 Epochs