# merge
This is a merge of pre-trained language models created using mergekit.
## Merge Details

### Merge Method
This model was merged using the DARE TIES merge method, with Qwen/Qwen2.5-14B as the base model.
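For intuition: DARE operates on each fine-tuned model's delta (its parameters minus the base model's), randomly dropping a fraction `1 - density` of the delta entries and rescaling the survivors by `1 / density`; the TIES step then resolves sign conflicts before the weighted deltas are added back onto the base. The sketch below illustrates only the drop-and-rescale step on toy tensors; the `dare_sparsify` helper is a hypothetical name for illustration, not mergekit's actual internals.

```python
import torch

def dare_sparsify(delta: torch.Tensor, density: float) -> torch.Tensor:
    """Keep each delta entry with probability `density`, drop the rest,
    and rescale survivors by 1/density so the expected update is unchanged."""
    mask = torch.bernoulli(torch.full_like(delta, density))
    return delta * mask / density

# Toy tensors standing in for one parameter of the base and of a fine-tune.
base_weight = torch.randn(4, 4)
finetuned_weight = base_weight + 0.01 * torch.randn(4, 4)

delta = finetuned_weight - base_weight              # task vector
sparse_delta = dare_sparsify(delta, density=0.6)    # density: 0.6
merged_weight = base_weight + 0.25 * sparse_delta   # weight: 0.25
```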
### Models Merged
The following models were included in the merge:
- sometimesanotion/Lamarck-14B-v0.3
- CultriX/Qwen2.5-14B-Wernicke
- CultriX/SeQwence-14B
- allknowingroger/QwenStock3-14B
- VAGOsolutions/SauerkrautLM-v2-14b-DPO
- sometimesanotion/Qwen2.5-14B-Vimarckoso
### Configuration
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: CultriX/Qwen2.5-14B-Wernicke
    parameters:
      weight: 0.25   # GPQA leader, also strong in MUSR/MMLU-PRO
      density: 0.6   # Retain majority for complex reasoning tasks
  - model: VAGOsolutions/SauerkrautLM-v2-14b-DPO
    parameters:
      weight: 0.25   # Top IFEval and good MATH support
      density: 0.6   # Ensure factual and mathematical integrity
  - model: allknowingroger/QwenStock3-14B
    parameters:
      weight: 0.20   # Highest MMLU-PRO for broad domain strength
      density: 0.5   # Balanced retention for general expertise
  - model: CultriX/SeQwence-14B
    parameters:
      weight: 0.20   # Near-top MATH and well-rounded performance
      density: 0.5   # Efficient parameter usage for stable improvement
  - model: sometimesanotion/Lamarck-14B-v0.3
    parameters:
      weight: 0.05   # Top BBH to ensure benchmark coverage
      density: 0.4   # Light integration focusing on key parameters
  - model: sometimesanotion/Qwen2.5-14B-Vimarckoso
    parameters:
      weight: 0.05   # MUSR leader for nuanced, multi-step reasoning
      density: 0.4   # Targeted retention for domain-specific strengths
base_model: Qwen/Qwen2.5-14B
merge_method: dare_ties
parameters:
  normalize: true    # Ensure parameter scale alignment
  int8_mask: true    # Memory/computation efficiency
dtype: bfloat16
tokenizer_source: Qwen/Qwen2.5-14B-Instruct
```
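Once the merge has been produced (for example with mergekit's `mergekit-yaml` CLI pointed at the configuration above), the resulting checkpoint loads like any other Qwen2.5-style causal LM. The snippet below is a minimal loading sketch: the repo id is a placeholder for wherever the merged weights are published, and the prompt is just a smoke test.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder Hub repo id; replace with the actual path of the merged model.
repo_id = "your-username/qwen2.5-14b-dare-ties-merge"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # matches `dtype: bfloat16` in the config
    device_map="auto",           # requires the `accelerate` package
)

prompt = "Summarize the idea behind model merging in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```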