# Qwen15-DeepSeek-Coder-Merge
This is a merge of pre-trained language models created using MergeKit, combining the foundational capabilities of Qwen 1.5 with DeepSeek Coder's programming expertise through an efficient SLERP fusion.
## About Me

I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in Generative AI. Passionate about artificial intelligence and language model optimization, I focus on creating efficient model merges that balance performance and capabilities.

Connect with me on LinkedIn
## Merge Details

### Merge Method
This model uses SLERP (Spherical Linear Interpolation) with carefully tuned parameters to achieve optimal performance balance:
- Weighted Blend: t=0.6 provides a slightly stronger influence from the DeepSeek Coder model
- Complete Layer Merging: Full layer-range coverage ensures comprehensive knowledge transfer
- Format: bfloat16 precision for efficient memory usage
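To make the interpolation concrete, here is a minimal sketch of SLERP between two flattened weight vectors. This is an illustration of the general technique, not mergekit's internal implementation (which applies it per-tensor with additional handling):

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between weight vectors a and b.

    t=0 returns a, t=1 returns b; t=0.6 leans toward b, as in this merge.
    """
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Vectors are nearly parallel: fall back to linear interpolation
        return (1 - t) * a + t * b
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * a + (np.sin(t * theta) / s) * b

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
print(slerp(0.6, a, b))
```

Unlike a plain weighted average, SLERP follows the arc between the two parameter directions, which tends to preserve the magnitude of the interpolated weights.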
### Models Merged
- Qwen/Qwen1.5-7B-Chat - Alibaba's Qwen 1.5 chat model known for its strong conversational capabilities and instruction following
- deepseek-ai/deepseek-coder-6.7b-instruct - DeepSeek's specialized coding model with excellent programming language understanding and code generation abilities
### Configuration

```yaml
slices:
  - sources:
      - model: Qwen/Qwen1.5-7B-Chat
        layer_range: [0, 32]
      - model: deepseek-ai/deepseek-coder-6.7b-instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: Qwen/Qwen1.5-7B-Chat
parameters:
  t: 0.6
dtype: bfloat16
```
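Before running a merge, it can be useful to sanity-check the configuration programmatically. A minimal sketch using PyYAML (the embedded config mirrors the one above; the checks are illustrative, not mergekit's own validation):

```python
import yaml  # PyYAML

config_text = """
slices:
  - sources:
      - model: Qwen/Qwen1.5-7B-Chat
        layer_range: [0, 32]
      - model: deepseek-ai/deepseek-coder-6.7b-instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: Qwen/Qwen1.5-7B-Chat
parameters:
  t: 0.6
dtype: bfloat16
"""

config = yaml.safe_load(config_text)

# Basic structural checks: method, base model, and matching layer ranges
assert config["merge_method"] == "slerp"
assert config["base_model"] == "Qwen/Qwen1.5-7B-Chat"
ranges = [src["layer_range"] for s in config["slices"] for src in s["sources"]]
assert all(r == [0, 32] for r in ranges)
print("config OK: t =", config["parameters"]["t"])
```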
## Model Capabilities
This merge combines:
- Qwen 1.5's strong instruction following and general knowledge capabilities
- DeepSeek Coder's specialized programming expertise and code generation abilities
- Enhanced technical understanding and explanation capabilities
- Fully open architecture with no usage restrictions
The resulting model provides enhanced performance on tasks requiring both conversational fluency and programming expertise, such as:
- Code generation across multiple programming languages
- Technical documentation and explanations
- Algorithm implementation and problem-solving
- Software development assistance with natural language understanding
- Debugging and code optimization suggestions
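A minimal inference sketch with the Hugging Face `transformers` library, assuming the merged weights are published under a repo id like `your-username/Qwen15-DeepSeek-Coder-Merge` (a placeholder) and that the merge keeps Qwen 1.5's chat template:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: replace with the actual location of the merged weights
model_id = "your-username/Qwen15-DeepSeek-Coder-Merge"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Write a Python function that reverses a linked list."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```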
## Limitations
- Inherits limitations from both base models
- May exhibit inconsistent behavior for certain advanced programming tasks
- No additional alignment or fine-tuning beyond the base models' training
- Model was created through parameter merging without additional training data
- Architectural differences between the base models (7B vs 6.7B parameters, distinct tokenizers and vocabularies) may introduce parameter interpolation artifacts
## License
This model is released under the Apache 2.0 license, consistent with the underlying models' licenses.