---
license: apache-2.0
base_model:
- speakleash/Bielik-11B-v2.3-Instruct
pipeline_tag: text-generation
tags:
- medit-merge
language:
- pl
- en
---
# Marsh Harrier

The Marsh Harrier (MSH) is a language model developed by MedIT Solutions using an advanced checkpoint merging technique. It is a fusion of the Speakleash Bielik 11B v2.3 Instruct and Speakleash Bielik 11B v2 models, produced with our proprietary weight-merging methodology.

## Key Features:

- Built on a pioneering approach to neural network weight fusion
- Supports merging models of identical parameter counts while maintaining architecture flexibility
- Achieves a higher average score than Bielik v2.3 in the Open PL LLM Leaderboard evaluation
- Optimized for Polish language understanding and generation

## Performance:

The model improves on its predecessors across multiple metrics in the Open PL LLM Leaderboard evaluation framework (0-shot), which is part of the SpeakLeash.org open-science initiative.

## Technical Details:

- Base Models: [Speakleash Bielik 11B v2.3 Instruct](https://huggingface.co/speakleash/Bielik-11B-v2.3-Instruct) and [Bielik 11B v2](https://huggingface.co/speakleash/Bielik-11B-v2)
- Architecture: Compatible with original Bielik architecture
- Parameter Count: 11 billion parameters
- Special Feature: Utilizes MedIT Solutions' proprietary checkpoint merging technology

This model represents a step forward in developing Polish language models, demonstrating how merging techniques can enhance model performance while maintaining architectural efficiency.
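The merging technique itself is proprietary and not described here. To illustrate only the general idea of fusing two checkpoints that share parameter names and shapes, the sketch below uses plain linear interpolation, a generic and widely known baseline that is not MedIT Solutions' method. The `linear_merge` function, the `alpha` parameter, and the toy checkpoints are all hypothetical; plain Python lists stand in for real weight tensors.

```python
from typing import Dict, List

# Hypothetical stand-in for a real state dict: parameter name -> flat weights.
StateDict = Dict[str, List[float]]


def linear_merge(a: StateDict, b: StateDict, alpha: float = 0.5) -> StateDict:
    """Interpolate two checkpoints key by key: alpha * a + (1 - alpha) * b.

    Generic baseline for illustration only, NOT the proprietary MedIT method.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if a.keys() != b.keys():
        raise ValueError("checkpoints must have identical parameter names")

    merged: StateDict = {}
    for name, weights_a in a.items():
        weights_b = b[name]
        if len(weights_a) != len(weights_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            alpha * x + (1.0 - alpha) * y for x, y in zip(weights_a, weights_b)
        ]
    return merged


# Toy checkpoints (made-up values, not real Bielik weights).
instruct = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
base = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

merged = linear_merge(instruct, base, alpha=0.5)
print(merged)  # {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
```

The key and shape checks reflect the requirement above that merged models have identical parameter counts; with real checkpoints the same loop would run over framework tensors rather than lists.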
# Polish LLM Open Leaderboard

Scores are given as MSH-v1 vs Bielik v2.3.

## Core Leaderboards:

- MT-Bench-PL: slight decrease of ~0.3 points (8.27 vs 8.56)
- Open PL LLM Leaderboard: improved performance by 0.09 points (65.80 vs 65.71)

## Sentiment Analysis (PolEmo2):

- In-domain accuracy: Matches Bielik at 77.70%
- Out-of-domain accuracy: Improved performance at 79.76% (vs 79.35%)

## Text Classification Tasks:

- 8tags classification: Significant improvement of ~3pp (76.14% vs 73.17%)
- Belebele benchmark: Matching performance at 88.56%
- CBD task: Substantial F1 score improvement of ~10pp (23.91% vs 13.73%)

## Language Understanding:

- DYK ("Did you know..."): Improved F1 score (69.77% vs 69.14%)
- Named Entity Recognition (KLEJ NER): Notable improvement of ~8pp (45.53% vs 37.61%)
- PolQA reranking: Decrease of ~1pp (81.99% vs 83.21%)
- PPC: Enhanced accuracy (78.00% vs 77.20%)
- PSC: F1 score decrease of ~3pp (90.46% vs 93.63%)

## Overall Performance:

MSH-v1 achieves a higher average score of 71.18% compared to Bielik v2.3's 69.33%, demonstrating the effectiveness of our checkpoint merging technique in improving model performance across diverse NLP tasks.

All evaluations were conducted using the Open PL LLM Leaderboard framework (0-shot) as part of the SpeakLeash.org open-science initiative.

Kudos to the **[SpeakLeash](https://speakleash.org)** project and **[ACK Cyfronet AGH](https://www.cyfronet.pl/)** for their extraordinary work.
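The per-task differences quoted above can be recomputed directly from the reported scores. The short sketch below does that arithmetic; the `scores` dictionary and the `delta` helper are illustrative names introduced here, and the numbers are copied from the results in this card.

```python
# Reported scores as (MSH-v1, Bielik v2.3), copied from the results above.
# MT-Bench-PL and the leaderboard average are in points; the rest are percentages.
scores = {
    "MT-Bench-PL": (8.27, 8.56),
    "Open PL LLM Leaderboard": (65.80, 65.71),
    "PolEmo2 in-domain": (77.70, 77.70),
    "PolEmo2 out-of-domain": (79.76, 79.35),
    "8tags": (76.14, 73.17),
    "Belebele": (88.56, 88.56),
    "CBD (F1)": (23.91, 13.73),
    "DYK (F1)": (69.77, 69.14),
    "KLEJ NER": (45.53, 37.61),
    "PolQA reranking": (81.99, 83.21),
    "PPC": (78.00, 77.20),
    "PSC (F1)": (90.46, 93.63),
}


def delta(msh: float, bielik: float) -> float:
    """Difference MSH-v1 minus Bielik v2.3, rounded to two decimals."""
    return round(msh - bielik, 2)


deltas = {task: delta(msh, bielik) for task, (msh, bielik) in scores.items()}

for task, (msh, bielik) in scores.items():
    print(f"{task:<26} {msh:>6.2f} {bielik:>6.2f} {deltas[task]:>+7.2f}")
```

A positive value in the last column means MSH-v1 scored higher than Bielik v2.3 on that task.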