Marsh Harrier
The Marsh Harrier (MSH) is a language model developed by MedIT Solutions using an advanced checkpoint merging technique. It represents a novel fusion of the Speakleash Bielik 11B v2.3 Instruct and Speakleash Bielik 11B v2 models, employing our proprietary weight-merging methodology.
Key Features:
- Built on a pioneering approach to neural network weight fusion
- Supports merging models of identical parameter counts while maintaining architecture flexibility
- Achieves a higher average score than Bielik 11B v2.3 Instruct in the evaluation reported below (71.18% vs 69.33%)
- Optimized for Polish language understanding and generation
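For readers unfamiliar with checkpoint merging, the general idea can be illustrated with plain linear interpolation of two checkpoints that share the same parameter names and shapes. This is only a minimal sketch of a generic, well-known baseline. It is not MedIT Solutions' proprietary method, which is not described in this card. The function name `interpolate_checkpoints`, the `alpha` parameter, and the toy checkpoints (flat lists of floats instead of real tensors) are all assumptions made for illustration.

```python
# Generic linear interpolation of two checkpoints (illustrative only).
# NOT the proprietary MedIT Solutions technique; names and data are hypothetical.

def interpolate_checkpoints(state_a, state_b, alpha=0.5):
    """Return alpha * state_a + (1 - alpha) * state_b, parameter by parameter.

    Each checkpoint is modelled here as a dict mapping a parameter name
    to a flat list of floats (a stand-in for a real weight tensor).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints must share the same parameter names")

    merged = {}
    for name, weights_a in state_a.items():
        weights_b = state_b[name]
        if len(weights_a) != len(weights_b):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [
            alpha * a + (1.0 - alpha) * b
            for a, b in zip(weights_a, weights_b)
        ]
    return merged


# Toy example with two tiny "checkpoints".
toy_a = {"layer.weight": [1.0, 3.0], "layer.bias": [0.0]}
toy_b = {"layer.weight": [3.0, 5.0], "layer.bias": [2.0]}
toy_merged = interpolate_checkpoints(toy_a, toy_b, alpha=0.5)
print(toy_merged)
```

The checks at the top of the function reflect the constraint mentioned above: both checkpoints need matching parameter names and sizes before their weights can be combined element-wise.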
Performance:
The model shows significant improvements over its predecessors across multiple metrics in the Open PL LLM Leaderboard evaluation framework (0-shot), which is part of the SpeakLeash.org open-science initiative.
Technical Details:
- Base Models: Speakleash Bielik 11B v2.3 Instruct and Bielik 11B v2
- Architecture: Compatible with the original Bielik architecture
- Parameter Count: 11 billion parameters
- Special Feature: Utilizes MedIT Solutions' proprietary checkpoint merging technology
This model represents a step forward in developing Polish language models, demonstrating how merging techniques can enhance model performance while maintaining architectural efficiency.
Polish LLM Open Leaderboard
Core Leaderboards:
- MT-Bench-PL: slight decrease of 0.3 points (8.27 vs 8.56)
- Open PL LLM Leaderboard: improved performance by 0.09 points (65.80 vs 65.71)
Sentiment Analysis (PolEmo2):
- In-domain accuracy: Matches Bielik at 77.70%
- Out-of-domain accuracy: Improved performance at 79.76% (vs 79.35%)
Text Classification Tasks:
- 8tags classification: Significant improvement of ~3pp (76.14% vs 73.17%)
- Belebele benchmark: Matching performance at 88.56%
- CBD task: Substantial F1 score improvement of ~10pp (23.91% vs 13.73%)
Language Understanding:
- DYK ("Did you know..."): Improved F1 score (69.77% vs 69.14%)
- Named Entity Recognition (KLEJ NER): Notable improvement of ~8pp (45.53% vs 37.61%)
- PolQA reranking: Slight decrease (81.99% vs 83.21%)
- PPC: Enhanced accuracy (78.00% vs 77.20%)
- PSC: F1 score decrease of ~3pp (90.46% vs 93.63%)
Overall Performance: MSH-v1 achieves a higher average score of 71.18% compared to Bielik v2.3's 69.33%, demonstrating the effectiveness of our checkpoint merging technique in improving model performance across diverse NLP tasks.
All evaluations were conducted using the Open PL LLM Leaderboard framework (0-shot) as part of the SpeakLeash.org open-science initiative.
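The per-task differences quoted above can be recomputed directly from the reported scores. The short script below is a sketch for checking that arithmetic. The score pairs are copied from this card (MSH-v1 first, Bielik v2.3 second), while the names `REPORTED`, `delta_pp`, and `summarize` are hypothetical helpers introduced only for this example.

```python
# Recompute per-task differences from the scores reported in this card.
# Each entry is (MSH-v1 score, Bielik v2.3 score), both in percent.
REPORTED = {
    "PolEmo2 in-domain (accuracy)": (77.70, 77.70),
    "PolEmo2 out-of-domain (accuracy)": (79.76, 79.35),
    "8tags (accuracy)": (76.14, 73.17),
    "Belebele": (88.56, 88.56),
    "CBD (F1)": (23.91, 13.73),
    "DYK (F1)": (69.77, 69.14),
    "KLEJ NER": (45.53, 37.61),
    "PolQA reranking": (81.99, 83.21),
    "PPC (accuracy)": (78.00, 77.20),
    "PSC (F1)": (90.46, 93.63),
}


def delta_pp(msh, bielik):
    """Difference in percentage points, rounded to two decimals."""
    return round(msh - bielik, 2)


def summarize(scores):
    """Return {task: delta} plus counts of improved/unchanged/decreased tasks."""
    deltas = {task: delta_pp(msh, bielik) for task, (msh, bielik) in scores.items()}
    counts = {
        "improved": sum(1 for d in deltas.values() if d > 0),
        "unchanged": sum(1 for d in deltas.values() if d == 0),
        "decreased": sum(1 for d in deltas.values() if d < 0),
    }
    return deltas, counts


deltas, counts = summarize(REPORTED)
for task, d in deltas.items():
    print(f"{task:35s} {d:+.2f} pp")
print(counts)
```

The script only restates the task-level numbers listed in this card. It does not reproduce the overall averages, which depend on the leaderboard's own aggregation.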
Kudos to the SpeakLeash project and ACK Cyfronet AGH for their extraordinary work.