metadata
license: apache-2.0
base_model:
  - speakleash/Bielik-11B-v2.3-Instruct
pipeline_tag: text-generation
tags:
  - medit-merge
language:
  - pl
  - en

Marsh Harrier

Marsh Harrier (MSH) is a language model developed by MedIT Solutions. It merges the checkpoints of two SpeakLeash models, Bielik 11B v2.3 Instruct and Bielik 11B v2, using our proprietary weight-merging method.

Key Features:

  • Built on a pioneering approach to neural network weight fusion
  • Supports merging models of identical parameter counts while maintaining architecture flexibility
  • Scores higher on average than Bielik v2.3 across the Open PL LLM Leaderboard tasks reported below
  • Optimized for Polish language understanding and generation
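
The merging method itself is proprietary and is not described in this card. As a generic illustration of what checkpoint merging means for two models with identical parameter shapes, the sketch below linearly interpolates matching weights. This is an assumed baseline, not the MedIT Solutions technique; the function name `merge_checkpoints`, the `alpha` parameter, and the toy weights are all hypothetical.

```python
from typing import Dict, List

# A checkpoint is modelled here as parameter name -> flat list of weights.
Checkpoint = Dict[str, List[float]]


def merge_checkpoints(a: Checkpoint, b: Checkpoint, alpha: float = 0.5) -> Checkpoint:
    """Linearly interpolate two checkpoints: alpha * a + (1 - alpha) * b.

    Hypothetical baseline for illustration only; NOT the proprietary method.
    """
    if a.keys() != b.keys():
        raise ValueError("checkpoints must contain the same parameter names")
    merged: Checkpoint = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(a[name], b[name])]
    return merged


# Toy stand-ins for two checkpoints with identical parameter counts.
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = merge_checkpoints(model_a, model_b, alpha=0.5)
print(merged)
```

The check on parameter names and sizes mirrors the requirement above that the merged models have identical parameter counts.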

Performance:

The model improves on Bielik v2.3 on most tasks in the Open PL LLM Leaderboard evaluation framework (0-shot), which is part of the SpeakLeash.org open-science initiative, and regresses slightly on a few others.

Technical Details:

  • Base Models: SpeakLeash Bielik 11B v2.3 Instruct and Bielik 11B v2
  • Architecture: Compatible with the original Bielik architecture
  • Parameter Count: 11 billion parameters
  • Special Feature: Utilizes MedIT Solutions' proprietary checkpoint merging technology

This model represents a step forward in Polish language modeling, demonstrating how merging techniques can enhance model performance without changing the underlying architecture.

Polish LLM Open Leaderboard

Core Leaderboards:

  • MT-Bench-PL: Slight decrease of about 0.3 points (8.27 vs 8.56)
  • Open PL LLM Leaderboard: Improvement of 0.09 points (65.80 vs 65.71)

Sentiment Analysis (PolEmo2):

  • In-domain accuracy: Matches Bielik at 77.70%
  • Out-of-domain accuracy: Improved performance at 79.76% (vs 79.35%)

Text Classification Tasks:

  • 8tags classification: Significant improvement of ~3pp (76.14% vs 73.17%)
  • Belebele benchmark: Matching performance at 88.56%
  • CBD task: Substantial F1 score improvement of ~10pp (23.91% vs 13.73%)

Language Understanding:

  • DYK ("Did you know..."): Improved F1 score (69.77% vs 69.14%)
  • Named Entity Recognition (KLEJ NER): Notable improvement of ~8pp (45.53% vs 37.61%)
  • PolQA reranking: Slight decrease of ~1.2pp (81.99% vs 83.21%)
  • PPC: Improved accuracy (78.00% vs 77.20%)
  • PSC: F1 score decrease of ~3pp (90.46% vs 93.63%)

Overall Performance: Averaged over the ten task scores above, MSH-v1 reaches 71.18% compared with 69.33% for Bielik v2.3, a gain of 1.85pp that comes mainly from the CBD, KLEJ NER, and 8tags tasks. This supports the effectiveness of our checkpoint merging technique across diverse NLP tasks.
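
The averages can be reproduced from the ten task scores listed above. The sketch below assumes an unweighted mean and assumes that the second value in each pair is the Bielik v2.3 score; where a task is reported as matching, the same value is used for both models. The variable names are illustrative only.

```python
from statistics import mean

# (MSH-v1, Bielik v2.3) scores in percent, copied from the task lists above.
# Assumption: "matching" tasks have identical scores for both models.
scores = {
    "PolEmo2 in-domain": (77.70, 77.70),
    "PolEmo2 out-of-domain": (79.76, 79.35),
    "8tags": (76.14, 73.17),
    "Belebele": (88.56, 88.56),
    "CBD": (23.91, 13.73),
    "DYK": (69.77, 69.14),
    "KLEJ NER": (45.53, 37.61),
    "PolQA reranking": (81.99, 83.21),
    "PPC": (78.00, 77.20),
    "PSC": (90.46, 93.63),
}

# Unweighted mean over tasks, rounded to two decimals as in the card.
msh_avg = round(mean(m for m, _ in scores.values()), 2)
bielik_avg = round(mean(b for _, b in scores.values()), 2)

# Per-task difference in percentage points (positive means MSH-v1 is higher).
deltas = {task: round(m - b, 2) for task, (m, b) in scores.items()}

print(f"MSH-v1 average: {msh_avg:.2f}%")
print(f"Bielik v2.3 average: {bielik_avg:.2f}%")
for task, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{task}: {delta:+.2f}pp")
```

Sorting by difference makes it easy to see which tasks account for most of the change in the average.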

All evaluations were conducted using the Open PL LLM Leaderboard framework (0-shot) as part of the SpeakLeash.org open-science initiative.

Kudos to the SpeakLeash project and ACK Cyfronet AGH for their extraordinary work.