metadata
language:
  - en
license: apache-2.0
library_name: transformers
tags:
  - moe
  - mergekit
  - MoErges
base_model:
  - mistralai/Mistral-7B-v0.3
pipeline_tag: text-generation
model-index:
  - name: MistralBase-4x7B-MoE-ECE-PRYMMAL-Martial
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 16.97
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Marsouuu/MistralBase-4x7B-MoE-ECE-PRYMMAL-Martial
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 8.87
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Marsouuu/MistralBase-4x7B-MoE-ECE-PRYMMAL-Martial
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 0.3
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Marsouuu/MistralBase-4x7B-MoE-ECE-PRYMMAL-Martial
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 1.23
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Marsouuu/MistralBase-4x7B-MoE-ECE-PRYMMAL-Martial
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 7.85
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Marsouuu/MistralBase-4x7B-MoE-ECE-PRYMMAL-Martial
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 4.21
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Marsouuu/MistralBase-4x7B-MoE-ECE-PRYMMAL-Martial
          name: Open LLM Leaderboard

Model Name: Marsouuu/MistralBase-4x7B-MoE-ECE-PRYMMAL-Martial - Mixture of Experts (MoE)

Description:

This is a Mixture of Experts (MoE) model assembled with mergekit from four 7B experts, with mistralai/Mistral-7B-v0.3 as the base model. It is described as using 24-bit precision and targets four domains: mathematics, coding, storytelling, and general chat. A gating mechanism routes each input to the most relevant expert networks, so the model can adapt its behavior to the task at hand.

Key Features

•	Mathematics Expert: Equipped with specialized mathematical reasoning capabilities, this model is fine-tuned for solving complex mathematical problems, numerical computations, and providing detailed explanations for mathematical concepts.
•	Coding Expert: The model has been trained extensively on various programming languages and software development paradigms. It can help generate, debug, and explain code snippets, offering a comprehensive coding support experience.
•	Storytelling Expert: Designed to assist in creative writing, this expert focuses on generating narratives, constructing dialogues, and offering story-building support for various genres.
•	General Chat Expert: Capable of engaging in everyday conversations, offering accurate and contextually appropriate responses. This expert is versatile and adaptive to different conversational tones, whether it’s casual chit-chat or formal assistance.

Technical Specifications

•	Model Architecture: Mixture of Experts (MoE) with a gating mechanism that routes inputs to the most relevant expert networks.
•	Domains:
	•	Mathematics: Advanced reasoning and problem-solving.
	•	Coding: Programming support across multiple languages.
	•	Storytelling: Creative writing and narrative generation.
	•	General Chat: Versatile dialogue handling for various conversational contexts.
•	Training Data: The model was trained on diverse datasets that cover each expert domain, ensuring robustness and versatility.
•	Framework: Developed using [Framework name, e.g. PyTorch, TensorFlow], optimized for the MoE architecture with gated routing.
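
The gated routing described above can be illustrated with a small, dependency-free sketch. It is not this model's implementation: the gate logits are hard-coded here (in a real MoE layer they come from a learned router applied to each token), the toy expert functions are placeholders, and the choice of top-2 routing (`k=2`) is an assumption that the card does not state.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized over the selected experts only.
    k=2 is an assumption for illustration, not a value taken from the card.
    """
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))

def moe_layer(x, experts, gate_logits, k=2):
    """Weighted sum of the selected experts' outputs for a scalar input x."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_logits, k))

# Hypothetical stand-ins for the four domain experts named in the card.
EXPERT_NAMES = ["mathematics", "coding", "storytelling", "general chat"]
experts = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: -x, lambda x: x]

# Hard-coded router scores for one token; a real router computes these.
gate_logits = [2.0, 1.0, -1.0, 0.5]

selected = route_top_k(gate_logits, k=2)
for index, weight in selected:
    print(f"{EXPERT_NAMES[index]}: weight {weight:.3f}")
print("layer output:", moe_layer(3.0, experts, gate_logits))
```

Only the selected experts run for a given token, which is why an MoE model can hold several experts' worth of parameters while paying roughly the compute cost of `k` of them per token.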

Usage

This model can be used for a wide range of applications:

•	Educational Tools: Assisting with mathematical problems, coding exercises, and creative writing tasks.
•	Software Development: Providing coding suggestions, code completion, and debugging support.
•	Creative Writing: Generating stories, dialogues, and narrative content.
•	Conversational Agents: Implementing chatbots with versatile conversational abilities.
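
A minimal loading sketch, assuming the checkpoint works with the standard `transformers` causal language model classes (the metadata lists `library_name: transformers`). The `build_prompt` helper and its plain-text prompt layout are hypothetical; the card does not document a prompt format. `device_map="auto"` additionally requires the `accelerate` package, and a 4x7B model needs substantial memory.

```python
MODEL_ID = "Marsouuu/MistralBase-4x7B-MoE-ECE-PRYMMAL-Martial"

def build_prompt(domain, request):
    """Hypothetical helper: prefix the request with the target domain.

    The card does not specify a prompt template, so this layout is only
    an illustration.
    """
    return f"Domain: {domain}\nRequest: {request}\nAnswer:"

def generate(prompt, max_new_tokens=128):
    """Load the model and generate a continuation for the prompt."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # requires the accelerate package
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate(build_prompt("coding", "Write a function that reverses a string.")))
```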

Limitations

•	The model may occasionally generate responses that are not entirely contextually appropriate, especially in cases requiring highly specialized domain knowledge.
•	Despite its 24-bit precision, it may not perform well with extremely large datasets or tasks that require higher precision levels.

Open LLM Leaderboard Evaluation Results

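Detailed results can be found on the Open LLM Leaderboard: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Marsouuu/MistralBase-4x7B-MoE-ECE-PRYMMAL-Martial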

| Metric              | Value |
|---------------------|------:|
| Avg.                |  6.57 |
| IFEval (0-Shot)     | 16.97 |
| BBH (3-Shot)        |  8.87 |
| MATH Lvl 5 (4-Shot) |  0.30 |
| GPQA (0-shot)       |  1.23 |
| MuSR (0-shot)       |  7.85 |
| MMLU-PRO (5-shot)   |  4.21 |
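
As a quick check, the reported average appears to be the unweighted mean of the six benchmark scores. The short sketch below recomputes it from the values listed above; treating the average as a plain mean is an inference from the numbers, not something the card states.

```python
# Scores copied from the results listed above.
scores = {
    "IFEval (0-Shot)": 16.97,
    "BBH (3-Shot)": 8.87,
    "MATH Lvl 5 (4-Shot)": 0.30,
    "GPQA (0-shot)": 1.23,
    "MuSR (0-shot)": 7.85,
    "MMLU-PRO (5-shot)": 4.21,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # prints "Avg. 6.57"
```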