---
pipeline_tag: text-generation
inference: false
license: apache-2.0
model-index:
- name: ibm/PowerMoE-3b
results:
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: ARC
metrics:
- name: accuracy-norm
type: accuracy-norm
value: 58.1
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: BoolQ
metrics:
- name: accuracy
type: accuracy
value: 65
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: Hellaswag
metrics:
- name: accuracy-norm
type: accuracy-norm
value: 71.5
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: OpenBookQA
metrics:
- name: accuracy-norm
type: accuracy-norm
value: 41
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: PIQA
metrics:
- name: accuracy-norm
type: accuracy-norm
value: 79.1
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: Winogrande
metrics:
- name: accuracy-norm
type: accuracy-norm
value: 65
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: MMLU (5 shot)
metrics:
- name: accuracy
type: accuracy
value: 42.8
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: GSM8k (5 shot)
metrics:
- name: accuracy
type: accuracy
value: 25.9
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: math (4 shot)
metrics:
- name: accuracy
type: accuracy
value: 14.8
verified: false
- task:
type: text-generation
dataset:
type: bigcode-eval
name: humaneval
metrics:
- name: pass@1
type: pass@1
value: 20.1
verified: false
- task:
type: text-generation
dataset:
type: bigcode-eval
name: MBPP
metrics:
- name: pass@1
type: pass@1
value: 32.4
verified: false
base_model:
- ibm/PowerMoE-3b
---
## Model Summary
PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token and is trained on a mix of open-source and proprietary datasets. PowerMoE-3B shows promising results compared to dense models with 2x the active parameters across various benchmarks, including natural language multiple-choice, code generation, and math reasoning.
Paper: https://arxiv.org/abs/2408.13359
This repository provides a GGUF-quantized version of PowerMoE-3B.
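The exact quantization recipe for this file is not documented here. If you want to reproduce a Q3_K_M quant yourself, a typical llama.cpp workflow looks roughly like the sketch below (the local checkpoint path and the intermediate filename are assumptions):

```bash
# Convert a locally downloaded ibm/PowerMoE-3b checkpoint to GGUF (f16),
# then quantize it to Q3_K_M; run from the llama.cpp repository root.
python convert_hf_to_gguf.py ./PowerMoE-3b --outfile PowerMoE4x800M_f16.gguf --outtype f16
./llama-quantize PowerMoE4x800M_f16.gguf PowerMoE4x800M_q3km.gguf Q3_K_M
```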
## Usage
Running this model requires a recent build of llama.cpp.
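If you do not already have a `llama-cli` binary, one common way to get it is to build llama.cpp from source with CMake (binaries end up under `build/bin/`):

```bash
# Clone and build llama.cpp; this produces llama-cli, llama-quantize, etc.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```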
### Generation
This is a simple example of how to run the PowerMoE GGUF with `llama-cli`:

```bash
./llama-cli -m PowerMoE4x800M_q3km.gguf -p "How about a snack?"
```
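Standard `llama-cli` options such as the number of tokens to generate, the context size, and the sampling temperature can be added to the same command; the values below are illustrative only:

```bash
# Generate up to 256 tokens with a 2048-token context and softer sampling.
./llama-cli -m PowerMoE4x800M_q3km.gguf \
  -p "Write a short poem about snacks." \
  -n 256 -c 2048 --temp 0.7
```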