---
pipeline_tag: text-generation
inference: false
license: apache-2.0
model-index:
- name: ibm/PowerMoE-3b
results:
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: ARC
metrics:
- name: accuracy-norm
type: accuracy-norm
value: 58.1
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: BoolQ
metrics:
- name: accuracy
type: accuracy
value: 65
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: Hellaswag
metrics:
- name: accuracy-norm
type: accuracy-norm
value: 71.5
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: OpenBookQA
metrics:
- name: accuracy-norm
type: accuracy-norm
value: 41
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: PIQA
metrics:
- name: accuracy-norm
type: accuracy-norm
value: 79.1
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: Winogrande
metrics:
- name: accuracy-norm
type: accuracy-norm
value: 65
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: MMLU (5 shot)
metrics:
- name: accuracy
type: accuracy
value: 42.8
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: GSM8k (5 shot)
metrics:
- name: accuracy
type: accuracy
value: 25.9
verified: false
- task:
type: text-generation
dataset:
type: lm-eval-harness
name: math (4 shot)
metrics:
- name: accuracy
type: accuracy
value: 14.8
verified: false
- task:
type: text-generation
dataset:
type: bigcode-eval
name: humaneval
metrics:
- name: pass@1
type: pass@1
value: 20.1
verified: false
- task:
type: text-generation
dataset:
type: bigcode-eval
name: MBPP
metrics:
- name: pass@1
type: pass@1
value: 32.4
verified: false
base_model:
- ibm/PowerMoE-3b
---
## Model Summary
PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token and is trained on a mix of open-source and proprietary datasets. PowerMoE-3B shows promising results compared to dense models with 2x the active parameters across various benchmarks, including natural language multiple-choice, code generation, and math reasoning.
Paper: https://arxiv.org/abs/2408.13359
This repository provides a GGUF-quantized version of PowerMoE-3B.
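The exact quantization recipe for this file is not documented here. If you want to reproduce a Q3_K_M quant yourself, a typical llama.cpp workflow looks roughly like the sketch below (the local checkpoint path and the intermediate filename are assumptions):

```bash
# Convert a locally downloaded ibm/PowerMoE-3b checkpoint to GGUF (f16),
# then quantize it to Q3_K_M; run from the llama.cpp repository root.
python convert_hf_to_gguf.py ./PowerMoE-3b --outfile PowerMoE4x800M_f16.gguf --outtype f16
./llama-quantize PowerMoE4x800M_f16.gguf PowerMoE4x800M_q3km.gguf Q3_K_M
```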
## Usage
Running this model requires a recent build of llama.cpp.
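If you do not already have a `llama-cli` binary, one common way to get it is to build llama.cpp from source with CMake (binaries end up under `build/bin/`):

```bash
# Clone and build llama.cpp; this produces llama-cli, llama-quantize, etc.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```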
### Generation
This is a simple example of how to run the PowerMoE GGUF with `llama-cli`:

```bash
./llama-cli -m PowerMoE4x800M_q3km.gguf -p "How about a snack?"
```
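Standard `llama-cli` options such as the number of tokens to generate, the context size, and the sampling temperature can be added to the same command; the values below are illustrative only:

```bash
# Generate up to 256 tokens with a 2048-token context and softer sampling.
./llama-cli -m PowerMoE4x800M_q3km.gguf \
  -p "Write a short poem about snacks." \
  -n 256 -c 2048 --temp 0.7
```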