TobDeBer/PowerMoe-3b-GGUF


---
pipeline_tag: text-generation
inference: false
license: apache-2.0
model-index:
- name: ibm/PowerMoE-3b
  results:
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: ARC
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 58.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: BoolQ
    metrics:
    - name: accuracy
      type: accuracy
      value: 65
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: Hellaswag
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 71.5
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: OpenBookQA
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 41
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: PIQA
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 79.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: Winogrande
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 65
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: MMLU (5 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 42.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: GSM8k (5 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 25.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: math (4 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 14.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode-eval
      name: humaneval
    metrics:
    - name: pass@1
      type: pass@1
      value: 20.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode-eval
      name: MBPP
    metrics:
    - name: pass@1
      type: pass@1
      value: 32.4
      verified: false
base_model:
- ibm/PowerMoE-3b
---

## Model Summary
PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It activates 800M parameters per token and is trained on a mix of open-source and proprietary datasets. Across benchmarks covering natural-language multiple-choice tasks, code generation, and math reasoning, PowerMoE-3B has shown promising results compared to dense models with 2x the active parameters.

Paper: https://arxiv.org/abs/2408.13359

This repository provides a GGUF-quantized version of the model.

## Usage
Requires a recent build of llama.cpp.
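
If the model file is not yet on disk, it can be fetched over HTTPS. The following is a minimal sketch, not an official procedure: the repository id and file name are assumptions taken from this page, and the helper names `gguf_url` and `fetch_gguf` are hypothetical. Check the repository's file list for the actual file name before running it.

```shell
#!/usr/bin/env bash
# Sketch: fetch the GGUF file with curl.
# REPO and FILE are assumptions taken from this page; adjust them to the
# names actually listed in the repository.
REPO="${REPO:-TobDeBer/PowerMoe-3b-GGUF}"
FILE="${FILE:-PowerMoE4x800M_q3km.gguf}"

# Print the download URL for FILE on the repository's main branch.
gguf_url() {
  printf 'https://huggingface.co/%s/resolve/main/%s\n' "$REPO" "$FILE"
}

# Download FILE into the current directory.
# -L follows redirects, --fail makes curl exit non-zero on HTTP errors.
fetch_gguf() {
  curl -L --fail -o "$FILE" "$(gguf_url)"
}

# Usage (uncomment to download):
# fetch_gguf
```

Setting `FILE` in the environment selects a different quantization, if the repository offers one.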

### Generation
A simple example of running the PowerMoE GGUF model with `llama-cli`:

	./llama-cli -m PowerMoE4x800M_q3km.gguf -p "How about a snack?"
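
The command above can be wrapped in a small script that checks for the model file and sets the generation length and context size explicitly. This is a sketch under assumptions: the values for `N_PREDICT` and `CTX_SIZE` are illustrative defaults, not settings recommended by the model authors, and the function name `run_llama` is hypothetical. The flags `-n` (tokens to generate) and `-c` (context size) are standard `llama-cli` options.

```shell
#!/usr/bin/env bash
# Sketch: run llama-cli with explicit settings.
# MODEL matches the file name used in the example above; the other
# values are assumed defaults for illustration only.
MODEL="${MODEL:-PowerMoE4x800M_q3km.gguf}"
LLAMA_CLI="${LLAMA_CLI:-./llama-cli}"
N_PREDICT="${N_PREDICT:-128}"   # number of tokens to generate
CTX_SIZE="${CTX_SIZE:-2048}"    # context window in tokens

# Run a single prompt; fail early with a clear message if the model is missing.
run_llama() {
  if [ ! -f "$MODEL" ]; then
    echo "model file not found: $MODEL" >&2
    return 1
  fi
  "$LLAMA_CLI" -m "$MODEL" -n "$N_PREDICT" -c "$CTX_SIZE" -p "$1"
}

# Usage (uncomment to run):
# run_llama "How about a snack?"
```

Keeping the settings in variables lets them be overridden from the environment, for example `N_PREDICT=256 ./run.sh`, without editing the script.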