How many active parameters does this model have?
#6
opened by lewtun (HF staff)
Does anyone know how many active parameters this model has? Is it a similar calculation to the Mixtral-8x7B model or something new altogether?
Since 2 experts are used in the forward pass, it looks like 44B from the name. However, it should be lower: something around 30B active parameters.
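For reference, here is the naive arithmetic behind that 44B reading (a rough sketch; the per-expert "22B" comes from the model name, not from the config):

# "8x22B" with top-2 routing naively reads as 2 x 22B = 44B active.
# The true number is lower because each "22B" in the name includes
# weights shared across experts (attention, embeddings, router),
# which should only be counted once per token.
experts_per_token = 2
params_per_expert_from_name = 22e9
print(experts_per_token * params_per_expert_from_name)  # 4.4e10, i.e. ~44B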
This model has 140620634112 parameters. Each expert has 3 * (hidden_size * intermediate_size) = 301989888 parameters. Since the model has 56 layers and only 2 of the 8 experts in each layer are used per token, the number of active parameters is 140620634112 - 56 * (8 - 2) * 301989888 = 39152031744, which is approximately 39B.
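The same arithmetic can be reproduced from the config values it implies (hidden_size = 6144 and intermediate_size = 16384 are assumptions taken from the Mixtral-8x22B config; they give exactly 3 * 6144 * 16384 = 301989888 per expert):

hidden_size, intermediate_size = 6144, 16384
per_expert = 3 * hidden_size * intermediate_size       # w1, w2, w3 -> 301989888

total = 140_620_634_112
num_layers, num_experts, top_k = 56, 8, 2

expert_params = num_layers * num_experts * per_expert  # ~135.3B across all experts
shared_params = total - expert_params                  # ~5.3B shared (attention, embeddings, router)
active = shared_params + num_layers * top_k * per_expert
print(active)                                          # 39152031744 -> ~39B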
In case you want to check this yourself, here is a simple script:
from transformers import AutoModelForCausalLM, AutoConfig
from accelerate import init_empty_weights

config = AutoConfig.from_pretrained("mistral-community/Mixtral-8x22B-v0.1")

# Instantiate on the meta device so no weight memory is allocated
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

N_total = sum(p.numel() for p in model.parameters())

# All experts in all layers have the same shape, so measure one of them
expert = model.model.layers[0].block_sparse_moe.experts[0]
N_per_expert = sum(p.numel() for p in expert.parameters())

# Subtract the 6 unused experts in each of the 56 layers
print(N_total - 56 * (8 - 2) * N_per_expert)
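Because of init_empty_weights, the model is created on the meta device without downloading or materializing any weights, so this runs on any machine; it should print 39152031744, matching the ~39B figure above.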