`num_local_experts == num_experts_per_tok` means all experts are in use for every token. But that ratio differs from model to model,
so is there any newer model we can use?
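
For anyone who wants to compare models on this, here is a minimal sketch of the check. It assumes the config is a plain dict with the two keys named above; the helper name `active_expert_fraction` and the example values are made up for illustration and are not taken from any particular checkpoint.

```python
def active_expert_fraction(config: dict) -> float:
    """Return the share of experts each token is routed to.

    Assumes the config exposes `num_local_experts` (total experts per MoE
    layer) and `num_experts_per_tok` (experts selected per token).
    """
    total = config["num_local_experts"]
    per_tok = config["num_experts_per_tok"]
    if total <= 0 or not 0 < per_tok <= total:
        raise ValueError("expected 0 < num_experts_per_tok <= num_local_experts")
    return per_tok / total


# Illustrative values only (hypothetical configs).
sparse = {"num_local_experts": 8, "num_experts_per_tok": 2}
all_in_use = {"num_local_experts": 8, "num_experts_per_tok": 8}

print(active_expert_fraction(sparse))      # 0.25
print(active_expert_fraction(all_in_use))  # 1.0
```

A result of 1.0 is the case described above, where every expert runs for every token; anything lower means only that fraction of the experts is active per token.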