Can you please share how to make this version?
Thanks!
Yeah, you can use mlx-lm for that. You can upload converted models to Hugging Face by passing --upload-repo to the conversion script. For instance, to quantize Mistral-7B and upload it to the MLX Hugging Face community, run:
python -m mlx_lm.convert \
    --hf-path mistralai/Mistral-7B-v0.1 \
    -q \
    --upload-repo mlx-community/my-4bit-mistral
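If you prefer to drive this from Python instead of the command line, mlx-lm also exposes a convert() function. A minimal sketch, under the assumption that the Python API mirrors the CLI flags above (hf_path, quantize, upload_repo):

# Minimal sketch; assumes mlx_lm.convert mirrors the CLI flags shown above.
from mlx_lm import convert

convert(
    hf_path="mistralai/Mistral-7B-v0.1",
    quantize=True,  # equivalent of the -q flag
    upload_repo="mlx-community/my-4bit-mistral",  # pushes the result to Hugging Face
)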
I did it the same way for my MoE model cloudyu/Mixtral_34Bx2_MoE_60B (https://huggingface.co/cloudyu/Mixtral_34Bx2_MoE_60B), but the MLX version doesn't work and I don't know why.
It reports:

  File "mlx-examples/llms/mlx_lm/models/mixtral.py", line 137, in __call__
    mx.argpartition(-gates, kth=ne, axis=-1)[:, :ne]
ValueError: [argpartition] Received invalid kth 2 along axis -1 for array with shape: (1,2)
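A note on what the error seems to mean (my reading, not a confirmed fix): this model has only 2 experts and num_experts_per_tok is also 2, so the router calls mx.argpartition with kth equal to the axis size, which MLX rejects because kth must be smaller than the size of the partitioned axis. A minimal sketch of the failing call and one possible guard, assuming gates has shape (tokens, num_experts) and ne is num_experts_per_tok:

import mlx.core as mx

gates = mx.zeros((1, 2))  # router logits: (tokens, num_experts), 2 experts here
ne = 2                    # num_experts_per_tok == num_experts for this model

# This is the failing call: kth must be < the axis size, but kth=ne=2
# on an axis of size 2 raises the ValueError above.
# inds = mx.argpartition(-gates, kth=ne, axis=-1)[:, :ne]

# One possible guard: when every expert is selected anyway, a full
# argsort yields the same top-ne indices without the kth restriction.
if ne >= gates.shape[-1]:
    inds = mx.argsort(-gates, axis=-1)[:, :ne]
else:
    inds = mx.argpartition(-gates, kth=ne, axis=-1)[:, :ne]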