Exporting to GGUF

#7
by krasivayakoshka - opened

Hey, when are you going to support GGUF export for multimodal models like Llama-3.2-Vision?

Unsloth AI org

At the moment llama.cpp does not support it, so until they do, we can't really do anything.

Unsloth AI org

Technically we could make it work, but it wouldn't be useful anyway, since there's nowhere to run the result.
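
In the meantime, if you just need standalone (non-GGUF) weights you can serve elsewhere, a minimal sketch is to merge the QLoRA adapter back into the 16-bit base model with PEFT. The model ID, adapter path, and output directory below are placeholders:

```python
# Minimal sketch: fold a QLoRA adapter into the base model to get
# standalone 16-bit weights (no GGUF involved). Paths are placeholders.
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # placeholder base model
adapter_dir = "path/to/qlora-adapter"                 # placeholder adapter dir

# Load the base in 16-bit (not 4-bit) so the merge produces clean weights.
model = MllamaForConditionalGeneration.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_dir)

# Fold the LoRA deltas into the base weights and drop the adapter modules.
merged = model.merge_and_unload()
merged.save_pretrained("llama-3.2-vision-merged")  # placeholder output dir
AutoProcessor.from_pretrained(base_id).save_pretrained("llama-3.2-vision-merged")
```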

Yeah, thanks for the quick reply. Everybody's waiting for proper multimodal support in llama.cpp. Anyway, right now there's no option to host a QLoRA-finetuned Llama-Vision model for production-ready inference. Or maybe I just don't know a decent way of doing it. @shimmyshimmer do you have any ideas on production-ready hosting / deployment of this model with its adapter, Michael? I've tried Fireworks, but it looks like their product does not yet support PEFT add-ons for Llama-Vision.

Unsloth AI org

Does Hugging Face? Apologies, I'm really not sure. I would recommend just using vLLM.

Unfortunately, vLLM does not yet support the Mllama architecture. As for Hugging Face, I'll have to try and see whether there's an option to host model inference with a QLoRA adapter.
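
For reference, here's what plain-Transformers inference on a merged checkpoint would look like (a minimal sketch; the model directory and image URL are placeholders, and it assumes a transformers version with Mllama support):

```python
# Minimal sketch: run a merged Llama-3.2-Vision checkpoint with plain
# Transformers. The model directory and image URL are placeholders.
import requests, torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_dir = "llama-3.2-vision-merged"  # placeholder: merged model from above
model = MllamaForConditionalGeneration.from_pretrained(
    model_dir, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_dir)

image = Image.open(requests.get("https://example.com/demo.jpg", stream=True).raw)
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```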
