Exporting to GGUF
Hey, when are you going to support GGUF export for multimodal models like Llama-3.2-Vision?
At the moment llama.cpp does not support it, so until they do, we can't really do anything.
Technically we could make the export work, but it wouldn't be useful anyway, since there would be nowhere to run the resulting file.
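For reference, GGUF export already works for text-only models; below is a minimal sketch of the Unsloth export path (the model name and quantization method are placeholder choices):

```python
from unsloth import FastLanguageModel

# Load a text-only base model (vision models cannot be exported to GGUF yet,
# since llama.cpp cannot run them).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # placeholder base model
    max_seq_length=2048,
    load_in_4bit=True,
)

# ... fine-tune as usual ...

# Merge and export to GGUF; "q4_k_m" is a placeholder quantization choice.
model.save_pretrained_gguf("my_model_gguf", tokenizer, quantization_method="q4_k_m")
```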
Yeah, thanks for the quick reply. Everybody's waiting for proper multimodal support in llama.cpp. Anyway, right now there's no option to host a QLoRA-finetuned Llama-Vision model for production-ready inference, or maybe I just don't know of a decent way of doing this. @shimmyshimmer do you have any ideas on production-ready hosting / deployment of this model with its adapter, Michael? I've tried Fireworks, but it looks like their product does not yet support PEFT add-ons for Llama-Vision.
Does Hugging Face? Apologies, I'm really not sure. I would recommend just using vLLM.
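For architectures that vLLM does support, serving a QLoRA/LoRA adapter on top of a base model looks roughly like the sketch below (model name and adapter path are placeholders; as the next reply notes, this did not apply to Mllama at the time):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder base model; only works for architectures vLLM supports.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_lora=True)
sampling = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(
    ["Explain QLoRA in one sentence."],
    sampling,
    # Attach the fine-tuned adapter by name, integer id, and local path.
    lora_request=LoRARequest("my_adapter", 1, "/path/to/qlora_adapter"),
)
print(outputs[0].outputs[0].text)
```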
Unfortunately, vLLM does not yet support the Mllama architecture. As for Hugging Face, I'll have to check whether there's an option to host model inference with a QLoRA adapter.
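One way to try this on the Hugging Face side is to load the 4-bit base model with transformers and attach the QLoRA adapter via PEFT. A minimal sketch, assuming the adapter was saved in PEFT format (the model id, adapter path, and image file are placeholders):

```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, BitsAndBytesConfig, MllamaForConditionalGeneration

base_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # placeholder base model
adapter_path = "/path/to/qlora_adapter"               # placeholder adapter path

# Load the base model in 4-bit, matching the QLoRA training setup.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = MllamaForConditionalGeneration.from_pretrained(
    base_id, quantization_config=bnb, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_path)  # attach the QLoRA adapter
processor = AutoProcessor.from_pretrained(base_id)

# Run a single image+text prompt through the adapted model.
image = Image.open("example.jpg")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```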