Exporting to GGUF
Hey, when are you going to support GGUF export for multimodal models like Llama-3.2-Vision?
At the moment llama.cpp does not support it, so until they do, we can't really do anything.
Technically we could make the export work, but it wouldn't be useful anyway, since there would be nowhere to run the resulting file.
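For reference, GGUF export already works for text-only models; below is a minimal sketch of the Unsloth export path (the model name and quantization method are placeholder choices):

```python
from unsloth import FastLanguageModel

# Load a text-only base model (vision models cannot be exported to GGUF yet,
# since llama.cpp cannot run them).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # placeholder base model
    max_seq_length=2048,
    load_in_4bit=True,
)

# ... fine-tune as usual ...

# Merge and export to GGUF; "q4_k_m" is a placeholder quantization choice.
model.save_pretrained_gguf("my_model_gguf", tokenizer, quantization_method="q4_k_m")
```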
Yeah, thanks for the quick reply. Everybody's waiting for proper multimodal support in llama.cpp. Anyway, right now there's no option to host a QLoRA-finetuned Llama-Vision model for production-ready inference, or maybe I just don't know of a decent way of doing this. @shimmyshimmer do you have any ideas on production-ready hosting / deployment of this model with its adapter, Michael? I've tried Fireworks, but it looks like their product does not yet support PEFT add-ons for Llama-Vision.
Does Hugging Face? Apologies, I'm really not sure. I would recommend just using vLLM.
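For architectures that vLLM does support, serving a QLoRA/LoRA adapter on top of a base model looks roughly like the sketch below (model name and adapter path are placeholders; as the next reply notes, this did not apply to Mllama at the time):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder base model; only works for architectures vLLM supports.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_lora=True)
sampling = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(
    ["Explain QLoRA in one sentence."],
    sampling,
    # Attach the fine-tuned adapter by name, integer id, and local path.
    lora_request=LoRARequest("my_adapter", 1, "/path/to/qlora_adapter"),
)
print(outputs[0].outputs[0].text)
```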
Unfortunately, vLLM does not yet support the Mllama architecture. As for Hugging Face, I'll have to check whether there's an option to host model inference with a QLoRA adapter.
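One way to try this on the Hugging Face side is to load the 4-bit base model with transformers and attach the QLoRA adapter via PEFT. A minimal sketch, assuming the adapter was saved in PEFT format (the model id, adapter path, and image file are placeholders):

```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, BitsAndBytesConfig, MllamaForConditionalGeneration

base_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # placeholder base model
adapter_path = "/path/to/qlora_adapter"               # placeholder adapter path

# Load the base model in 4-bit, matching the QLoRA training setup.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = MllamaForConditionalGeneration.from_pretrained(
    base_id, quantization_config=bnb, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_path)  # attach the QLoRA adapter
processor = AutoProcessor.from_pretrained(base_id)

# Run a single image+text prompt through the adapted model.
image = Image.open("example.jpg")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```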