Upload mmproj-Qwen2-VL-2B-Instruct-f16.gguf

#1
by stduhpf - opened

More options for mmproj quantization would be better, I think. The F32 mmproj is still fairly large, and from my limited testing, F16 seems to perform perfectly fine.
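A rough back-of-the-envelope check of why F16 tends to be fine here: IEEE 754 half precision keeps about 11 bits of mantissa, so round-tripping weights at a typical scale through F16 perturbs each value by only a fraction of a percent. This is a stdlib-only sketch on synthetic random values, not the actual mmproj tensors; the weight scale of 0.02 is an assumption for illustration.

```python
import random
import struct

def to_f16(x: float) -> float:
    """Round-trip a float through IEEE 754 half precision (binary16)."""
    return struct.unpack('e', struct.pack('e', x))[0]

# Synthetic stand-in for projector weights (NOT the real mmproj tensors):
# Gaussian values at an assumed typical weight scale of 0.02.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]

# Relative error introduced by storing each value in F16 instead of F32/F64.
errs = [abs(w - to_f16(w)) / (abs(w) + 1e-12) for w in weights]
print(f"mean relative error from F16 rounding: {sum(errs) / len(errs):.2e}")
```

The mean relative error lands around a few parts in ten thousand, which is typically far below the noise floor of the vision projector's activations, consistent with F16 and F32 mmproj files giving effectively identical answers.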

Yeah, I wasn't sure about this one, simply because the qwen2vl code defaults to F32, so I thought maybe there was something important about it.

It's better to allow more options. From what I've tested, F16 works just fine as well. I used the same prompt and image to test both, and the outputs came out identical (maybe very minor differences, but no incorrect answers). Cheers!

Yeah, thanks for confirming! I uploaded my own just so I can guarantee its origin, but I appreciate the help :)

bartowski changed pull request status to closed
