Vision Enabled?

by PSM272 - opened about 6 hours ago

about 6 hours ago

If one were to merge this model (or the original QwQ) with https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5, would it be a good multimodal reasoning model?

Owner about 2 hours ago

That might be interesting...

I have absolutely no experience in doing something like that, though :)
