Vision Enabled?

#3
by PSM272 - opened

If one were to merge this model (or the original QwQ) with the vision encoder at https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5, would the result be a good multimodal reasoning model?

That might be interesting...

I have absolutely no experience in doing something like that, though :)
