Vision Enabled???
#3, opened by PSM272
If one were to merge this model (or the original QwQ) with https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5, would it be a good multimodal reasoning model?
That might be interesting...
I have absolutely no experience in doing something like that, though :)