vikhyatk/moondream2 · Flexibility to change the max number of generated tokens

Hello guys, first of all I want to say: what an amazing model! I think it's the fastest multimodal model I have tested.

Is there some way to make the maximum number of generated tokens flexible? For some answers I don't need to spend inference time generating a long description of the image, only "Yes, I have found an apple in the image." or "No."
Or can I instruct the model through the prompt to do that?
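For context, here is a minimal sketch of how a generic autoregressive decoding loop treats a token cap. This is not moondream2's code; `generate`, `make_toy_model`, and the token ids are hypothetical. Generation stops either when the model emits an end-of-sequence (EOS) token or when `max_new_tokens` is reached, so a short answer like "No." ends early on its own, and the cap only bounds the worst case. (Hugging Face `transformers` `generate()` accepts a `max_new_tokens` argument; whether moondream2's own helper methods forward it is an assumption to check against the model code.)

```python
from typing import Callable, List


def generate(next_token: Callable[[List[int]], int], prompt: List[int],
             max_new_tokens: int, eos_id: int) -> List[int]:
    """Decoding loop that stops at EOS or after max_new_tokens tokens."""
    output: List[int] = []
    for _ in range(max_new_tokens):
        tok = next_token(prompt + output)  # one (hypothetical) model step
        if tok == eos_id:
            break  # the model finished its answer before hitting the cap
        output.append(tok)
    return output


def make_toy_model(prompt_len: int, script: List[int]) -> Callable[[List[int]], int]:
    """Hypothetical stand-in for a model: replays a fixed token script."""
    def step(context: List[int]) -> int:
        return script[len(context) - prompt_len]
    return step


prompt = [7, 8]
model = make_toy_model(len(prompt), [1, 2, 3, 0])  # token 0 plays the role of EOS
short = generate(model, prompt, max_new_tokens=128, eos_id=0)  # stops early at EOS
capped = generate(model, prompt, max_new_tokens=2, eos_id=0)   # cut off by the cap
```

With a large cap the toy model still stops after three tokens, which is why prompting for a short answer can save as much time as lowering the limit.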

And again, amazing model. This could be the first open multimodal model able to recognize its environment in real time on personal hardware.