how to increase the number of input frames

#7
by smilyface - opened

Thanks for the prompt resolution of my previous question!

I have another question: I uploaded a 20 s video (460 frames in total), and the model seems to understand the content, but it takes only the first 8 frames. How can I increase the number of input frames?

I tried changing num_frames from 8 to 64 under vision_encoder in config.json, but the demo failed to launch with tensor mismatch errors. So I reset it to 8, launched the demo, and then set "Input Frames" to 64 with the slider on the Gradio page. This is what I got:

[Two screenshots attached: image.png, image.png]

Neither answer (10 or 100) is correct, but I'm not sure whether that is due to LLM hallucination or because the model actually takes 10 frames as input. So my question is: how can I tell how many frames the model actually takes as input?

Thanks a lot in advance!

OpenGVLab org

Because the video frames are compressed into Q-Former tokens before they are passed to the language model, the model cannot count the frames it received, so its answer here is a hallucination. To feed in more frames, adjust the slider in the lower-left corner.
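Since the model cannot report its own frame count, one way to reason about it is to look at which frames a sampler would pick before they reach the vision encoder. Below is a minimal sketch, assuming frames are sampled uniformly by taking the middle frame of each equal-length segment; this thread does not confirm which sampling strategy the demo uses, and `sample_frame_indices` is a hypothetical helper, not a function from the repository.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_segments: int) -> List[int]:
    """Return one frame index from the middle of each of num_segments equal spans.

    Hypothetical helper for illustration; the demo's real sampler may differ.
    """
    if total_frames <= 0 or num_segments <= 0:
        raise ValueError("total_frames and num_segments must be positive")
    seg = total_frames / num_segments  # span length in frames (may be fractional)
    # Clamp to the last valid index in case rounding lands past the end.
    return [min(total_frames - 1, int(seg * i + seg / 2)) for i in range(num_segments)]


# The 460-frame video from the question, at the default 8 and at 64 input frames.
indices_8 = sample_frame_indices(460, 8)
indices_64 = sample_frame_indices(460, 64)
print(len(indices_8), indices_8)
print(len(indices_64), indices_64[:4], "...", indices_64[-1])
```

If the real pipeline exposes the frame tensor, checking the size of its time dimension right before the forward pass is a more direct confirmation than asking the model.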

Got it! Thanks a lot for your response!

smilyface changed discussion status to closed
