OpenGVLab/InternVideo2-Chat-8B · how to increase the number of input frames

Aug 29

Thanks for resolving my previous question so quickly!

I have another question. I uploaded a 20-second video (460 frames in total), and the model seems to understand the content. However, it appears to take only the first 8 frames. How can I increase the number of input frames?

I tried changing `num_frames` from 8 to 64 under `vision_encoder` in `config.json`, but the demo then failed to launch with tensor shape mismatch errors. So I reset it to 8, launched the demo, and set "Input Frames" to 64 with the slider on the Gradio page. This is what I got:

Neither answer (10 or 100) is correct, but I'm not sure whether that is an LLM hallucination or whether the model really takes only 10 frames as input. So my question is: how can I tell how many frames the model actually receives?
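A minimal sketch of one common way a frame count like "Input Frames" is applied, assuming (not verified against the InternVideo2 demo code) that frames are sampled uniformly across the whole clip rather than taken from the start. The helper `sample_frame_indices` is hypothetical. Under this assumption, 8 input frames would be spread over all 460 frames. Either way, the reliable check is to print the shape of the frame tensor just before it enters the vision encoder, rather than asking the LLM to count.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Hypothetical helper: pick num_samples frame indices spread evenly over a video.

    The video is split into num_samples equal-length segments and the
    midpoint of each segment is taken (rounded down to an integer index).
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    # Cannot sample more distinct frames than the video contains.
    num_samples = min(num_samples, total_frames)
    seg = total_frames / num_samples
    return [int(seg * i + seg / 2) for i in range(num_samples)]


if __name__ == "__main__":
    # Compare slider settings for a 460-frame video.
    for n in (8, 64):
        idx = sample_frame_indices(460, n)
        print(f"{n} requested -> {len(idx)} frames, first {idx[:4]}, last {idx[-1]}")
```

With this scheme the count of indices (and therefore the first dimension of the stacked frame tensor) is what tells you how many frames the model really saw.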

Thanks a lot in advance!

ynhe

OpenGVLab org Aug 29

Because the video frames are compressed into Q-Former tokens before they are fed to the language model, the LLM has no direct way to count the input frames, so its answer here is a hallucination. You can feed in more frames by adjusting the slider in the lower-left corner.
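To illustrate why the frame count is hidden from the LLM, here is a toy sketch of Q-Former-style compression. It assumes a single cross-attention layer over a fixed set of query vectors; the real Q-Former is a multi-layer transformer with learned weights, and the sizes used here (`num_queries=8`, `dim=16`, `tokens_per_frame=4`) are hypothetical. The point it shows is that the number of output tokens equals the number of queries, no matter how many frames go in.

```python
import math
import random


def cross_attention(queries, keys_values):
    """Single-head cross-attention: each query attends over all input tokens.

    The input tokens serve as both keys and values in this toy version.
    Returns one output vector per query, so len(output) == len(queries).
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        # Scaled dot-product scores against every input token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys_values]
        m = max(scores)  # subtract max for numerical stability in softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of input tokens, dimension by dimension.
        out.append([sum(w * v[d] for w, v in zip(weights, keys_values)) for d in range(dim)])
    return out


def compress_video(num_frames, tokens_per_frame=4, num_queries=8, dim=16, seed=0):
    """Hypothetical pipeline: random 'frame tokens' compressed by fixed queries."""
    rng = random.Random(seed)
    # Queries are drawn first, so they are identical across calls (fixed, like learned queries).
    queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]
    frame_tokens = [[rng.gauss(0, 1) for _ in range(dim)]
                    for _ in range(num_frames * tokens_per_frame)]
    return cross_attention(queries, frame_tokens)


if __name__ == "__main__":
    for n in (8, 64):
        tokens = compress_video(n)
        print(f"{n} frames -> {len(tokens)} query tokens")  # 8 query tokens in both cases
```

Since 8 and 64 frames both yield the same number of tokens for the LLM, asking the model "how many frames do you see?" cannot give a trustworthy answer.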

smilyface

Aug 29

Got it! Thanks a lot for your response!

smilyface changed discussion status to closed Aug 29