LLaVA-NeXT-Video-32B-Qwen giving inaccurate video analysis

#1
by shivanis14 - opened

I tested the model in this Space: https://huggingface.co/spaces/WildVision/vision-arena. I found the Space through the GitHub repo: https://github.com/LLaVA-VL/LLaVA-NeXT/tree/main

Video input to the model: https://www.youtube.com/watch?v=51gdmOKs4Ek

Prompt: Is the elderly person in the video safe and comfortable?
Response by LLaVA-NeXT-Video-32B-Qwen: Yes, the elderly person appears to be safe and comfortable throughout the video.

Expected response: The elderly person is being physically abused in the video.

At what rate are frames extracted in this demo? I suspect the rate is too low, so the model misses the relevant frames and gives an inaccurate response.
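
To make the frame-rate concern concrete, here is a rough sketch of what uniform frame sampling would look like if the demo picks a fixed number of frames spread evenly over the clip. That sampling strategy is an assumption on my part, not something confirmed for this Space. The function names and the numbers used (a 60-second clip at 30 fps, 32 sampled frames) are hypothetical and only illustrate the arithmetic.

```python
def uniform_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Return num_samples frame indices spread evenly over the video.

    Assumption: the demo samples a fixed number of frames uniformly.
    This is a guess, not the Space's confirmed behaviour.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    # Never ask for more frames than the video contains.
    num_samples = min(num_samples, total_frames)
    if num_samples == 1:
        return [total_frames // 2]
    # Spacing between consecutive sampled frames, first and last included.
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


def effective_fps(duration_s: float, num_samples: int) -> float:
    """Frames per second the model effectively sees after sampling."""
    return num_samples / duration_s


if __name__ == "__main__":
    # Hypothetical values: 60 s clip at 30 fps, 32 frames sampled.
    duration_s, native_fps, num_samples = 60.0, 30, 32
    total_frames = int(duration_s * native_fps)
    indices = uniform_frame_indices(total_frames, num_samples)
    print("sampled frame indices:", indices[:5], "...")
    print("effective fps:", effective_fps(duration_s, num_samples))
    print("seconds between sampled frames:", duration_s / num_samples)
```

Under these hypothetical numbers the model would see roughly one frame every two seconds, so a brief event could fall entirely between two sampled frames. Knowing the actual sampling settings of the demo would confirm or rule this out.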
