Which Tokenizer Should I Use with Qwen-Based LLaVA-NeXT-Video Models?
Hi,
I’m currently working with the LLaVA-NeXT-Video-32B-Qwen model and have run into some issues with the tokenizer and processor.
In previous demos I used LlavaNextVideoProcessor to handle both video and text inputs, but recently it no longer works as expected, and I’m also seeing warnings about a tokenizer mismatch. I’m now unsure which tokenizer I should be using for this model: it seems to expect Qwen2Tokenizer, but I’m not sure whether that is fully compatible with the model in its current state.
So which tokenizer should I be using with the Qwen-based LLaVA-NeXT-Video models? Should I manually load Qwen2Tokenizer, or can AutoProcessor resolve this automatically?
Any insights or suggestions for best practices when working with this model would be greatly appreciated!