Any explanation for this bug and what it causes?

#1
by manu - opened

What are we doing by correcting this here?

Thanks

Fix a bug in transformers/models/qwen2_vl/modeling_qwen2_vl.py around line 1774

position_ids = position_ids.unsqueeze(0).expand(3, -1, -1)
# make sure the following three lines are inside the 'else' statement
if cache_position[0] != 0:
    pixel_values = None
    pixel_values_videos = None
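
For anyone reading this without the file open, here is a minimal, hypothetical sketch of that region, rewritten as a standalone function purely to show where the three lines from the fix are supposed to live (inside the 'else' branch that handles decoding steps). The function name and arguments are mine and the prefill branch is elided; the real prepare_inputs_for_generation in modeling_qwen2_vl.py looks different and changes between transformers versions.

import torch

def prepare_visual_inputs_sketch(
    input_ids, cache_position, rope_deltas, pixel_values, pixel_values_videos
):
    # Hypothetical standalone rewrite of the region around line 1774,
    # only meant to illustrate the intended control flow of the fix.
    position_ids = None
    if cache_position is None or cache_position[0] == 0:
        # Prefill step: keep the visual inputs so the vision tower runs.
        # In the real code this branch recomputes position_ids and
        # rope_deltas from the image/video grids.
        pass
    else:
        # Later decoding steps: positions come from the cached rope deltas.
        batch_size, seq_length = input_ids.shape
        delta = cache_position[0] + rope_deltas if rope_deltas is not None else 0
        position_ids = torch.arange(seq_length, device=input_ids.device)
        position_ids = position_ids.view(1, -1).expand(batch_size, -1)
        position_ids = position_ids.add(delta)
        position_ids = position_ids.unsqueeze(0).expand(3, -1, -1)

        # The three lines from the fix, now inside the 'else' branch:
        # the visual inputs are only dropped after the prefill pass.
        if cache_position[0] != 0:
            pixel_values = None
            pixel_values_videos = None

    return position_ids, pixel_values, pixel_values_videos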

I mean, I see that it bugs otherwise, but why do you even need prepare_inputs_for_generation in your case, where everything is a single forward pass?

if cache_position[0] != 0

The bug here is that cache_position is None, so the if statement needs to become something like if cache_position is not None and cache_position[0] != 0.
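
In code, the tolerant version of that check could be reduced to something like this (a hypothetical helper just to make the condition concrete; in the actual file it would stay an inline if, not a separate function):

def should_drop_visual_inputs(cache_position):
    # True only on decoding steps. Returns False both on the prefill pass
    # (cache_position[0] == 0) and when cache_position was never populated,
    # e.g. when forward() is called directly instead of through generate().
    return cache_position is not None and cache_position[0] != 0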

I tried to drop prepare_inputs_for_generation and run forward() directly, but it seems the input is not valid.
prepare_inputs_for_generation seems to have some steps that deal with the shape of the pixel value inputs.
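
For reference, this is roughly the workaround I would try: let prepare_inputs_for_generation build the kwargs for a single prefill step and then call forward() on them. Untested sketch; the model and processor usage follows the standard Qwen2-VL examples, but passing cache_position and use_cache here is my own assumption, and example.jpg is a placeholder path.

import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

image = Image.open("example.jpg")  # placeholder image path
messages = [{
    "role": "user",
    "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}],
}]
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Let the model's own preprocessing reshape the pixel inputs and build
# position_ids / rope deltas, then run a single forward pass on the result.
seq_len = inputs["input_ids"].shape[1]
model_inputs = model.prepare_inputs_for_generation(
    **inputs,
    cache_position=torch.arange(seq_len, device=model.device),
    use_cache=False,
)
outputs = model(**model_inputs)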

Yeah, I guess the rope deltas are what's making it bug for multi-GPU...
Are you planning on opening an issue on HF for this?
