Confusion about the cross-attention module
#9 · by tibetgao · opened
Hi there,
According to your tech report, there is a position-aware vision-language adaptor that comprises a single layer of cross-attention. However, reading through your code I cannot find this module, only a concatenation of the visual embeddings and the hidden states. Could you kindly point it out?
Best regards
This is in the visual.py file, under the class Resampler.
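For reference, here is a minimal sketch of what a single-layer cross-attention resampler of this kind typically looks like: a fixed set of learnable queries cross-attends to the projected image features, with a 2D sine-cosine positional embedding making the adaptor position-aware. The hyperparameters (num_queries=256, embed_dim=4096, kv_dim=1664) and helper names are illustrative assumptions, not the exact values or structure of the repository's visual.py:

```python
import math

import torch
import torch.nn as nn


def get_2d_sincos_pos_embed(embed_dim: int, grid_size: int) -> torch.Tensor:
    """Fixed 2D sine-cosine positional embedding, shape (grid_size**2, embed_dim)."""
    def _1d(dim, pos):
        omega = 1.0 / 10000 ** (torch.arange(dim // 2, dtype=torch.float32) / (dim // 2))
        out = pos.flatten()[:, None] * omega[None, :]
        return torch.cat([torch.sin(out), torch.cos(out)], dim=1)

    coords = torch.arange(grid_size, dtype=torch.float32)
    y, x = torch.meshgrid(coords, coords, indexing="ij")
    return torch.cat([_1d(embed_dim // 2, y), _1d(embed_dim // 2, x)], dim=1)


class Resampler(nn.Module):
    """Single-layer cross-attention adaptor: learnable queries attend to image features."""

    def __init__(self, num_queries=256, embed_dim=4096, num_heads=32, kv_dim=1664):
        super().__init__()
        grid = int(math.sqrt(num_queries))
        # The fixed positional embedding is what makes the adaptor "position-aware".
        self.register_buffer("pos_embed", get_2d_sincos_pos_embed(embed_dim, grid))
        self.query = nn.Parameter(torch.zeros(num_queries, embed_dim))
        self.kv_proj = nn.Linear(kv_dim, embed_dim, bias=False)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads)
        self.ln_q = nn.LayerNorm(embed_dim)
        self.ln_kv = nn.LayerNorm(embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, kv_dim) image features from the vision encoder.
        x = self.ln_kv(self.kv_proj(x)).permute(1, 0, 2)  # (patches, batch, dim)
        q = self.ln_q(self.query + self.pos_embed)        # (queries, dim)
        q = q.unsqueeze(1).expand(-1, x.shape[1], -1)     # (queries, batch, dim)
        # Cross-attention: the queries pull information out of the image features.
        # (The actual implementation may also add positional embeddings on the key side.)
        out, _ = self.attn(q, x, x)
        return out.permute(1, 0, 2)                       # (batch, queries, dim)


feats = torch.randn(2, 1024, 1664)  # e.g. 2 images, 32x32 patches each (assumed shapes)
tokens = Resampler()(feats)         # -> (2, 256, 4096): a fixed number of visual tokens
```

The key point is that the "concatenation" you saw is just how the resampled visual tokens are spliced into the language model's input sequence; the cross-attention itself lives inside this Resampler module.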