Confusion about the cross-attention module
#9 · by tibetgao · opened
Hi there,
According to your tech report, there is a position-aware vision-language adaptor that comprises a single layer of cross-attention. However, reading through your code I cannot find this module, only a concatenation of the visual embeddings and the hidden states. Could you kindly point it out?
Best regards
This is in the visual.py file, under the class Resampler.
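For reference, here is a minimal sketch of what a single-layer cross-attention resampler of this kind typically looks like: a fixed set of learnable queries cross-attends to the projected image features, with a 2D sine-cosine positional embedding making the adaptor position-aware. The hyperparameters (num_queries=256, embed_dim=4096, kv_dim=1664) and helper names are illustrative assumptions, not the exact values or structure of the repository's visual.py:

```python
import math

import torch
import torch.nn as nn


def get_2d_sincos_pos_embed(embed_dim: int, grid_size: int) -> torch.Tensor:
    """Fixed 2D sine-cosine positional embedding, shape (grid_size**2, embed_dim)."""
    def _1d(dim, pos):
        omega = 1.0 / 10000 ** (torch.arange(dim // 2, dtype=torch.float32) / (dim // 2))
        out = pos.flatten()[:, None] * omega[None, :]
        return torch.cat([torch.sin(out), torch.cos(out)], dim=1)

    coords = torch.arange(grid_size, dtype=torch.float32)
    y, x = torch.meshgrid(coords, coords, indexing="ij")
    return torch.cat([_1d(embed_dim // 2, y), _1d(embed_dim // 2, x)], dim=1)


class Resampler(nn.Module):
    """Single-layer cross-attention adaptor: learnable queries attend to image features."""

    def __init__(self, num_queries=256, embed_dim=4096, num_heads=32, kv_dim=1664):
        super().__init__()
        grid = int(math.sqrt(num_queries))
        # The fixed positional embedding is what makes the adaptor "position-aware".
        self.register_buffer("pos_embed", get_2d_sincos_pos_embed(embed_dim, grid))
        self.query = nn.Parameter(torch.zeros(num_queries, embed_dim))
        self.kv_proj = nn.Linear(kv_dim, embed_dim, bias=False)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads)
        self.ln_q = nn.LayerNorm(embed_dim)
        self.ln_kv = nn.LayerNorm(embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, kv_dim) image features from the vision encoder.
        x = self.ln_kv(self.kv_proj(x)).permute(1, 0, 2)  # (patches, batch, dim)
        q = self.ln_q(self.query + self.pos_embed)        # (queries, dim)
        q = q.unsqueeze(1).expand(-1, x.shape[1], -1)     # (queries, batch, dim)
        # Cross-attention: the queries pull information out of the image features.
        # (The actual implementation may also add positional embeddings on the key side.)
        out, _ = self.attn(q, x, x)
        return out.permute(1, 0, 2)                       # (batch, queries, dim)


feats = torch.randn(2, 1024, 1664)  # e.g. 2 images, 32x32 patches each (assumed shapes)
tokens = Resampler()(feats)         # -> (2, 256, 4096): a fixed number of visual tokens
```

The key point is that the "concatenation" you saw is just how the resampled visual tokens are spliced into the language model's input sequence; the cross-attention itself lives inside this Resampler module.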