It is a simple application that builds on Alpaca-LoRA. It uses BLIP-2 to analyze pictures, then passes the resulting captions, along with answers to further prompts obtained through visual question answering (VQA), to Alpaca as context. This allows Alpaca to use images as an additional input to guide its outputs.
For example:
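To illustrate the flow described above, here is a minimal sketch of how a BLIP-2 caption and a few VQA answers could be folded into the context of an Alpaca-style prompt. It is not the application's actual code. The function names `build_image_context` and `build_prompt` are hypothetical, the prompt follows the standard Alpaca instruction/input/response layout (an assumption about how this application formats its prompts), and the caption and answers in the usage example are made-up placeholders standing in for real BLIP-2 output.

```python
from typing import Sequence, Tuple

# Standard Alpaca-style template with an "Input" section for extra context.
# Assumption: the application may format its prompts differently.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{context}\n\n"
    "### Response:\n"
)


def build_image_context(caption: str, vqa_pairs: Sequence[Tuple[str, str]]) -> str:
    """Turn a caption and (question, answer) pairs into a plain-text context block."""
    lines = [f"Image caption: {caption.strip()}"]
    for question, answer in vqa_pairs:
        lines.append(f"Q: {question.strip()} A: {answer.strip()}")
    return "\n".join(lines)


def build_prompt(
    instruction: str,
    caption: str,
    vqa_pairs: Sequence[Tuple[str, str]] = (),
) -> str:
    """Build the full prompt text that would be sent to the language model."""
    context = build_image_context(caption, vqa_pairs)
    return ALPACA_TEMPLATE.format(instruction=instruction.strip(), context=context)


if __name__ == "__main__":
    # Placeholder values; in the real pipeline these would come from BLIP-2.
    prompt = build_prompt(
        instruction="Write a short story inspired by this picture.",
        caption="a dog running on a beach",
        vqa_pairs=[("What is the weather like?", "sunny")],
    )
    print(prompt)
```

Keeping the image information in the input section leaves the user's instruction untouched, so the same instruction can be reused with different pictures.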