Visual Question Answering and Image Captioning using BLIP and OpenVINO
BLIP is a pre-training framework for unified vision-language understanding and generation that achieves state-of-the-art results on a wide range of vision-language tasks. This tutorial demonstrates how to use BLIP for visual question answering and image captioning.
The complete pipeline of this demo is shown below:
Image Captioning
The following image shows an example of the input image and generated caption:
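As a reference point before any conversion, captioning with the original PyTorch model can be sketched with the Hugging Face transformers API. This is a minimal sketch, not the tutorial's exact code; the checkpoint name and the image path `demo.jpg` are illustrative assumptions:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Checkpoint is an assumption; any BLIP captioning checkpoint will do.
checkpoint = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(checkpoint)
model = BlipForConditionalGeneration.from_pretrained(checkpoint)

image = Image.open("demo.jpg").convert("RGB")  # "demo.jpg" is a placeholder path
inputs = processor(images=image, return_tensors="pt")

# Generate a caption token-by-token and decode it to text.
generated_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(generated_ids[0], skip_special_tokens=True))
```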
Visual Question Answering
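In the VQA setting, the model additionally receives a natural-language question about the image. A minimal sketch with the PyTorch model, again assuming an illustrative checkpoint, question, and image path:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

checkpoint = "Salesforce/blip-vqa-base"  # assumed checkpoint for illustration
processor = BlipProcessor.from_pretrained(checkpoint)
model = BlipForQuestionAnswering.from_pretrained(checkpoint)

image = Image.open("demo.jpg").convert("RGB")  # placeholder path
question = "How many dogs are in the picture?"  # example question
inputs = processor(images=image, text=question, return_tensors="pt")

# The model generates a short free-form answer conditioned on image and question.
generated_ids = model.generate(**inputs)
print(processor.decode(generated_ids[0], skip_special_tokens=True))
```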
Notebook Contents
This folder contains a notebook that shows how to convert and optimize the model with OpenVINO. The tutorial consists of the following parts:
- Instantiate a BLIP model.
- Convert the BLIP model to OpenVINO IR (a minimal conversion sketch follows this list).
- Run visual question answering and image captioning with OpenVINO.
- Optimize the BLIP model using NNCF (a quantization sketch follows this list).
- Compare the original and optimized models.
- Launch an interactive demo.
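To give a feel for the conversion step, here is a minimal sketch that exports only the vision encoder and runs it with the OpenVINO runtime. The notebook itself splits BLIP into a vision model, text encoder, and text decoder and converts each part; the input shape and file names below are assumptions:

```python
import openvino as ov
import torch
from transformers import BlipForConditionalGeneration

model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Trace and convert only the vision encoder; 384x384 is BLIP's default resolution.
example_input = torch.zeros((1, 3, 384, 384))
ov_vision = ov.convert_model(model.vision_model, example_input=example_input)
ov.save_model(ov_vision, "blip_vision_model.xml")  # file name is an assumption

# Load and run the converted model with the OpenVINO runtime.
core = ov.Core()
compiled = core.compile_model("blip_vision_model.xml", device_name="CPU")
image_embeds = compiled(example_input)[0]  # image embeddings as a numpy array
```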
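The NNCF step applies post-training quantization to the converted IR. A minimal sketch, assuming the file name from the previous snippet and a stand-in calibration set (real calibration should use a few hundred preprocessed images):

```python
import numpy as np
import nncf
import openvino as ov

core = ov.Core()
ov_model = core.read_model("blip_vision_model.xml")

# Stand-in calibration data; replace with preprocessed images in practice.
calibration_data = [np.random.rand(1, 3, 384, 384).astype(np.float32) for _ in range(10)]
calibration_dataset = nncf.Dataset(calibration_data)

quantized = nncf.quantize(ov_model, calibration_dataset, model_type=nncf.ModelType.TRANSFORMER)
ov.save_model(quantized, "blip_vision_model_int8.xml")
```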
Installation Instructions
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to Installation Guide.
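A typical first notebook cell installs the required packages. The exact package set and versions are assumptions here; the Installation Guide is authoritative:

```python
%pip install -q openvino nncf torch transformers pillow gradio
```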