---
tags:
- visual-question-answering-for-fashion-context
sdk: gradio
license: apache-2.0
widget:
- text: "Testing.."
src: "617.jpg"
---
# A simple VQA system using Hugging Face, PyTorch, and VQA models
-------------
This repository contains a simple VQA system capable of recognizing spatial and contextual information in fashion images (e.g. clothing color and details).
The project is based on the paper **FashionVQA: A Domain-Specific Visual Question Answering System** [[1]](#1). We also used the pre-trained VQA model from **BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation** [[2]](#2) as the starting point for fine-tuning the two new models.
We used the **Deep Fashion with Masks** dataset, available at <https://huggingface.co/datasets/SaffalPoosh/deepFashion-with-masks>, and the **Control Net Dataset**, available at <https://huggingface.co/datasets/ldhnam/deepfashion_controlnet>.
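As a minimal sketch of the kind of query the system answers, the snippet below runs a single question against the publicly available `Salesforce/blip-vqa-base` checkpoint with the `transformers` library. The image path and question are placeholders for illustration; the fine-tuned checkpoints produced in this repository are not assumed here.

```python
# Sketch: one VQA query against the pre-trained BLIP VQA base checkpoint.
# Assumes `transformers`, `torch`, and `Pillow` are installed; the image
# file and question below are illustrative placeholders.
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering


def answer_question(image: Image.Image, question: str) -> str:
    """Answer a free-form question about an image with pre-trained BLIP VQA."""
    processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
    model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")
    inputs = processor(image, question, return_tensors="pt")
    output_ids = model.generate(**inputs)
    return processor.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    image = Image.open("617.jpg").convert("RGB")  # e.g. the widget image above
    print(answer_question(image, "What color is the dress?"))
```

Fine-tuning on the fashion datasets follows the same interface: the processor prepares image-question pairs, and the model is trained on the expected answers.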
## References
<a id="1">[1]</a>
Min Wang, Ata Mahjoubfar, and Anupama Joshi. FashionVQA: A Domain-Specific Visual Question Answering System. 2022.

<a id="2">[2]</a>
Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. 2022.