---
tags:
- visual-question-answering-for-fashion-context
sdk: gradio
license: apache-2.0
widget:
- text: "Testing.."
  src: "617.jpg"
---

# A simple VQA system using Hugging Face, PyTorch, and VQA models
-------------

In this repository we built a simple VQA system capable of recognizing spatial and contextual information in fashion images (e.g., clothing color and details).

The project is based on the paper **FashionVQA: A Domain-Specific Visual Question Answering System** [[1]](#1). We also used the pre-trained VQA model from **BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation** [[2]](#2) as the starting point for fine-tuning the two new models.
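As a minimal sketch of how such a model is queried, the snippet below runs the public BLIP VQA checkpoint (`Salesforce/blip-vqa-base`) on a fashion image; the image file and question are placeholders, and the fine-tuned weights from this repository can be substituted for the base checkpoint.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

# Public BLIP VQA checkpoint; swap in this repo's fine-tuned weights if available.
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.open("617.jpg").convert("RGB")  # sample image from the widget above
question = "What color is the dress?"         # example fashion-context question

# Encode the image-question pair and generate a short free-form answer.
inputs = processor(image, question, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```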

We used the **Deep Fashion with Masks** dataset, available at <https://huggingface.co/datasets/SaffalPoosh/deepFashion-with-masks>, and the **ControlNet Dataset**, available at <https://huggingface.co/datasets/ldhnam/deepfashion_controlnet>.
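Both datasets can be pulled with the `datasets` library, as in the sketch below; the `train` split and the exact column layout are assumptions, so check each dataset card for the actual schema before building question-answer pairs.

```python
from datasets import load_dataset

# Split name "train" is an assumption; see the dataset cards for the real splits.
deep_fashion = load_dataset("SaffalPoosh/deepFashion-with-masks", split="train")
controlnet = load_dataset("ldhnam/deepfashion_controlnet", split="train")

# Inspect the features/columns before constructing VQA training examples.
print(deep_fashion)
print(controlnet)
```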


## References
<a id="1">[1]</a> 
Min Wang and Ata Mahjoubfar and Anupama Joshi, 2022
FashionVQA: A Domain-Specific Visual Question Answering System

<a id="2">[2]</a> 
Junnan Li and Dongxu Li and Caiming Xiong and Steven Hoi, 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation