Our approach has a significant social impact, given the number of use cases for this model and dataset.

- Translating the data with existing MT models is a major advantage, since multilingual data is scarce.
- The model we created is easy to use and hassle-free to train.
- A multilingual model that answers questions about an image has many use cases, including:
  - Healthcare chatbots
  - Personal assistants
  - Devices for visually impaired people

With more and better training, we should be able to produce models that work across several languages and help solve real-life problems for the community.