LLaDoc (Large Language and Document) model
This model is LLaVA-1.5 (7B) fine-tuned on the iDocVQA dataset and is intended to be used as a multimodal (vision-and-language) document question answering system. The training dataset is limited in scope, covering only certain document domains.
The model achieves 29.58% accuracy on the iDocVQA validation set.
Please refer to the original LLaVA link for information about preprocessing, training, and full details of the LLaVA model.
The paper for this work is available on arXiv: https://arxiv.org/abs/2402.00453
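Below is a minimal usage sketch, assuming the checkpoint is published in the Hugging Face LLaVA format and can be loaded with LlavaForConditionalGeneration; the repo id, image path, and prompt are illustrative placeholders, not taken from this model card.

```python
# Minimal usage sketch (assumption: the checkpoint is in the Hugging Face LLaVA format).
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "<this-checkpoint-repo-id>"  # placeholder: replace with the actual repo id

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# A single document page image and a question, following the LLaVA-1.5 prompt format.
image = Image.open("document_page.png")  # placeholder path to a document image
prompt = "USER: <image>\nWhat is the title of this document? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```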