arXiv:2410.01609

DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights

Published on Oct 2, 2024

Abstract

Visually-Rich Documents (VRDs), encompassing elements like charts, tables, and references, convey complex information across various fields. However, extracting information from these rich documents is labor-intensive, especially given their inconsistent formats and domain-specific requirements. While pretrained models for VRD Understanding have progressed, their reliance on large, annotated datasets limits scalability. This paper introduces the Domain Adaptive Visually-rich Document Understanding (DAViD) framework, which utilises machine-generated synthetic data for domain adaptation. DAViD integrates fine-grained and coarse-grained document representation learning and employs synthetic annotations to reduce the need for costly manual labelling. By leveraging pretrained models and synthetic data, DAViD achieves competitive performance with minimal annotated datasets. Extensive experiments validate DAViD's effectiveness, demonstrating its ability to efficiently adapt to domain-specific VRDU tasks.
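The abstract mentions combining fine-grained (token-level) and coarse-grained (document-level) representation learning, trained against machine-generated synthetic annotations. The sketch below is not the paper's implementation; it is a minimal, hypothetical illustration of that general idea, with a stand-in encoder and invented class counts, showing how per-token and pooled document heads can share one backbone and be supervised jointly on synthetic (noisy) labels.

```python
# Hypothetical sketch (not DAViD's actual code): joint fine-grained (token-level)
# and coarse-grained (document-level) heads over a shared encoder, supervised
# with machine-generated synthetic labels. All sizes and label counts are made up.
import torch
import torch.nn as nn

class DualGranularityModel(nn.Module):
    def __init__(self, hidden_size=256, num_entity_labels=7, num_doc_labels=4):
        super().__init__()
        # Stand-in encoder; in practice this would be a pretrained VRD backbone.
        layer = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.token_head = nn.Linear(hidden_size, num_entity_labels)  # fine-grained
        self.doc_head = nn.Linear(hidden_size, num_doc_labels)       # coarse-grained

    def forward(self, token_embeddings):
        hidden = self.encoder(token_embeddings)          # (B, T, H)
        token_logits = self.token_head(hidden)           # per-token entity labels
        doc_logits = self.doc_head(hidden.mean(dim=1))   # pooled document-level label
        return token_logits, doc_logits

# A synthetic batch: embeddings plus machine-generated (noisy) labels.
B, T, H = 2, 32, 256
model = DualGranularityModel(hidden_size=H)
x = torch.randn(B, T, H)
synthetic_token_labels = torch.randint(0, 7, (B, T))
synthetic_doc_labels = torch.randint(0, 4, (B,))

token_logits, doc_logits = model(x)
loss = (nn.functional.cross_entropy(token_logits.view(-1, 7), synthetic_token_labels.view(-1))
        + nn.functional.cross_entropy(doc_logits, synthetic_doc_labels))
loss.backward()
```

In a real adaptation setting the encoder weights would come from a pretrained document model and the synthetic labels from an automatic annotator, so the only manual effort left is a small validation set; the specifics of how DAViD generates and weights those annotations are in the paper itself.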
