DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights
Abstract
Visually-Rich Documents (VRDs), encompassing elements like charts, tables, and references, convey complex information across various fields. However, extracting information from these rich documents is labor-intensive, especially given their inconsistent formats and domain-specific requirements. While pretrained models for VRD Understanding have progressed, their reliance on large, annotated datasets limits scalability. This paper introduces the Domain Adaptive Visually-rich Document Understanding (DAViD) framework, which utilises machine-generated synthetic data for domain adaptation. DAViD integrates fine-grained and coarse-grained document representation learning and employs synthetic annotations to reduce the need for costly manual labelling. By leveraging pretrained models and synthetic data, DAViD achieves competitive performance with minimal annotated datasets. Extensive experiments validate DAViD's effectiveness, demonstrating its ability to efficiently adapt to domain-specific VRDU tasks.
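To make the joint fine-grained and coarse-grained objective concrete, the sketch below shows one plausible way such a framework could be wired up in PyTorch: a shared encoder feeding a token-level head (fine-grained entity labels) and a pooled document-level head (coarse-grained labels), trained against machine-generated synthetic annotations. This is a minimal illustration under stated assumptions; the module names, the mean-pooling choice, and the `alpha` loss weighting are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class DAViDSketch(nn.Module):
    """Hypothetical sketch of joint fine-grained (token-level) and
    coarse-grained (document-level) representation learning.
    The architecture details are assumptions, not the authors' code."""

    def __init__(self, hidden=768, num_entity_labels=10, num_doc_labels=5):
        super().__init__()
        # Stand-in for a pretrained VRD encoder (e.g. a LayoutLM-style model).
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.token_head = nn.Linear(hidden, num_entity_labels)  # fine-grained
        self.doc_head = nn.Linear(hidden, num_doc_labels)       # coarse-grained

    def forward(self, token_embeddings):
        h = self.encoder(token_embeddings)           # (B, T, H)
        token_logits = self.token_head(h)            # per-token entity labels
        doc_logits = self.doc_head(h.mean(dim=1))    # pooled document label
        return token_logits, doc_logits

def joint_loss(token_logits, doc_logits, synth_token_labels, synth_doc_labels,
               alpha=0.5):
    """Combine both granularities; synthetic (machine-generated) labels
    substitute for costly manual annotation."""
    ce = nn.CrossEntropyLoss()
    fine = ce(token_logits.flatten(0, 1), synth_token_labels.flatten())
    coarse = ce(doc_logits, synth_doc_labels)
    return alpha * fine + (1 - alpha) * coarse

# Usage with random stand-in data (batch of 2 documents, 32 tokens each).
model = DAViDSketch()
x = torch.randn(2, 32, 768)
tok_logits, doc_logits = model(x)
loss = joint_loss(tok_logits, doc_logits,
                  torch.randint(0, 10, (2, 32)), torch.randint(0, 5, (2,)))
loss.backward()
```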