arXiv Analytics

arXiv:2405.11757 [cs.CV]

DLAFormer: An End-to-End Transformer For Document Layout Analysis

Jiawei Wang, Kai Hu, Qiang Huo

Published 2024-05-20 · Version 1

Document layout analysis (DLA) is crucial for understanding the physical layout and logical structure of documents, serving applications such as information retrieval, document summarization, and knowledge extraction. However, previous studies have typically used separate models to address individual sub-tasks within DLA, including table/figure detection, text region detection, logical role classification, and reading order prediction. In this work, we propose an end-to-end transformer-based approach for document layout analysis, called DLAFormer, which integrates all of these sub-tasks into a single model. To achieve this, we treat the various DLA sub-tasks (such as text region detection, logical role classification, and reading order prediction) as relation prediction problems and consolidate their relation labels into a unified label space, allowing a single relation prediction module to handle multiple tasks concurrently. Additionally, we introduce a novel set of type-wise queries to enhance the physical meaning of content queries in DETR. Moreover, we adopt a coarse-to-fine strategy to accurately identify graphical page objects. Experimental results demonstrate that DLAFormer outperforms previous approaches that employ multi-branch or multi-stage architectures for these tasks on two document layout analysis benchmarks, DocLayNet and Comp-HRDoc.
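The key idea of consolidating per-task relation labels into one unified label space can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the task names and relation labels below are hypothetical placeholders standing in for the sub-tasks the abstract lists, and the paper's actual label sets may differ.

```python
# Hypothetical sketch: each DLA sub-task contributes its own relation
# labels; flattening them into one global id space lets a single relation
# prediction head score all of them concurrently.
# (Task and label names are illustrative, not taken from the paper.)

TASK_RELATION_LABELS = {
    "text_region_detection": ["intra_region_link"],
    "logical_role_classification": ["role_title", "role_paragraph", "role_caption"],
    "reading_order_prediction": ["reading_order_successor"],
}

def build_unified_label_space(task_labels):
    """Assign each (task, label) pair a unique id in one shared space."""
    unified = {}
    for task, labels in task_labels.items():
        for label in labels:
            unified[(task, label)] = len(unified)
    return unified

unified = build_unified_label_space(TASK_RELATION_LABELS)
# A single classifier head would then output len(unified) relation scores
# per candidate pair of queries, instead of one head per sub-task.
```

With this arrangement, adding a new sub-task only extends the label dictionary rather than requiring a new prediction branch, which is one plausible reading of why a unified label space enables a single relation module.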
