arXiv Analytics


arXiv:2004.10605 [cs.CV]

Self-Supervised Representation Learning on Document Images

Adrian Cosma, Mihai Ghidoveanu, Michael Panaitescu-Liess, Marius Popescu

Published 2020-04-18 (Version 1)

This work analyses the impact of self-supervised pre-training on document images. While previous approaches explore the effect of self-supervision on natural images, we show that patch-based pre-training performs poorly on text document images because of their different structural properties and poor intra-sample semantic information. We propose two context-aware alternatives to improve performance. We also propose a novel method for self-supervision that makes use of the inherent multi-modality of documents (image and text) and performs better than other popular self-supervised methods as well as supervised ImageNet pre-training.
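The abstract does not spell out which patch-based pretext task is meant, so as a point of reference, below is a minimal sketch of one common patch-based objective: predicting the relative position of a neighbouring patch with respect to a centre patch. All model names, patch sizes, and the toy data are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch (PyTorch) of a relative-patch-position pretext task,
# a common patch-based self-supervised objective. Shapes and names are
# illustrative assumptions; the paper's exact configuration may differ.
import torch
import torch.nn as nn

class PatchPositionNet(nn.Module):
    """Predict the position (0-7) of a neighbour patch relative to a centre patch."""
    def __init__(self, embed_dim=128):
        super().__init__()
        # Small shared CNN encoder applied to each 64x64 grayscale patch.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim), nn.ReLU(),
        )
        # Classifier over the 8 possible neighbour positions.
        self.head = nn.Linear(2 * embed_dim, 8)

    def forward(self, centre, neighbour):
        z = torch.cat([self.encoder(centre), self.encoder(neighbour)], dim=1)
        return self.head(z)

# Toy training step on random tensors standing in for document patches.
model = PatchPositionNet()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
centre = torch.rand(16, 1, 64, 64)      # batch of centre patches
neighbour = torch.rand(16, 1, 64, 64)   # corresponding neighbour patches
labels = torch.randint(0, 8, (16,))     # relative position index (0-7)

logits = model(centre, neighbour)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimiser.step()
```

On document images such patches often contain little more than uniform text texture, which is one plausible reading of why the abstract reports that purely patch-based objectives underperform compared to context-aware or multi-modal alternatives.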

Comments: 15 pages, 5 figures. Accepted at DAS 2020: IAPR International Workshop on Document Analysis Systems
Categories: cs.CV, cs.LG, eess.IV, stat.ML
Subjects: 68T05
Related articles:
arXiv:2003.00105 [cs.CV] (Published 2020-02-28)
Self-supervised Representation Learning for Ultrasound Video
arXiv:2101.06553 [cs.CV] (Published 2021-01-16)
Self-Supervised Representation Learning from Flow Equivariance
arXiv:2009.07994 [cs.CV] (Published 2020-09-17)
AAG: Self-Supervised Representation Learning by Auxiliary Augmentation with GNT-Xent Loss