arXiv Analytics

arXiv:2012.02033 [cs.CV]

SuperOCR: A Conversion from Optical Character Recognition to Image Captioning

Baohua Sun, Michael Lin, Hao Sha, Lin Yang

Published 2020-11-21, Version 1

Optical Character Recognition (OCR) has many real-world applications. Existing methods typically detect where the characters are and then recognize the character at each detected location, so the accuracy of character recognition depends on the performance of character detection. In this paper, we propose a method for recognizing characters without detecting the location of each character. This is done by converting the OCR task into an image captioning task. One advantage of the proposed method is that labeled bounding boxes for the characters are not needed during training. The experimental results show that the proposed method outperforms existing methods on both the license plate recognition and the water meter character recognition tasks. The proposed method is also deployed on a low-power (300 mW) CNN accelerator chip connected to a Raspberry Pi 3 for on-device applications.
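To make the idea of treating OCR as captioning more concrete, the following is a minimal sketch of how training targets and decoding might look when the only label per image is its text string. It is not the authors' implementation: the special tokens, the helper names (`build_vocab`, `encode_caption`, `greedy_decode`), and the `next_token` callback that stands in for a trained captioning model are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical special tokens; the paper does not specify its tokenization.
PAD, START, END = "<pad>", "<s>", "</s>"


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map each special token and each character seen in the labels to an id."""
    chars = sorted({c for t in texts for c in t})
    return {tok: i for i, tok in enumerate([PAD, START, END] + chars)}


def encode_caption(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Turn a label string into a padded token sequence (the 'caption').

    Only the string is needed: no per-character bounding boxes.
    """
    ids = [vocab[START]] + [vocab[c] for c in text] + [vocab[END]]
    if len(ids) > max_len:
        raise ValueError("label is longer than max_len allows")
    return ids + [vocab[PAD]] * (max_len - len(ids))


def greedy_decode(
    next_token: Callable[[List[int]], int],
    vocab: Dict[str, int],
    max_len: int,
) -> str:
    """Emit one character at a time until END or max_len is reached.

    `next_token` is a stand-in for a captioning model that, given the image
    (captured in the closure) and the tokens so far, returns the next token id.
    """
    inverse = {i: tok for tok, i in vocab.items()}
    ids = [vocab[START]]
    while len(ids) < max_len:
        nxt = next_token(ids)
        if nxt == vocab[END]:
            break
        ids.append(nxt)
    return "".join(inverse[i] for i in ids[1:])


if __name__ == "__main__":
    labels = ["AB12", "B7"]  # made-up plate strings, for illustration only
    vocab = build_vocab(labels)
    target = encode_caption("AB12", vocab, max_len=8)
    # A stub "model" that simply replays the target sequence.
    print(greedy_decode(lambda prefix: target[len(prefix)], vocab, max_len=8))
```

In this framing the supervision is just the character sequence, which is why per-character location labels are unnecessary; a real system would replace the stub with an image encoder and a sequence decoder.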

Related articles:
arXiv:1903.12020 [cs.CV] (Published 2019-03-28)
Describing like humans: on diversity in image captioning
arXiv:2210.10914 [cs.CV] (Published 2022-10-19)
Prophet Attention: Predicting Attention with Future Attention for Improved Image Captioning
arXiv:2107.14178 [cs.CV] (Published 2021-07-29, updated 2022-07-14)
ReFormer: The Relational Transformer for Image Captioning