arXiv:2203.14215 Abstract | arXiv Analytics

arXiv:2203.14215 [cs.CV]

Knowledge Mining with Scene Text for Fine-Grained Recognition

Hao Wang, Junchao Liao, Tianheng Cheng, Zewen Gao, Hao Liu, Bo Ren, Xiang Bai, Wenyu Liu

Published 2022-03-27, Version 1

Recently, the semantics of scene text has been proven to be essential in fine-grained image classification. However, existing methods mainly exploit the literal meaning of scene text for fine-grained recognition, which may be irrelevant when the text is not significantly related to the objects or scenes in the image. We propose an end-to-end trainable network that mines the implicit contextual knowledge behind scene text and enhances its semantics and correlation to fine-tune the image representation. Unlike existing methods, our model integrates three modalities for fine-grained image classification: visual feature extraction, text semantics extraction, and correlated background knowledge. Specifically, we employ KnowBert to retrieve relevant knowledge for the semantic representation and combine it with image features for fine-grained classification. Experiments on two benchmark datasets, Con-Text and Drink Bottle, show that our method outperforms the state of the art by 3.72% mAP and 5.39% mAP, respectively. To further validate the effectiveness of the proposed method, we create a new dataset on crowd activity recognition for evaluation. The source code and new dataset of this work are available at https://github.com/lanfeng4659/KnowledgeMiningWithSceneText.
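The idea of combining a text-derived representation with image features can be illustrated with a minimal sketch. This is not the authors' implementation: the attention-pooling scheme, the function names (`fuse`, `classify`), the requirement that text and image vectors share one dimension, and the toy numbers are all assumptions made for illustration. In the paper the text vectors would come from KnowBert and the image vector from a visual backbone; here both are plain lists of floats.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fuse(image_feat, text_feats):
    """Attention-pool the text vectors using the image vector as the query,
    then concatenate the pooled text vector onto the image vector.
    Assumes every text vector has the same length as image_feat."""
    dim = len(image_feat)
    if not text_feats:
        # No scene text detected: fall back to a zero text vector.
        pooled = [0.0] * dim
    else:
        weights = softmax([dot(image_feat, t) for t in text_feats])
        pooled = [sum(w * t[i] for w, t in zip(weights, text_feats))
                  for i in range(dim)]
    return list(image_feat) + pooled

def classify(fused, class_weights):
    """Linear classifier over the fused vector; returns class probabilities."""
    return softmax([dot(fused, w) for w in class_weights])

# Toy usage with made-up 2-dimensional features and two classes.
image_feat = [1.0, 0.0]
text_feats = [[1.0, 0.0], [0.0, 1.0]]   # one vector per recognized word
class_weights = [[0.5, 0.1, 0.9, 0.0],  # hypothetical learned weights
                 [0.1, 0.5, 0.0, 0.9]]
probs = classify(fuse(image_feat, text_feats), class_weights)
```

Using the image vector as the attention query lets words that agree with the visual content contribute more to the pooled text vector, which is one simple way to down-weight scene text that is unrelated to the object or scene.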

Comments: Accepted to CVPR 2022. The source code and new dataset of this work are available at https://github.com/lanfeng4659/KnowledgeMiningWithSceneText

Categories: cs.CV

Keywords: fine-grained recognition, knowledge mining, fine-grained image classification, mines implicit contextual knowledge, text semantics extraction

Tags: github project

Related articles:

arXiv:2203.03253 [cs.CV] (Published 2022-03-07)

Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information

Lingfeng Yang et al.

arXiv:2311.04157 [cs.CV] (Published 2023-11-07)

A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis

Dipanjyoti Paul et al.

arXiv:1909.08950 [cs.CV] (Published 2019-09-19)

Count, Crop and Recognise: Fine-Grained Recognition in the Wild

Max Bain, Arsha Nagrani, Daniel Schofield, Andrew Zisserman
