arXiv Analytics

arXiv:2303.14358 [cs.CV]

Multi-view knowledge distillation transformer for human action recognition

Ying-Chen Lin, Vincent S. Tseng

Published 2023-03-25Version 1

Recently, Transformer-based methods have been used to improve the performance of human action recognition. However, most of these studies assume that multi-view data is complete, which is not always the case in real-world scenarios. This paper therefore presents a novel Multi-view Knowledge Distillation Transformer (MKDT) framework, consisting of a teacher network and a student network, that aims to handle incomplete-view human action recognition in real-world applications. Specifically, the multi-view knowledge distillation transformer uses a hierarchical vision transformer with shifted windows to capture more spatio-temporal information. Experimental results demonstrate that the framework outperforms the CNN-based baseline on three public datasets.
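The abstract does not give MKDT's exact training objective. As a generic illustration only, the standard temperature-scaled teacher-student distillation loss (not necessarily the variant used by MKDT; the temperature value below is a common but arbitrary choice) can be sketched as:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between teacher and student soft targets.

    Scaled by T^2, as is conventional in knowledge distillation,
    so gradient magnitudes stay comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (temperature ** 2) * kl
```

In a complete-view/incomplete-view setup like the one described, the teacher would typically see all views while the student sees only the available subset, with this loss pulling the student's predictions toward the teacher's.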

Related articles:
arXiv:1907.06670 [cs.CV] (Published 2019-07-15)
Slow Feature Analysis for Human Action Recognition
arXiv:2101.07618 [cs.CV] (Published 2021-01-19)
Human Action Recognition Based on Multi-scale Feature Maps from Depth Video Sequences
arXiv:1602.00828 [cs.CV] (Published 2016-02-02)
Learning a Deep Model for Human Action Recognition from Novel Viewpoints