arXiv Analytics

arXiv:1807.00612 [cs.CV]

Multi-modal Egocentric Activity Recognition using Audio-Visual Features

Mehmet Ali Arabacı, Fatih Özkan, Elif Surer, Peter Jančovič, Alptekin Temizel

Published 2018-07-02 (Version 1)

Egocentric activity recognition in first-person videos is of increasing importance, with applications such as lifelogging, summarization, assisted living and activity tracking. Existing methods for this task fuse information from various sensors using pre-determined weights for each feature. In this work, we propose a new framework for the egocentric activity recognition problem that combines audio-visual features with multi-kernel learning (MKL) and multi-kernel boosting (MKBoost). First, four visual features are extracted from the video: grid optical-flow, virtual-inertia, log-covariance, and cuboid features. The audio signal is characterized using a "supervector", obtained by Gaussian mixture modelling of frame-level features followed by maximum a posteriori (MAP) adaptation. The extracted multi-modal features are then adaptively fused by MKL classifiers, in which feature and kernel selection/weighting and the recognition task are performed jointly. The proposed framework was evaluated on a number of egocentric datasets. The results showed that using multi-modal features with MKL outperforms existing methods.
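The core of the fusion step is that each modality contributes its own kernel, and the classifier operates on a weighted combination of those kernels. The sketch below is not the authors' implementation; it only illustrates, with toy numpy data and a fixed weight standing in for the weight that MKL would learn, why such a fused kernel remains a valid (positive semi-definite) kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
X_vis = rng.normal(size=(6, 4))  # toy stand-in for visual features
X_aud = rng.normal(size=(6, 3))  # toy stand-in for audio supervector features

def rbf(X, gamma=0.5):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

K_vis, K_aud = rbf(X_vis), rbf(X_aud)
w = 0.6  # in MKL this weight is learned jointly with the classifier; fixed here
K_fused = w * K_vis + (1 - w) * K_aud

# A convex combination of PSD kernels is itself PSD, so any kernel
# classifier (e.g. an SVM with a precomputed kernel) can consume K_fused.
min_eig = np.linalg.eigvalsh(K_fused).min()
print(min_eig >= -1e-9)
```

Because each per-modality kernel is positive semi-definite and the weights are non-negative, the fused kernel stays positive semi-definite, which is what lets kernel selection and weighting be folded into the training objective.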
