arXiv:2304.04385 [cs.LG]

On Robustness in Multimodal Learning

Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev

Published 2023-04-10, Version 1

Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text. In this work, we are concerned with understanding how models behave as the types of modalities differ between training and deployment, a situation that naturally arises in many applications of multimodal learning to hardware platforms. We present a multimodal robustness framework to provide a systematic analysis of common multimodal representation learning methods. Further, we identify robustness shortcomings of these approaches and propose two intervention techniques leading to $1.5\times$-$4\times$ robustness improvements on three datasets: AudioSet, Kinetics-400, and ImageNet-Captions. Finally, we demonstrate that these interventions better utilize additional modalities, if present, to achieve competitive results of $44.2$ mAP on AudioSet 20K.

Categories: cs.LG

Keywords: multimodal learning, interventions better utilize additional modalities, common multimodal representation learning methods, multimodal robustness framework, multiple heterogeneous input modalities

Related articles:

arXiv:2402.06223 [cs.LG] (Published 2024-02-09, updated 2025-05-12)

Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning

Yuhang Liu et al.

arXiv:2202.06218 [cs.LG] (Published 2022-02-13)

Emotion Based Hate Speech Detection using Multimodal Learning

Aneri Rana, Sonali Jha

arXiv:2312.00935 [cs.LG] (Published 2023-12-01)

A Theory of Unimodal Bias in Multimodal Learning

Yedi Zhang, Peter E. Latham, Andrew Saxe
