arXiv Analytics


arXiv:2304.03717 [cs.LG]

On the Importance of Contrastive Loss in Multimodal Learning

Yunwei Ren, Yuanzhi Li

Published 2023-04-07 (Version 1)

Recently, contrastive learning approaches (e.g., CLIP (Radford et al., 2021)) have achieved great success in multimodal learning, where the model minimizes the distance between the representations of different views (e.g., an image and its caption) of the same data point while pushing the representations of different data points away from each other. However, from a theoretical perspective, it is unclear how contrastive learning can learn representations from different views efficiently, especially when the data is not isotropic. In this work, we analyze the training dynamics of a simple multimodal contrastive learning model and show that contrastive pairs are important for the model to efficiently balance the learned representations. In particular, we show that the positive pairs drive the model to align the representations at the cost of increasing the condition number, while the negative pairs reduce the condition number, keeping the learned representations balanced.
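
To make the setting concrete, below is a minimal sketch (not the authors' code) of the CLIP-style symmetric contrastive loss described in the abstract, written in PyTorch. The function name, batch size, embedding dimension, and temperature value are illustrative assumptions; positive pairs sit on the diagonal of the similarity matrix, and all off-diagonal entries act as negatives.

import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalize both modalities so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarities: diagonal entries are the positive pairs
    # (an image with its own caption), off-diagonal entries are negatives.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy: positives pull matched views together,
    # negatives push mismatched pairs apart.
    loss_img_to_txt = F.cross_entropy(logits, targets)
    loss_txt_to_img = F.cross_entropy(logits.t(), targets)
    return (loss_img_to_txt + loss_txt_to_img) / 2

# Usage with random embeddings (batch size 8, dimension 64).
img = torch.randn(8, 64)
txt = torch.randn(8, 64)
print(clip_contrastive_loss(img, txt).item())

In the paper's terminology, the diagonal (positive) terms correspond to the alignment force, while the off-diagonal (negative) terms provide the repulsion that keeps the learned representations balanced.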

Related articles:
arXiv:2202.06218 [cs.LG] (Published 2022-02-13)
Emotion Based Hate Speech Detection using Multimodal Learning
arXiv:1708.00631 [cs.LG] (Published 2017-08-02)
On the Importance of Consistency in Training Deep Neural Networks
arXiv:1407.4070 [cs.LG] (Published 2014-07-15)
Fast matrix completion without the condition number