arXiv:2201.12803 [cs.LG]

Similarity and Generalization: From Noise to Corruption

Nayara Fonseca, Veronica Guidetti

Published 2022-01-30 (Version 1)

Contrastive learning aims to extract distinctive features from data by finding an embedding representation in which similar samples are close to each other and different ones are far apart. We study generalization in contrastive learning, focusing on its simplest representative: Siamese Neural Networks (SNNs). We show that Double Descent also appears in SNNs and is exacerbated by noise. We point out that SNNs can be affected by two distinct sources of noise: Pair Label Noise (PLN) and Single Label Noise (SLN). The effect of SLN is asymmetric, but it preserves similarity relations, while PLN is symmetric but breaks transitivity. We show that the dataset topology crucially affects generalization. While sparse datasets show the same performance under SLN and PLN for an equal amount of noise, SLN outperforms PLN in the overparametrized region in dense datasets. Indeed, in this regime, PLN similarity violation becomes macroscopic, corrupting the dataset to the point where complete overfitting cannot be achieved. We call this phenomenon Density-Induced Break of Similarity (DIBS). We also probe the equivalence between online optimization and offline generalization for similarity tasks. We observe that an online/offline correspondence in similarity learning can be affected by both the network architecture and label noise.
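The SLN/PLN distinction in the abstract can be illustrated with a minimal sketch (all names and the toy dataset here are hypothetical, not from the paper). SLN corrupts individual sample labels and then derives pair labels from them, so the resulting similarity relations remain self-consistent; PLN flips each pair label independently, which can break transitivity (e.g. a~b and b~c are labeled similar while a~c is labeled dissimilar):

```python
import random

random.seed(0)

# Toy setup (hypothetical): 4 samples with binary class labels;
# the clean pair label is 1 iff the two samples share a class.
labels = {"a": 0, "b": 0, "c": 1, "d": 1}
pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

def pair_label(lab, x, y):
    return int(lab[x] == lab[y])

def sln(labels, flip_prob):
    # Single Label Noise: flip individual sample labels, then derive
    # pair labels. Relations stay consistent with *some* labeling,
    # so transitivity of similarity is preserved.
    noisy = {k: (1 - v if random.random() < flip_prob else v)
             for k, v in labels.items()}
    return {p: pair_label(noisy, *p) for p in pairs}

def pln(labels, flip_prob):
    # Pair Label Noise: flip each pair label independently. The result
    # need not correspond to any sample labeling, so transitivity can break.
    clean = {p: pair_label(labels, *p) for p in pairs}
    return {p: (1 - v if random.random() < flip_prob else v)
            for p, v in clean.items()}
```

With `flip_prob=0` both reduce to the clean pair labels; at high noise rates only PLN can produce transitivity violations, which is the mechanism behind the density-induced break of similarity discussed above.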

Related articles
arXiv:2208.06530 [cs.LG] (Published 2022-08-12)
Siamese neural networks for a generalized, quantitative comparison of complex model outputs
arXiv:1909.13355 [cs.LG] (Published 2019-09-29)
Siamese Neural Networks for Wireless Positioning and Channel Charting
arXiv:1906.08988 [cs.LG] (Published 2019-06-21)
A Fourier Perspective on Model Robustness in Computer Vision