arXiv:2207.06224 [cs.CV]
Beyond Hard Labels: Investigating data label distributions
Vasco Grossmann, Lars Schmarje, Reinhard Koch
Published 2022-07-13, Version 1
High-quality data is a key aspect of modern machine learning. However, labels generated by humans suffer from issues such as label noise and class ambiguity. We raise the question of whether hard labels are sufficient to represent the underlying ground-truth distribution in the presence of this inherent imprecision. To answer it, we compare learning with hard and soft labels quantitatively and qualitatively on a synthetic and a real-world dataset. We show that using soft labels improves performance and yields a more regular structure of the internal feature space.
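The comparison hinges on how the training target is encoded: a hard label assigns all probability mass to a single class, whereas a soft label spreads it according to the (possibly ambiguous) annotator distribution. The following is a minimal sketch, not the authors' code, illustrating this difference with a standard cross-entropy objective in PyTorch; the annotator vote counts are hypothetical placeholders.

```python
# Minimal sketch (not the paper's implementation): hard vs. soft label targets
# for the same batch of logits, trained with cross-entropy.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 4
logits = torch.randn(8, num_classes)                 # model outputs for 8 samples

# Hard labels: a single class index per sample (e.g. annotator majority vote).
hard_targets = torch.randint(0, num_classes, (8,))
hard_loss = F.cross_entropy(logits, hard_targets)

# Soft labels: the empirical annotator distribution per sample
# (hypothetical vote counts, normalized to a probability distribution).
votes = torch.randint(0, 5, (8, num_classes)).float() + 1.0
soft_targets = votes / votes.sum(dim=1, keepdim=True)
soft_loss = -(soft_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

print(f"hard-label loss: {hard_loss.item():.4f}")
print(f"soft-label loss: {soft_loss.item():.4f}")
```

Under ambiguous annotations, the soft target keeps the disagreement visible to the model instead of collapsing it to a single class, which is the effect the abstract attributes to the improved performance and feature-space regularity.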