{ "id": "2408.03977", "version": "v1", "published": "2024-08-07T14:15:18.000Z", "updated": "2024-08-07T14:15:18.000Z", "title": "Learning from Noisy Labels for Long-tailed Data via Optimal Transport", "authors": [ "Mengting Li", "Chuang Zhu" ], "categories": [ "cs.LG", "cs.AI" ], "abstract": "Noisy labels, which are common in real-world datasets, can significantly impair the training of deep learning models. However, recent adversarial noise-combating methods overlook the long-tailed distribution of real data, which can significantly harm the effect of denoising strategies. Meanwhile, the mismanagement of noisy labels further compromises the model's ability to handle long-tailed data. To tackle this issue, we propose a novel approach to manage data characterized by both long-tailed distributions and noisy labels. First, we introduce a loss-distance cross-selection module, which integrates class predictions and feature distributions to filter clean samples, effectively addressing uncertainties introduced by noisy labels and long-tailed distributions. Subsequently, we employ optimal transport strategies to generate pseudo-labels for the noise set in a semi-supervised training manner, enhancing pseudo-label quality while mitigating the effects of sample scarcity caused by the long-tailed distribution. We conduct experiments on both synthetic and real-world datasets, and the comprehensive experimental results demonstrate that our method surpasses current state-of-the-art methods. Our code will be available in the future.", "revisions": [ { "version": "v1", "updated": "2024-08-07T14:15:18.000Z" } ], "analyses": { "keywords": [ "noisy labels", "long-tailed data", "long-tailed distribution", "method surpasses current state-of-the-art methods", "real-world datasets" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }