arXiv:2311.18531 [cs.CV]

Dataset Distillation via the Wasserstein Metric

Haoyang Liu, Tiancheng Xing, Luwei Li, Vibhu Dalal, Jingrui He, Haohan Wang

Published 2023-11-30 (Version 1)

Dataset distillation (DD) is a compelling approach in computer vision that aims to condense large datasets into much smaller synthetic versions without sacrificing much model performance. In this paper, we continue the study of DD methods by addressing the field's core objective: how to capture the essential representation of an extensive dataset in a smaller, synthetic form. We propose a novel approach that uses the Wasserstein distance, a metric rooted in optimal transport theory, to enhance distribution matching in DD. Our method leverages the Wasserstein barycenter, which offers a geometrically meaningful way to quantify differences between distributions and to capture the centroid of a set of distributions. Our approach retains the computational benefits of distribution-matching methods while achieving new state-of-the-art performance on several benchmarks. To provide a useful prior for learning the synthetic images, we embed the synthetic data in the feature space of pretrained classification models and conduct distribution matching there. Extensive experiments on various high-resolution datasets confirm the effectiveness and adaptability of our method, indicating the promising yet largely unexplored capabilities of Wasserstein metrics in dataset distillation.
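The abstract only names the key ingredient: a Wasserstein barycenter used as the distribution-matching target in a pretrained model's feature space. As a concrete illustration, below is a minimal, self-contained sketch of one standard way to compute such a barycenter, the fixed-support entropic scheme of Benamou et al. (2015) via iterative Bregman projections. This is not the authors' code: the function name `sinkhorn_barycenter`, the toy support, and all parameter values are illustrative assumptions, and the paper's actual procedure (e.g., free-support vs. fixed-support) is not specified in the abstract.

```python
# A minimal sketch (assumptions, not the paper's code): fixed-support
# entropic Wasserstein barycenter via iterative Bregman projections
# (Benamou et al., 2015).
import numpy as np

def sinkhorn_barycenter(hists, M, reg=0.05, weights=None, n_iter=200):
    """Entropic Wasserstein barycenter of histograms on a shared support.

    hists   : (k, n) array, k histograms over n support points (rows sum to 1)
    M       : (n, n) ground-cost matrix between support points
    reg     : entropic regularization strength
    weights : (k,) barycentric weights; uniform if None
    """
    k, n = hists.shape
    if weights is None:
        weights = np.full(k, 1.0 / k)
    K = np.exp(-M / reg)                      # Gibbs kernel
    u = np.ones((k, n))                       # one scaling vector per histogram
    for _ in range(n_iter):
        v = hists / (u @ K)                   # row i: a_i / (K^T u_i)
        Kv = v @ K.T                          # row i: K v_i
        b = np.exp(weights @ np.log(u * Kv))  # geometric mean of the marginals
        u = b / Kv                            # re-scale toward the barycenter
    return b

# Toy usage. In the setting the abstract describes, the support would be
# feature embeddings of real images from a pretrained classifier (the random
# `support` here is a stand-in), and the barycenter would serve as the
# distribution-matching target for the synthetic images' features.
rng = np.random.default_rng(0)
support = rng.normal(size=(64, 16))                        # 64 support points, 16-d
M = ((support[:, None] - support[None, :]) ** 2).sum(-1)   # squared-Euclidean cost
M /= M.max()
hists = rng.dirichlet(np.ones(64), size=2)                 # two empirical histograms
b = sinkhorn_barycenter(hists, M)
print(b.shape, round(b.sum(), 4))                          # (64,) ≈ 1.0
```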

Related articles:
arXiv:2203.11932 [cs.CV] (Published 2022-03-22)
Dataset Distillation by Matching Training Trajectories
arXiv:2210.16774 [cs.CV] (Published 2022-10-30)
Dataset Distillation via Factorization
arXiv:2007.13010 [cs.CV] (Published 2020-07-25)
Style is a Distribution of Features