arXiv:2505.24190 [cs.LG]

Provably Improving Generalization of Few-Shot Models with Synthetic Data

Lan-Cuong Nguyen, Quan Nguyen-Tri, Bang Tran Khanh, Dung D. Le, Long Tran-Thanh, Khoat Than

Published 2025-05-30, updated 2025-06-25 (Version 2)

Few-shot image classification remains challenging due to the scarcity of labeled training examples. Augmenting them with synthetic data has emerged as a promising way to alleviate this issue, but models trained on synthetic samples often suffer performance degradation due to the inherent gap between the real and synthetic distributions. To address this limitation, we develop a theoretical framework that quantifies the impact of such distribution discrepancies on supervised learning, specifically in the context of image classification. More importantly, our framework suggests practical ways to generate good synthetic samples and to train a predictor with high generalization ability. Building upon this framework, we propose a novel theory-based algorithm that integrates prototype learning to optimize both data partitioning and model training, effectively bridging the gap between real few-shot data and synthetic data. Extensive experimental results show that our approach outperforms state-of-the-art methods across multiple datasets.

Comments: ICML 2025. Our code is released at https://github.com/Fsoft-AIC/ProtoAug
Categories: cs.LG, cs.CV
Related articles:
arXiv:2212.06896 [cs.LG] (Published 2022-12-13)
In-Season Crop Progress in Unsurveyed Regions using Networks Trained on Synthetic Data
arXiv:2301.07573 [cs.LG] (Published 2023-01-18)
Synthcity: facilitating innovative use cases of synthetic data in different data modalities
arXiv:2407.00116 [cs.LG] (Published 2024-06-27)
Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges