arXiv:1906.08255 Abstract | arXiv Analytics

arXiv:1906.08255 [cs.LG]

Training on test data: Removing near duplicates in Fashion-MNIST

Published 2019-06-19, Version 1

MNIST and Fashion-MNIST are extremely popular benchmarks for testing machine learning models. Fashion-MNIST improves on MNIST by posing a harder problem, increasing the diversity of the test set, and more accurately representing a modern computer vision task. To improve the data quality of Fashion-MNIST, this paper investigates near-duplicate images shared between the training and test sets. Such near-duplicates artificially inflate the test accuracy of machine learning models. This paper identifies near-duplicate images in Fashion-MNIST and proposes a version of the dataset with the near-duplicates removed.
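The abstract does not say which similarity measure the paper uses to decide that two images are near-duplicates. As a minimal sketch, the check below treats images as flattened grayscale pixel lists and calls a test/train pair a near-duplicate when their mean absolute pixel difference is at or below a hypothetical `threshold`. The metric, the threshold, and the function names `mean_abs_diff`, `find_near_duplicates`, and `dedupe_test_set` are illustrative assumptions, not the paper's method.

```python
from typing import List, Sequence, Tuple

# Assumption: each image is a flattened list of grayscale pixels (0..255),
# e.g. 28 * 28 = 784 values for a Fashion-MNIST image.
Image = Sequence[int]


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean absolute per-pixel difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_near_duplicates(
    train: Sequence[Image], test: Sequence[Image], threshold: float
) -> List[Tuple[int, int]]:
    """Return (test_index, train_index) pairs whose distance is <= threshold.

    Brute force over every pair; Fashion-MNIST's 60,000 x 10,000 pairs
    would call for vectorized or indexed search in practice.
    """
    pairs = []
    for i, t in enumerate(test):
        for j, r in enumerate(train):
            if mean_abs_diff(t, r) <= threshold:
                pairs.append((i, j))
    return pairs


def dedupe_test_set(
    train: Sequence[Image], test: Sequence[Image], threshold: float
) -> List[int]:
    """Indices of test images that have no near-duplicate in the training set."""
    flagged = {i for i, _ in find_near_duplicates(train, test, threshold)}
    return [i for i in range(len(test)) if i not in flagged]


if __name__ == "__main__":
    # Toy 4-pixel "images": test[0] differs from train[0] by one gray level.
    train = [[0, 0, 0, 0], [255, 255, 255, 255]]
    test = [[0, 0, 1, 0], [128, 128, 128, 128]]
    print(find_near_duplicates(train, test, threshold=2.0))  # [(0, 0)]
    print(dedupe_test_set(train, test, threshold=2.0))       # [1]
```

Keeping only the indices returned by `dedupe_test_set` yields a test set with no image that, under this assumed metric, nearly matches a training image, so accuracy on it cannot be inflated by memorized training examples.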

Categories: cs.LG, cs.CV, stat.ML

Keywords: test data, fashion-mnist, near-duplicate images, testing sets

Related articles:

arXiv:1901.05744 [cs.LG] (Published 2019-01-17)

The Oracle of DLphi

Weston Baines et al.

arXiv:2105.11570 [cs.LG] (Published 2021-05-24)

Robust Fairness-aware Learning Under Sample Selection Bias

Wei Du, Xintao Wu

arXiv:2202.03613 [cs.LG] (Published 2022-02-08)

Conformal prediction for the design problem

Clara Fannjiang, Stephen Bates, Anastasios Angelopoulos, Jennifer Listgarten, Michael I. Jordan
