{ "id": "1805.09180", "version": "v1", "published": "2018-05-22T17:23:39.000Z", "updated": "2018-05-22T17:23:39.000Z", "title": "Semi-supervised learning: When and why it works", "authors": [ "Alejandro Cholaquidis", "Ricardo Fraimand", "Mariela Sued" ], "comment": "arXiv admin note: substantial text overlap with arXiv:1709.05673", "categories": [ "stat.ML", "cs.LG" ], "abstract": "Semi-supervised learning deals with the problem of how, if possible, to take advantage of a huge amount of unclassified data, to perform a classification in situations when, typically, there is little labelled data. Even though this is not always possible (it depends on how useful, for inferring the labels, it would be to know the distribution of the unlabelled data), several algorithm have been proposed recently. A new algorithm is proposed, that under almost necessary conditions, attains asymptotically the performance of the best theoretical rule as the amount of unlabelled data tends to infinity. The set of necessary assumptions, although reasonable, show that semi-parametric classi- fication only works for very well conditioned problems. The perfor- mance of the algorithm is assessed in the well known \"Isolet\" real-data of phonemes, where a strong dependence on the choice of the initial training sample is shown.", "revisions": [ { "version": "v1", "updated": "2018-05-22T17:23:39.000Z" } ], "analyses": { "keywords": [ "semi-supervised learning", "initial training sample", "little labelled data", "best theoretical rule", "unlabelled data tends" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }