{ "id": "1809.01921", "version": "v1", "published": "2018-09-06T10:57:51.000Z", "updated": "2018-09-06T10:57:51.000Z", "title": "RDPD: Rich Data Helps Poor Data via Imitation", "authors": [ "Shenda Hong", "Cao Xiao", "Tengfei Ma", "Hongyan Li", "Jimeng Sun" ], "categories": [ "cs.LG", "stat.ML" ], "abstract": "In many situations, we have both rich- and poor- data environments: in a rich-data environment (e.g., intensive care units), we have high-quality multi-modality data. On the other hand, in a poor-data environment (e.g., at home), we often only have access to a single data modality with low quality. How can we learn an accurate and efficient model for the poor-data environment by leveraging multi-modality data from the rich-data environment? In this work, we propose a knowledge distillation model RDPD to enhance a small model trained on poor data with a complex model trained on rich data. In an end-to-end fashion, RDPD trains a student model built on a single modality data (poor data) to imitate the behavior and performance of a teacher model from multimodal data (rich data) via jointly optimizing the combined loss of attention imitation and target imitation. We evaluated RDPD on three real-world datasets. RDPD consistently outperformed all baselines across all three datasets, especially achieving the greatest performance improvement over a standard neural network model trained on the common features (Direct model) by 24.56% on PR-AUC and 12.21% on ROC-AUC, and over the standard knowledge distillation model by 5.91% on PR-AUC and 4.44% on ROC-AUC.", "revisions": [ { "version": "v1", "updated": "2018-09-06T10:57:51.000Z" } ], "analyses": { "keywords": [ "rich data helps poor data", "environment", "standard neural network model", "standard knowledge distillation model", "knowledge distillation model rdpd" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }