{ "id": "1307.0803", "version": "v1", "published": "2013-07-02T19:35:21.000Z", "updated": "2013-07-02T19:35:21.000Z", "title": "Data Fusion by Matrix Factorization", "authors": [ "Marinka Žitnik", "Blaž Zupan" ], "comment": "Preprint, 13 pages, 3 Figures, 3 Tables", "categories": [ "cs.LG", "cs.AI", "cs.DB", "stat.ML" ], "abstract": "For most problems in science and engineering we can obtain data that describe the system from various perspectives and record the behaviour of its individual components. Heterogeneous data sources can be collectively mined by data fusion. Fusion can focus on a specific target relation and exploit directly associated data together with data on the context or additional constraints. In the paper we describe a data fusion approach with penalized matrix tri-factorization that simultaneously factorizes data matrices to reveal hidden associations. The approach can directly consider any data sets that can be expressed in a matrix, including those from attribute-based representations, ontologies, associations and networks. We demonstrate its utility on a gene function prediction problem in a case study with eleven different data sources. Our fusion algorithm compares favourably to state-of-the-art multiple kernel learning and achieves higher accuracy than can be obtained from any single data source alone.", "revisions": [ { "version": "v1", "updated": "2013-07-02T19:35:21.000Z" } ], "analyses": { "subjects": [ "15A83", "15A23", "40C05", "H.2.8", "G.1.3", "I.2.6", "65F30", "H.3.3" ], "keywords": [ "matrix factorization", "gene function prediction problem", "achieves higher accuracy", "state-of-the-art multiple kernel", "fusion algorithm compares" ], "note": { "typesetting": "TeX", "pages": 13, "language": "en", "license": "arXiv", "status": "editable", "adsabs": "2013arXiv1307.0803Z" } } }