

arXiv:2002.12359 [stat.ML]

A Kernel to Exploit Informative Missingness in Multivariate Time Series from EHRs

Karl Øyvind Mikalsen, Cristina Soguero-Ruiz, Robert Jenssen

Published 2020-02-27, Version 1

A large fraction of electronic health records (EHRs) consists of clinical measurements collected over time, such as lab tests and vital signs, which provide important information about a patient's health status. These sequences of clinical measurements are naturally represented as time series, characterized by multiple variables and large amounts of missing data, which complicate the analysis. In this work, we propose a novel kernel that is capable of exploiting both the information in the observed values and the information hidden in the missingness patterns of multivariate time series (MTS) originating, e.g., from EHRs. The kernel, called TCK$_{IM}$, is designed using an ensemble learning strategy in which the base models are novel mixed-mode Bayesian mixture models that can effectively exploit informative missingness without having to resort to imputation methods. Moreover, the ensemble approach ensures robustness to hyperparameters, and therefore TCK$_{IM}$ is particularly well suited to settings with few labels, a known challenge in medical applications. Experiments on three real-world clinical datasets demonstrate the effectiveness of the proposed kernel.
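To make the idea of an ensemble kernel built from mixture models concrete, the following is a minimal sketch and not the authors' implementation of TCK$_{IM}$. It assumes each MTS has been flattened to a vector with NaN marking missing entries, and it replaces the paper's Bayesian mixed-mode mixtures with a plain maximum-likelihood mixture that pairs a diagonal Gaussian over the observed values with a Bernoulli over the missingness mask. The function names `fit_mixture_posteriors` and `ensemble_missingness_kernel`, the hyperparameter ranges, and the toy data are all hypothetical. The kernel is formed by summing inner products of posterior cluster assignments across ensemble members, which is one common way to build such a kernel.

```python
import numpy as np


def fit_mixture_posteriors(X, R, n_clusters, n_iter, rng):
    """EM for a mixture of diagonal Gaussians (values) times Bernoullis (mask).

    X: (n, d) array with NaN where missing. R: (n, d) float mask, 1 = observed.
    Missing entries are left out of the Gaussian term, so nothing is imputed.
    Returns the (n, n_clusters) posterior cluster assignments.
    """
    Xz = np.where(R == 1, X, 0.0)  # zeros are placeholders, always masked by R
    post = rng.dirichlet(np.ones(n_clusters), size=X.shape[0])  # random soft init
    for _ in range(n_iter):
        # M-step: update mixing weights, Gaussian and Bernoulli parameters.
        w = post.sum(axis=0) + 1e-9
        pi = w / w.sum()
        obs_w = post.T @ R  # posterior-weighted count of observed entries
        mu = (post.T @ Xz) / (obs_w + 1e-9)
        var = (post.T @ Xz**2) / (obs_w + 1e-9) - mu**2
        var = np.maximum(var, 1e-3)  # variance floor for numerical stability
        beta = (obs_w + 1.0) / (w[:, None] + 2.0)  # smoothed P(observed)
        # E-step: log-likelihood of observed values plus missingness pattern.
        diff = Xz[:, None, :] - mu[None, :, :]
        log_gauss = -0.5 * (np.log(2 * np.pi * var)[None] + diff**2 / var[None])
        log_p = (R[:, None, :] * log_gauss).sum(axis=2)
        log_p += R @ np.log(beta).T + (1.0 - R) @ np.log(1.0 - beta).T
        log_p += np.log(pi)[None, :]
        log_p -= log_p.max(axis=1, keepdims=True)
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)
    return post


def ensemble_missingness_kernel(X, n_models=10, max_clusters=4, n_iter=20, seed=0):
    """Sum posterior inner products over randomly configured mixture models."""
    rng = np.random.default_rng(seed)
    R = (~np.isnan(X)).astype(float)
    n, d = X.shape
    K = np.zeros((n, n))
    for _ in range(n_models):
        # Randomise the number of clusters and the feature subset per member.
        k = int(rng.integers(2, max_clusters + 1))
        cols = rng.choice(d, size=int(rng.integers(1, d + 1)), replace=False)
        post = fit_mixture_posteriors(X[:, cols], R[:, cols], k, n_iter, rng)
        K += post @ post.T
    return K


# Toy data (synthetic, for illustration only): two groups that differ both in
# their values and in how often entries are missing.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(12, 6))
X_demo[:6] += 2.0
miss_rate = np.where(np.arange(12)[:, None] < 6, 0.1, 0.5)
X_demo[demo_rng.random((12, 6)) < miss_rate] = np.nan
K_demo = ensemble_missingness_kernel(X_demo)
```

Because each ensemble member contributes a Gram matrix of posterior vectors, the resulting `K_demo` is symmetric and positive semidefinite, so it could in principle be passed to any kernel method that accepts a precomputed kernel. Randomising the number of clusters and the feature subset per member is what removes the need to tune those hyperparameters by hand in this sketch.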

Comments: 2020 International Workshop on Health Intelligence, AAAI-20. arXiv admin note: text overlap with arXiv:1907.05251
Categories: stat.ML, cs.LG
Related articles:
arXiv:1312.6956 [stat.ML] (Published 2013-12-25)
Joint segmentation of multivariate time series with hidden process regression for human activity recognition
arXiv:2201.08283 [stat.ML] (Published 2022-01-20)
Lead-lag detection and network clustering for multivariate time series with an application to the US equity market
arXiv:1910.04689 [stat.ML] (Published 2019-10-10)
Graph Spectral Embedding for Parsimonious Transmission of Multivariate Time Series