• Interpretable Early Classification of Multivariate Time Series

      Obradovic, Zoran; Vucetic, Slobodan; Yates, Alexander; Sapienza, Carmen (Temple University. Libraries, 2013)
      Recent advances in technology have led to an explosion in data collected over time rather than in a single snapshot. For example, microarray technology allows us to measure gene expression levels under different conditions over time. Such temporal data gives data miners the opportunity to develop algorithms that address domain-related problems. For example, time series belonging to several classes can be created by observing various patient attributes over time, and the task is to classify an unseen patient based on that patient's temporal observations. In time-sensitive applications such as medical applications, several aspects have to be considered besides providing accurate classification. The first aspect is providing early classification. Accurate and timely diagnosis is essential for allowing physicians to design appropriate therapeutic strategies at early stages of diseases, when therapies are usually the most effective and the least costly. We propose a probabilistic hybrid method for early, accurate, and patient-specific classification of multivariate time series. Trained on full time series, it offers classification at a very early time point during the diagnosis phase while staying competitive in accuracy with other models that use full time series in both training and testing. The method attained very promising results and outperformed the baseline models on a dataset of response to drug therapy in Multiple Sclerosis patients and on a sepsis therapy dataset. Although attaining accurate classification is the primary goal of a data mining task, in medical applications it is important to attain decisions that are not only accurate and obtained early but can also be easily interpreted, which is the second aspect of medical applications. Physicians tend to prefer interpretable methods to black-box methods. 
For that purpose, we propose interpretable methods for early classification that extract interpretable patterns from the raw time series, helping physicians to provide early diagnosis and to gain insight into, and confidence in, the classification results. The proposed methods were shown to be more accurate and to provide classifications earlier than three alternative state-of-the-art methods when evaluated on human viral infection datasets and a larger myocardial infarction dataset. The third aspect that has to be considered for medical applications is the need for predictions to be accompanied by a measure that allows physicians to judge the uncertainty of, or belief in, the prediction. Knowing the uncertainty associated with a given prediction is especially important in clinical diagnosis, where data mining methods assist clinical experts in making decisions and optimizing therapy. We propose an effective method for providing an uncertainty estimate for the proposed interpretable early classification methods. The method was evaluated on four challenging medical applications by characterizing the decrease in prediction uncertainty. We showed that the proposed method meets the requirements of uncertainty estimates: the proposed uncertainty measure takes values in the range [0,1] and propagates over time. We expect this PhD thesis to strengthen the link between the data mining community and medical domain experts and to give physicians sufficient confidence to put the proposed methods into practice.