• Learning from Structured Data: Scalability, Stability and Temporal Awareness

      Obradovic, Zoran; Vucetic, Slobodan; Zhang, Kai; Airoldi, Edoardo (Temple University. Libraries, 2021)
      A plethora of high-impact applications involve predictive modeling of structured data. In various domains, from hospital readmission prediction in the medical realm, through weather forecasting and event detection in power systems, up to conversion prediction in online businesses, the data holds a certain underlying structure. Building predictive models from such data calls for leveraging the structure as an additional source of information. Thus, a broad range of structure-aware approaches have been introduced, yet certain common challenges in many structured learning scenarios remain unresolved. This dissertation revolves around addressing the challenges of scalability, algorithmic stability and temporal awareness in several scenarios of learning from either graphically or sequentially structured data. Initially, the first two challenges are discussed from a structured regression standpoint. The studies addressing these challenges aim at designing scalable and algorithmically stable models for structured data, without compromising their prediction performance. It is further inspected whether such models can be applied to both static and dynamic (time-varying) graph data. To that end, a structured ensemble model is proposed to scale with the size of temporal graphs, while making stable and reliable yet accurate predictions on a real-world application involving gene expression prediction. In the case of static graphs, a theoretical insight is provided on the relationship between algorithmic stability and generalization in a structured regression setting. A stability-based objective function is designed to indirectly control the stability of a collaborative ensemble regressor, yielding generalization performance improvements on structured regression applications as diverse as predicting housing prices based on real-estate transactions and readmission prediction from hospital records.
Modeling data that holds a sequential rather than a graphical structure requires addressing temporal awareness as one of the major challenges. In that regard, a model is proposed to generate time-aware representations of user activity sequences, intended to be seamlessly applicable across different user-related tasks, while sidestepping the burden of task-driven feature engineering. The quality and effectiveness of the time-aware user representations led to predictive performance improvements over state-of-the-art models on multiple large-scale conversion prediction tasks. Sequential data is also analyzed from the perspective of a high-impact application in the realm of power systems. Namely, detecting and further classifying disturbance events, as an important aspect of risk mitigation in power systems, is typically centered on the challenges of capturing structural characteristics in sequential synchrophasor recordings. Therefore, a thorough comparative analysis was conducted by assessing various traditional as well as more sophisticated event classification models under different domain-expert-assisted labeling scenarios. The experimental findings provide evidence that hierarchical convolutional neural networks (HCNNs), capable of automatically learning time-invariant feature transformations that preserve the structural characteristics of the synchrophasor signals, consistently outperform traditional model variants. Their performance is observed to further improve as more data are inspected by a domain expert, while smaller fractions of solely expert-inspected signals are already sufficient for HCNNs to achieve satisfactory event classification accuracy. Finally, insights into the impact of the domain expertise on the downstream classification performance are also discussed.