On Leveraging Representation Learning Techniques for Data Analytics in Biomedical Informatics
Author: Cao, Xi Hang
Committee members: Vucetic, Slobodan; Souvenir, Richard M.
Department: Computer and Information Science
Permanent link to this record: http://hdl.handle.net/20.500.12613/903
Abstract: Representation Learning is ubiquitous in state-of-the-art machine learning workflows, including data exploration and visualization, data preprocessing, model learning, and model interpretation. However, most newly proposed Representation Learning methods are better suited to problems with large amounts of data, and applying them to problems with limited data may lead to unsatisfactory performance. There is therefore a need for Representation Learning methods tailored to "small data" problems, such as clinical and biomedical data analytics. In this dissertation, we describe our studies of challenging clinical and biomedical data analytics problems from four perspectives: data preprocessing, temporal data representation learning, output representation learning, and joint input-output representation learning. Data scaling is an important component of data preprocessing. Its objective is to transform the raw features into reasonable ranges so that each feature of an instance is exploited equally by the machine learning model. For example, in a credit flaw detection task, a model may use a person's credit score and annual income as features; because the two features span different ranges, the model may weight one more heavily than the other. In this dissertation, I thoroughly introduce the data scaling problem and describe a data scaling approach that intrinsically handles outliers and leads to better prediction performance. Learning new representations for data in unstandardized forms is a common task in data analytics and data science applications. Usually, data come in tabular form, namely, a table in which each row is the feature vector of an instance.
However, data often do not come in this form; examples include text, images, and video or audio recordings. In this dissertation, I describe the challenges of analyzing imperfect multivariate time series data in healthcare and biomedical research and show that the proposed method can learn a powerful representation that copes with various imperfections and improves prediction performance. Learning output representations is a new aspect of Representation Learning, and its applications have shown promising results in complex tasks, including computer vision and recommendation systems. The main objective of an output representation algorithm is to explore the relationships among the target variables so that a prediction model can efficiently exploit their similarities and potentially improve prediction performance. In this dissertation, I describe a learning framework that incorporates output representation learning into time-to-event estimation. In particular, the approach learns the model parameters and the time vectors simultaneously. Experimental results not only show the effectiveness of this approach but also demonstrate its interpretability through visualizations of the time vectors in 2-D space. Learning the input (feature) representation, learning the output representation, and predictive modeling are closely related to one another. It is therefore natural to extend the state of the art by considering them together in a joint framework. In this dissertation, I describe a large-margin, ranking-based learning framework for time-to-event estimation that jointly learns input embeddings, output embeddings, and model parameters. In this framework, I cast the functional learning problem as a kernel learning problem and, by adopting theory from Multiple Kernel Learning, propose an efficient optimization algorithm. Empirical results also show its effectiveness on several benchmark datasets.
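The abstract does not specify the proposed scaling approach, so the sketch below only illustrates the outlier problem it targets. It compares min-max scaling with a common robust baseline (centering each feature on its median and dividing by its interquartile range). Neither is claimed to be the dissertation's method; the function names and the credit score and income values are hypothetical toy data, and NumPy is assumed.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column to [0, 1] using its minimum and maximum.
    Assumes no column is constant."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def robust_scale(X):
    """Center each column on its median and divide by its interquartile range.
    Assumes every column has a nonzero IQR."""
    med = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    return (X - med) / (q3 - q1)

# Hypothetical toy data: column 0 = credit score, column 1 = annual income (USD).
# The last row has an extreme income that acts as an outlier.
X = np.array([
    [620.0, 40_000.0],
    [680.0, 52_000.0],
    [710.0, 61_000.0],
    [750.0, 75_000.0],
    [800.0, 2_000_000.0],
])

print(min_max_scale(X)[:, 1].round(3))
print(robust_scale(X)[:, 1].round(3))
```

With these toy values, the four typical incomes end up within about 0.02 of each other after min-max scaling, because the single extreme income stretches the range, whereas robust scaling spreads them over roughly 1.5 units and places the outlier far away at about 84.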
Showing items related by title, author, creator and subject.
Context-aware Learning from Partial Observations. Obradovic, Zoran; Vucetic, Slobodan; Dragut, Eduard Constantin; Zhao, Zhigen (Temple University. Libraries, 2018). The Big Data revolution brought increasing availability of data sets of unprecedented scale, enabling researchers in the machine learning and data mining communities to advance learning from such data and to provide data-driven insights, decisions, and predictions. Along the way, however, they face numerous challenges, including handling missing observations while learning from such data and making predictions on previously unobserved or rare (“tail”) examples, which arise in a wide range of domains, including climate, medicine, social networks, consumer data, and computational advertising. In this thesis, we address this important problem and propose tools for handling partially observed or completely unobserved data by exploiting information from their context. Here, we assume that the context is available in the form of a network or sequence structure, or as additional information for point-informative data examples. First, we propose two structured regression methods for handling missing values in partially observed temporal attributed graphs. Both are based on the Gaussian Conditional Random Fields (GCRF) model and draw power from the network/graph structure (context) of the unobserved instances. The Marginalized Gaussian Conditional Random Fields (m-GCRF) model is designed to handle missing response variable values (labels) in graph nodes, whereas Deep Feature Learning GCRF handles missing values in explanatory variables by learning feature representations jointly with the complex interactions of nodes in a graph, all under the overall GCRF objective. Next, we consider unsupervised and supervised shallow and deep neural models for monetizing web search.
We focus on two sponsored search tasks: (i) query-to-ad matching, for which we propose worLd2vec, a novel shallow neural embedding model that makes better use of local query context (location), and (ii) click-through-rate prediction for ads and queries, for which we introduce the Deeply Supervised Semantic Match model to address click-through-rate prediction for unobserved and tail queries while jointly learning the semantic embeddings of a query and an ad, as well as their click-through-rate. Finally, we propose a deep learning approach for ranking investigators by their expected enrollment performance in new clinical trials. It learns from heterogeneous (structured and free-text) investigator- and trial-related data sources, can match investigators to new trials from partial observations, and supports the recruitment of both experienced investigators and new investigators with no previous experience enrolling patients in clinical trials. Experimental evaluation of the proposed methods on a number of synthetic and diverse real-world data sets shows that they outperform their alternatives.
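The first two methods in this thesis build on Gaussian Conditional Random Fields. As a hedged illustration of how a GCRF uses graph context, the sketch below computes the posterior mean of a textbook single-predictor GCRF, assuming the energy alpha * sum_i (y_i - r_i)^2 + beta * sum_{i<j} S_ij (y_i - y_j)^2, where r holds an unstructured model's per-node predictions and S is a symmetric similarity matrix. It does not implement m-GCRF or Deep Feature Learning GCRF; the function name `gcrf_mean` and the toy graph are hypothetical.

```python
import numpy as np

def gcrf_mean(r, S, alpha, beta):
    """Posterior mean (= MAP estimate) of a single-predictor Gaussian CRF.

    Assumed energy: alpha * sum_i (y_i - r_i)**2
                    + beta * sum_{i<j} S[i, j] * (y_i - y_j)**2
    The pairwise term equals y^T L y with graph Laplacian L = D - S,
    so the gradient vanishes where (alpha*I + beta*L) y = alpha * r.
    """
    r = np.asarray(r, dtype=float)
    S = np.asarray(S, dtype=float)
    L = np.diag(S.sum(axis=1)) - S          # graph Laplacian of the similarity graph
    A = alpha * np.eye(len(r)) + beta * L   # precision matrix, up to a factor of 2
    return np.linalg.solve(A, alpha * r)

# Hypothetical 3-node chain: node 1 is linked to nodes 0 and 2.
S = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
r = np.array([1.0, 0.0, 1.0])   # unstructured per-node predictions

mu = gcrf_mean(r, S, alpha=1.0, beta=1.0)
print(mu)
```

Here the similarity term pulls the middle node from 0.0 up to 0.5 and the end nodes from 1.0 down to 0.75, while setting `beta=0` returns `r` unchanged. Because the Laplacian's rows sum to zero, the smoothed predictions always have the same sum as `r`.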
Brain computer interface learning for systems based on electrocorticography and intracortical microelectrode arrays. Hiremath, SV; Chen, W; Wang, W; Foldes, S; Yang, Y; Tyler-Kabara, EC; Collinger, JL; Boninger, ML (2015-06-10). © 2015 Hiremath, Chen, Wang, Foldes, Yang, Tyler-Kabara, Collinger and Boninger. A brain-computer interface (BCI) system transforms neural activity into control signals for external devices in real time. To control external devices effectively, a BCI user needs to learn to generate specific cortical activity patterns. We call this process BCI learning, and it often requires significant effort and time. It is therefore important to study this process and to develop novel, efficient approaches that accelerate BCI learning. This article reviews the major approaches that have been used for BCI learning, including computer-assisted learning, co-adaptive learning, operant conditioning, and sensory feedback. We focus on BCIs based on electrocorticography and intracortical microelectrode arrays for restoring motor function. This article also explores the potential of brain modulation techniques, such as electrical cortical stimulation, transcranial magnetic stimulation, and optogenetics, to promote BCI learning. Furthermore, as proposed by recent BCI studies, we suggest that BCI learning is in many ways analogous to motor and cognitive skill learning, so skill learning should serve as a useful metaphor for modeling BCI learning.
Multi-label Learning under Different Labeling Scenarios. Guo, Yuhong; Vucetic, Slobodan; Dragut, Eduard Constantin; Dong, Yuexiao (Temple University. Libraries, 2015). Traditional multi-class classification problems assume that each instance is associated with a single label from a category set Y, where |Y| > 2. Multi-label classification generalizes multi-class classification by allowing each instance to be associated with multiple labels from Y. In many real-world data analysis problems, data objects can be assigned to multiple categories, which gives rise to multi-label classification problems. For example, an image for object categorization can be labeled as 'desk' and 'chair' simultaneously if it contains both objects. A news article about the effect of the Olympic Games on the tourism industry might belong to multiple categories, such as 'sports', 'economy', and 'travel', since it may cover multiple topics. Regardless of the approach used, multi-label learning generally requires a sufficient amount of labeled data to learn high-quality classification models. However, due to label sparsity, i.e., each instance carries only a small number of labels from the label set Y, it is difficult to prepare sufficient well-labeled data for each class. Many approaches in the literature overcome this challenge by exploiting label correlation or label dependency. In this dissertation, we propose a probabilistic model that captures pairwise interactions between labels to alleviate label sparsity. Beyond the traditional setting, which assumes that the training data are fully labeled, we also study multi-label learning under other scenarios. For instance, training data can be unreliable due to missing values; we propose a conditional Restricted Boltzmann Machine (CRBM) to address this challenge. Furthermore, labeled training data can be very scarce due to the cost of labeling, while unlabeled data are abundant.
We propose two novel multi-label learning algorithms for the active learning setting to alleviate this problem: one for standard single-level problems and one for hierarchical problems. Our empirical results on multiple multi-label data sets demonstrate the efficacy of the proposed methods.
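The abstract does not give the form of the proposed pairwise label-interaction model. As a minimal, hedged illustration of the label correlation such models exploit, the sketch below estimates the empirical conditional co-occurrence P(label j | label i) from a binary label matrix. The function name `conditional_cooccurrence` and the toy 'sports'/'economy'/'travel' matrix are hypothetical, loosely following the news-article example in the abstract.

```python
import numpy as np

def conditional_cooccurrence(Y):
    """Return P with P[i, j] = fraction of instances carrying label i that also carry label j.

    Y is an (n_instances, n_labels) 0/1 matrix. Rows for labels that never
    occur are left at 0 instead of dividing by zero.
    """
    Y = np.asarray(Y, dtype=float)
    both = Y.T @ Y                    # both[i, j] = #instances with labels i and j
    support = np.diag(both).copy()    # support[i] = #instances with label i
    P = np.zeros_like(both)
    np.divide(both, support[:, None], out=P, where=support[:, None] > 0)
    return P

labels = ["sports", "economy", "travel"]
Y = np.array([
    [1, 1, 1],   # e.g. an article on the Olympics and tourism
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
])
P = conditional_cooccurrence(Y)
print(P.round(2))
```

In this toy matrix every 'economy' instance also carries 'travel' (P = 1.0), but only two of the three 'travel' instances carry 'economy' (P is about 0.67). The statistic is therefore asymmetric, which is the kind of directional dependency a label-correlation model can use to compensate for sparse labels.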