Show simple item record

dc.contributor.advisorVucetic, Slobodan
dc.creatorLan, Liang
dc.date.accessioned2020-10-27T15:14:03Z
dc.date.available2020-10-27T15:14:03Z
dc.date.issued2012
dc.identifier.other864886001
dc.identifier.urihttp://hdl.handle.net/20.500.12613/1685
dc.description.abstractIn my dissertation, I will present my research which contributes to solve the following three open problems from biomedical informatics: (1) Multi-task approaches for microarray classification; (2) Multi-label classification of gene and protein prediction from multi-source biological data; (3) Spatial scan for movement data. In microarray classification, samples belong to several predefined categories (e.g., cancer vs. control tissues) and the goal is to build a predictor that classifies a new tissue sample based on its microarray measurements. When faced with the small-sample high-dimensional microarray data, most machine learning algorithm would produce an overly complicated model that performs well on training data but poorly on new data. To reduce the risk of over-fitting, feature selection becomes an essential technique in microarray classification. However, standard feature selection algorithms are bound to underperform when the size of the microarray data is particularly small. The best remedy is to borrow strength from external microarray datasets. In this dissertation, I will present two new multi-task feature filter methods which can improve the classification performance by utilizing the external microarray data. The first method is to aggregate the feature selection results from multiple microarray classification tasks. The resulting multi-task feature selection can be shown to improve quality of the selected features and lead to higher classification accuracy. The second method jointly selects a small gene set with maximal discriminative power and minimal redundancy across multiple classification tasks by solving an objective function with integer constraints. In protein function prediction problem, gene functions are predicted from a predefined set of possible functions (e.g., the functions defined in the Gene Ontology). Gene function prediction is a complex classification problem characterized by the following aspects: (1) a single gene may have multiple functions; (2) the functions are organized in hierarchy; (3) unbalanced training data for each function (much less positive than negative examples); (4) missing class labels; (5) availability of multiple biological data sources, such as microarray data, genome sequence and protein-protein interactions. As participants in the 2011 Critical Assessment of Function Annotation (CAFA) challenge, our team achieved the highest AUC accuracy among 45 groups. In the competition, we gained by focusing on the 5-th aspect of the problem. Thus, in this dissertation, I will discuss several schemes to integrate the prediction scores from multiple data sources and show their results. Interestingly, the experimental results show that a simple averaging integration method is competitive with other state-of-the-art data integration methods. Original spatial scan algorithm is used for detection of spatial overdensities: discovery of spatial subregions with significantly higher scores according to some density measure. This algorithm is widely used in identifying cluster of disease cases (e.g., identifying environmental risk factors for child leukemia). However, the original spatial scan algorithm only works on static spatial data. In this dissertation, I will propose one possible solution for spatial scan on movement data.
dc.format.extent112 pages
dc.language.isoeng
dc.publisherTemple University. Libraries
dc.relation.ispartofTheses and Dissertations
dc.rightsIN COPYRIGHT- This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available.
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0/
dc.subjectComputer Science
dc.subjectBioinformatics
dc.subjectInformation Science
dc.subjectData Mining
dc.subjectDisease Mapping
dc.subjectMachine Learning
dc.subjectMulti-task Feature Selection
dc.subjectProtein Function Prediction
dc.subjectSpatial Scan
dc.titleData Mining Algorithms for Classification of Complex Biomedical Data
dc.typeText
dc.type.genreThesis/Dissertation
dc.contributor.committeememberObradovic, Zoran
dc.contributor.committeememberLatecki, Longin
dc.contributor.committeememberDavey, Adam
dc.description.departmentComputer and Information Science
dc.relation.doihttp://dx.doi.org/10.34944/dspace/1667
dc.ada.noteFor Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu
dc.description.degreePh.D.
refterms.dateFOA2020-10-27T15:14:03Z


Files in this item

Thumbnail
Name:
Lan_temple_0225E_10072.pdf
Size:
886.2Kb
Format:
PDF

This item appears in the following Collection(s)

Show simple item record