dc.contributor.advisor: Vucetic, Slobodan
dc.creator: Shapovalov, Maxim V
dc.date.accessioned: 2020-11-02T15:11:02Z
dc.date.available: 2020-11-02T15:11:02Z
dc.date.issued: 2019
dc.identifier.uri: http://hdl.handle.net/20.500.12613/2356
dc.description.abstract: Proteins are large biomolecules that serve as the functional building blocks of living organisms. There are about 22,000 protein-coding genes in the human genome. Each gene encodes a unique protein sequence, typically 100 to 1,000 residues long, built from a 20-letter alphabet of amino acids. Each protein folds into a unique 3D shape that enables it to perform its function. Each protein structure consists of some number of helical segments, extended segments called sheets, and loops that connect these elements. Over the last two decades, machine learning methods, coupled with exponentially expanding biological knowledge databases and computational power, have enabled significant progress in the field of computational biology. In this dissertation, I carry out machine learning research on three major interconnected problems to advance the field of protein structural biology. A separate chapter is devoted to each problem. After the three chapters, I conclude this doctoral research with a summary and directions for our future work. Chapter 1 describes the design, training, and application of a convolutional neural network (SecNet) that achieves 84% accuracy on the 60-year-old problem of predicting protein secondary structure from a protein sequence. Our accuracy is 2-3% better than any previous result; accuracy had risen only 5% in the last 20 years. We identified the key factors for successful prediction in a detailed ablation study. A paper submitted for publication includes our secondary-structure prediction software, data set generation, and training and testing protocols [1]. Chapter 2 describes the design and development of a protocol for clustering beta turns, i.e., short structural motifs responsible for U-turns in protein loops. We identified 18 turn types, 11 of which are newly described [2]. We also developed a turn library and cross-platform software for turn assignment in new structures.
In Chapter 3, I build upon the results from these two problems and predict geometries in loops of unknown structure with custom Residual Neural Networks (ResNet). I demonstrate solid results on (a) locating turns and predicting the 18 turn types and (b) predicting backbone torsion angles in loops. Given the recent progress in machine learning, these two results provide a strong foundation for successful loop modeling and encourage us to develop a new loop structure prediction program, a critical step in protein structure prediction and modeling.
dc.format.extent: 164 pages
dc.language.iso: eng
dc.publisher: Temple University. Libraries
dc.relation.ispartof: Theses and Dissertations
dc.rights: IN COPYRIGHT - This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available.
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Computer Science
dc.subject: Bioinformatics
dc.subject: Clustering
dc.subject: Loop Modeling
dc.subject: Machine Learning
dc.subject: Neural Network
dc.subject: Protein
dc.subject: Structural Biology
dc.title: Machine Learning Algorithms for Characterization and Prediction of Protein Structural Properties
dc.type: Text
dc.type.genre: Thesis/Dissertation
dc.contributor.committeemember: Vucetic, Slobodan
dc.contributor.committeemember: Obradovic, Zoran
dc.contributor.committeemember: Zhang, Kai
dc.contributor.committeemember: Dunbrack, Roland L.
dc.contributor.committeemember: Carnevale, Vincenzo
dc.description.department: Computer and Information Science
dc.relation.doi: http://dx.doi.org/10.34944/dspace/2338
dc.ada.note: For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu
dc.description.degree: Ph.D.
refterms.dateFOA: 2020-11-02T15:11:02Z


Files in this item

Name: Shapovalov_temple_0225E_13894.pdf
Size: 6.652Mb
Format: PDF
