Potts statistical models of sequence-covariation and protein evolution

Genre
Thesis/Dissertation
Date
2021
Advisor
Levy, Ronald M.
Committee member
Napolitano, Jim
Tao, R. (Rongjia)
Passos, Dario
Department
Physics
DOI
http://dx.doi.org/10.34944/dspace/7193
Abstract
Protein evolution is governed by the balance between mutation and selection, which act on an ensemble of protein sequence variants. The acquisition of a mutation and the effect it has on a variant’s prevalence depend on the network of interactions within the protein sequence background in which the mutation is acquired, a phenomenon known as 'epistasis'. Evidence of this epistatic interaction network can be extracted from protein multiple sequence alignments (MSAs) in the form of pairwise correlations and used to build probabilistic (Potts) models of the network of interacting protein residues, which aim to describe the probabilities of observing specific states of a system. These models help quantify the connections between protein sequence, structure, and function maintained through evolution. In this dissertation, using HIV as our model system, we use the pair correlations present in patient protein MSAs to build maximum entropy Potts models, a generalization of infinite-range Ising spin-glass models, in order to extract information describing the fitness landscape, structure, and the interdependencies of specific mutation patterns that can be involved in engendering drug resistance in HIV patients. First, using the Potts Hamiltonian to score individual sequences, we identify the mutation patterns responsible for promoting drug resistance and provide direct confirmation of epistasis involving many simultaneous mutations in HIV. Building on earlier work, we show that mutations leading to drug resistance can become highly favored (or entrenched) by the complex mutation patterns arising in response to drug therapy despite being disfavored in the wild-type virus, and we provide the first confirmation of entrenchment in HIV drug-target proteins.
We further show that the likelihood of resistance mutations can vary widely across patient populations, and between the population average and the specific viral backgrounds (molecular clones) used in the laboratory. Second, we demonstrate that these models are able to predict higher-order sequence statistics and the fitness effects of multiple simultaneous mutations. We show through statistical analyses that unambiguous signatures of epistasis can be seen only in the comparison of Potts-model-predicted and observed HIV sequence "prevalences" when expressed as higher-order marginals of the sequence probability distribution. The evidence for epistasis in viruses like HIV from different experimental measures of fitness carried out in the laboratory is weak. We further show that the Potts model predictions of the fitness landscape can capture contributions from many different orthogonal features of the landscape. Third, we use reweighting techniques such as the weighted histogram analysis method to predict the likelihood that highly entrenching sequences for resistance mutations, which are observed in drug-experienced HIV patients but not in drug-naive patients because of limited sample sizes, are already present in the drug-naive population; their presence can significantly affect the dynamics of the acquisition of drug resistance. Lastly, we use kinetic Monte Carlo techniques to study the kinetics of drug resistance in HIV, with special emphasis on the temporal behavior of strongly entrenched drug-resistance mutations, which can lead to faster or slower acquisition of drug resistance.
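As a rough illustration of what scoring a sequence with a Potts Hamiltonian involves, the sketch below computes an energy from per-site fields and pairwise couplings, then compares the energy change of one mutation in two different backgrounds. It is not taken from the dissertation: the function names, the sign convention (lower energy means higher model probability), and the toy parameters are all assumptions chosen for illustration, and nothing here is fitted to real HIV data.

```python
from itertools import combinations


def potts_energy(seq, fields, couplings):
    """E(S) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    Assumed sign convention: lower energy = more probable under the model.
    Missing coupling entries are treated as zero.
    """
    energy = -sum(fields[i][s] for i, s in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        energy -= couplings.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


def mutation_delta_e(background, pos, new_state, fields, couplings):
    """Energy change from setting position `pos` to `new_state` in `background`."""
    mutated = list(background)
    mutated[pos] = new_state
    return (potts_energy(mutated, fields, couplings)
            - potts_energy(background, fields, couplings))


# Hypothetical toy model: 3 positions, 2 states (0 = consensus, 1 = mutant).
fields = [
    {0: 0.0, 1: -1.0},  # mutant at position 0 is disfavored on its own
    {0: 0.0, 1: 0.0},
    {0: 0.0, 1: 0.0},
]
couplings = {
    (0, 1): {(1, 1): 1.5},  # mutants at 0 and 1 favor each other
    (0, 2): {(1, 1): 1.5},  # mutants at 0 and 2 favor each other
}

wild_type = [0, 0, 0]
mutated_background = [0, 1, 1]

delta_wt = mutation_delta_e(wild_type, 0, 1, fields, couplings)
delta_bg = mutation_delta_e(mutated_background, 0, 1, fields, couplings)
print(delta_wt, delta_bg)
```

In this toy setup the same mutation raises the energy in the wild-type background and lowers it in the background that already carries the two coupled mutations, which is the qualitative pattern the abstract calls entrenchment.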