• Molecular dating: theoretical and practical investigations in phylogenomics

      Kumar, Sudhir; Pond, Sergei; Hedges, S. Blair; Thorne, Jeffrey L. (Temple University. Libraries, 2019)
      Dating the divergence of sequences from different species, genes, and strains is now commonplace in biological studies aimed at deciphering micro- and macro-evolutionary temporal patterns. As sequencing becomes increasingly cheap, molecular datasets are expanding quickly in size. This expansion has necessitated the development of innovative and efficient methods that make the inference of large timetrees feasible from genome-scale datasets, which routinely contain hundreds of species. In my dissertation research, I have focused on developing methods that improve the accuracy, precision, and speed of the calculations needed for divergence time inference. I have also conducted large-scale data analyses to reveal fundamental patterns of molecular evolution. The following five related projects were pursued in this dissertation. (1) Development of a machine learning method (CorrTest) for detecting the best-fit model of the variation in molecular evolutionary rates among branches and lineages in large phylogenies. Computer simulations show that the machine learning method outperforms the currently available state-of-the-art methods and is computationally efficient. (2) Development of an analytical method and a new approach that uses probability densities as calibrations to calculate confidence intervals reliably for RelTime, a non-Bayesian method. Empirical analysis shows that RelTime produces confidence intervals comparable to those generated by Bayesian methods, and simulation analysis shows that RelTime confidence intervals often contain the true values. (3) Application of CorrTest to empirical datasets reveals extensive autocorrelation of molecular rates in nucleotide and amino acid sequence evolution across diverse taxonomic groups, suggesting that rate autocorrelation is a common phenomenon throughout the tree of life.
(4) Investigation of the impact of substitution model complexity on the accuracy and precision of divergence time estimation. Analyses of large-scale empirical data show that the choice of substitution model has only a limited impact on time estimation, as extremely simple models yield divergence time estimates and credibility intervals remarkably similar to those obtained from very complex models. (5) Inventory of non-Bayesian methods for dating species divergences, including their statistical bases, their performance in estimating divergence times, and the software packages in which they are implemented. A guide is provided for using non-Bayesian dating methods to produce reliable divergence times.