Genre
Thesis/Dissertation
Date
2025-12
Department
Computer and Information Science
Files
Mowlaei_temple_0225E_16320.pdf
Adobe PDF, 4.74 MB
- Embargoed until 2028-02-06
Abstract
One of the enduring challenges in genomics is to connect DNA sequence to the traits we observe, from human disease risk to agricultural performance. Phenotype prediction lies at the heart of this effort, with applications in precision medicine, where it informs genetic risk models and polygenic scores; in agriculture, where it guides breeding for yield and resilience; and in evolutionary biology, where it illuminates the forces shaping diversity. Yet, phenotype prediction is fundamentally constrained by incomplete genotype data, as genotyping arrays capture only a subset of common variants while rare and structural variants remain underrepresented. Without complete genomes, predictive models cannot reach their full potential.
This thesis traces a research trajectory that began with phenotype prediction and expanded to genotype imputation as a necessary foundation. In the first project, I developed FSF-GA, a genetic algorithm–based framework for feature selection and quantitative trait prediction. FSF-GA demonstrated competitive accuracy relative to established baselines while identifying interpretable subsets of variants that shed light on the genetic architecture of complex traits. However, the limitations of incomplete genotype data motivated a turn toward imputation.
To address this, I introduced STICI, a transformer-based framework that integrates convolutional and attention mechanisms to capture both local and global patterns of linkage disequilibrium. STICI achieved strong performance across human and non-human genomes, handling not only common variants but also multi-allelic and structurally complex ones. Building on this work, I developed GI-Joe, a novel deep learning architecture augmented with transformer and convolutional blocks for scalable genotype imputation. GI-Joe extended the context window to over one million variants, markedly improving accuracy for rare and structural variants while remaining efficient on consumer-grade hardware. This efficiency lowers barriers for high-resolution imputation by enabling in-house analysis of private datasets without reliance on costly cloud platforms.
Together, these contributions advance both phenotype prediction and genotype imputation by combining accuracy, interpretability, and scalability. By uniting artificial intelligence methods with genomic data, the frameworks developed here bring us closer to closing the loop from sequence to trait, with broad implications for population genetics, personalized medicine, agriculture, and evolutionary biology.
ADA compliance
For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu
License
IN COPYRIGHT - This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available.
