
P-VALUE BASED VARIABLE SELECTION FOR GENERALIZED LINEAR MODELS

DOI
https://doi.org/10.34944/mard-8z16
Abstract
This thesis presents two new p-value-based methods for variable selection in generalized linear models. Generalized linear models are widely used, but their lack of closed-form (analytic) solutions and their intricate dependencies create challenges for many existing methods. To address these issues, the proposed methods select important variables in generalized linear models and control the false discovery rate under an arbitrary covariance structure. Both approaches ultimately make selections via the two-step multiple testing procedure of Sarkar and Tang (2022), but they differ in how they construct the necessary p-values. The first proposed method, the generalized two-step, constructs appropriate p-values by solving a non-trivial linear quadratic equation to obtain a data-adaptive linear transformation. This transformation is then applied to an initial fitted model to create paired estimates of the unknown true coefficient vector. The second proposed method, the weighted two-step, adjusts the original response vector and design matrix to reframe generalized linear models as homoskedastic linear regressions. After this reweighting, the methods of Sarkar and Tang (2022) can be applied more directly to the generalized setting. We develop mathematical theory proving that both methods control the false discovery rate, and empirical evaluations demonstrate their promising performance across diverse simulation settings and ten datasets.
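The abstract does not spell out how the weighted two-step adjusts the response and design matrix. One standard way to recast a generalized linear model as a homoskedastic linear regression, assumed here purely for illustration, is the working-response construction from iteratively reweighted least squares (IRLS): with fitted means mu and weights w, form z = eta + (y - mu) / w and rescale both z and the rows of X by sqrt(w). The sketch below does this for logistic regression only. The function names `logistic_irls` and `reweight_to_linear` are hypothetical, the logit link is an arbitrary choice, and this is not claimed to be the thesis's exact transformation. It also stops before any p-values or the Sarkar and Tang (2022) selection step.

```python
import numpy as np


def logistic_irls(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression (canonical logit link) by IRLS (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)          # GLM weights for the logit link: Var(y_i) = mu_i (1 - mu_i)
        z = eta + (y - mu) / w       # working response
        # Weighted least-squares step: solve (X^T W X) beta = X^T W z
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def reweight_to_linear(X, y, beta_hat):
    """Return (y_tilde, X_tilde) so that y_tilde ~ X_tilde @ beta + roughly unit-variance noise.

    Assumed construction: scale the IRLS working response and the design rows by sqrt(w).
    """
    eta = X @ beta_hat
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = mu * (1.0 - mu)
    z = eta + (y - mu) / w
    sqrt_w = np.sqrt(w)
    return sqrt_w * z, sqrt_w[:, None] * X


if __name__ == "__main__":
    # Simulated data: intercept plus two Gaussian covariates (illustrative values only)
    rng = np.random.default_rng(0)
    n = 2000
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
    beta_true = np.array([0.5, -1.0, 0.0])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

    beta_hat = logistic_irls(X, y)
    y_t, X_t = reweight_to_linear(X, y, beta_hat)
    # Ordinary least squares on the reweighted data recovers the GLM fit
    beta_ols = np.linalg.lstsq(X_t, y_t, rcond=None)[0]
    print(beta_hat, beta_ols)
```

At the maximum-likelihood estimate the score equation X^T (y - mu) = 0 holds, so ordinary least squares on the reweighted pair returns the same coefficients as the GLM fit, and the residuals (y - mu) / sqrt(w) are Pearson residuals with variance close to one. That unit-variance property is what makes the reweighted problem behave like a homoskedastic linear regression.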
ADA compliance
For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu