• Model-Free Variable Selection For Two Groups of Variables

      Dong, Yuexiao; Tang, Cheng Yong; Chitturi, Pallavi; Shen, Cencheng (Temple University. Libraries, 2018)
      In this dissertation we introduce two variable selection procedures for multivariate responses. Our procedures are based on sufficient dimension reduction concepts and are model-free. In the first procedure we consider the dual marginal coordinate hypotheses, where the distinction between the predictor and the response is not important. Motivated by canonical correlation analysis (CCA), we propose a CCA-based test for the dual marginal coordinate hypotheses, and devise a joint backward selection algorithm for dual model-free variable selection. The second procedure is based on ordinary least squares (OLS). We derive and study the asymptotic properties of the OLS-based test under the normality assumption of the predictors as well as an asymmetry assumption. When these assumptions are violated, the asymptotic test with elliptical trimming and clustering remains valid and shows desirable numerical performance. A backward selection algorithm for the predictor is also provided for the OLS-based test. The performance of the proposed tests and variable selection procedures is evaluated through synthetic examples and a real data analysis.
    • Multilevel Model Selection: A Regularization Approach Incorporating Heredity Constraints

      Izenman, Alan Julian; Heiberger, Richard M., 1945-; Zhao, Zhigen; Mennis, Jeremy (Temple University. Libraries, 2013)
      This dissertation focuses on estimation and selection methods for a simple linear model with two levels of variation. This model provides a foundation for extensions to more levels. We propose new regularization criteria for model selection, subset selection, and variable selection in this context. Regularization is a penalized-estimation approach that shrinks the estimate and selects variables for structured data. This dissertation introduces a procedure (HM-ALASSO) that extends regularized multilevel-model estimation and selection to enforce principles of fixed heredity (e.g., including main effects when their interactions are included) and random heredity (e.g., including fixed effects when their random terms are included). The goals in developing this method were to create a procedure that provides reasonable estimates of all parameters, adheres to fixed and random heredity principles, results in a parsimonious model, is theoretically justifiable, and can be implemented and used in available software. HM-ALASSO incorporates heredity-constrained selection directly into the estimation process and is shown to enjoy the properties of consistency, sparsity, and asymptotic normality. Its ability to produce quality estimates of the underlying parameters while adhering to heredity principles is demonstrated using simulated data. Its performance is illustrated using a subset of the High School and Beyond (HS&B) data set that includes math-achievement outcomes modeled via student- and school-level predictors. The HM-ALASSO framework is flexible enough to be adapted for various rule sets and parameterizations.
    • Variable Selection and Supervised Dimension Reduction for Large-Scale Genomic Data with Censored Survival Outcomes

      Tang, Cheng-Yong; Devarajan, Karthik; Chitturi, Pallavi; Dong, Yuexiao; Obradovic, Zoran (Temple University. Libraries, 2017)
      One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes, providing insight into the disease process. With the rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of thousands of genes and proteins, resulting in enormous data sets where the number of genomic variables (covariates) is far greater than the number of subjects. It is also typical for such data sets to have a high proportion of censored observations. Methods based on univariate Cox regression are often used to select genes related to survival outcomes. However, the Cox model assumes proportional hazards (PH), which is unlikely to hold for each gene. When applied to genes exhibiting some form of non-proportional hazards (NPH), these methods could lead to an under- or over-estimation of the effects. In this thesis, we develop methods that will directly address t