Davey, Adam; Nelson, Deborah B.; Weir, Mark P. (Temple University. Libraries, 2014)
      There is growing recognition that ignoring missing data requires stronger assumptions than addressing them. Although numerous effective methods have emerged for handling missing data under so-called ignorable missing data conditions, effective treatment of non-ignorable missing data (missing not at random, MNAR) remains an open problem, typically requiring that the missing data be modeled explicitly. Most previous research applies a single statistical model to the entire data set, although some studies suggest that imputations based on a subset of similar cases may prove more effective with MNAR data. The primary aim of this study was to determine whether imputing missing data by a method that uses only local information could provide results comparable to methods based on explicit modeling of missing data. This thesis reports the results of four experiments evaluating dynamic cluster-based imputation (DCI) for imputing missing post-test data in simulated cluster-randomized trial data, under conditions in which previous research showed no standard method to be effective. This method identifies a set of statistically proximal observations (indexed by K) and uses a subset of them (indexed by R) to perform imputations. Both parameters can be tuned to a specific problem, with previous research suggesting that K=50 and R=9 are optimal for most applications. Compared to common methods for handling non-ignorable missing data (listwise deletion, LOCF, pattern-mixture modeling, Diggle-Kenward modeling), the results of the experiments suggest that, when indicator variables for the pattern of missing observations were included in the imputation model, DCI had bias and coverage rates comparable to those of the "best practice" Diggle-Kenward and pattern-mixture models, but at the cost of larger standard errors. Thus, the results support the idea that imputation based on a subset of similar observations could be more accurate than imputation using all cases.
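      The general idea of imputing from a local subset of similar cases can be illustrated with a minimal sketch. It is not the DCI procedure evaluated in the thesis, whose details the abstract does not give. The function name `local_impute`, the Euclidean distance, the random choice of R donors among the K nearest observed cases, and the use of the donor mean as the imputed value are all assumptions made for illustration.

```python
import numpy as np

def local_impute(X, y, k=50, r=9, seed=0):
    """Fill NaNs in y using r donors drawn from the k nearest observed cases.

    Hypothetical illustration only: the distance metric, donor selection,
    and donor-mean rule are assumptions, not the method from the thesis.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    observed = ~np.isnan(y)
    if not observed.any():
        raise ValueError("no observed outcomes to use as donors")
    donors_X, donors_y = X[observed], y[observed]

    # Cap K and R so that small data sets still work.
    k = min(k, len(donors_y))
    r = min(r, k)

    out = y.copy()
    for i in np.flatnonzero(~observed):
        # Distance from the incomplete case to every observed case.
        dist = np.linalg.norm(donors_X - X[i], axis=1)
        nearest = np.argsort(dist)[:k]            # the K proximal cases
        chosen = rng.choice(nearest, size=r, replace=False)  # R of them
        out[i] = donors_y[chosen].mean()          # assumed imputation rule
    return out
```

      In this sketch, indicator variables for the missing-data pattern could be supplied as extra columns of `X`, so that cases sharing a pattern sit closer together. How the thesis incorporates those indicators is not stated in the abstract.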