Show simple item record

dc.contributor.advisor: Sobel, Marc J.
dc.contributor.advisor: Airoldi, Edoardo
dc.creator: Perez Romo Leroux, Andres
dc.date.accessioned: 2021-05-24T18:56:47Z
dc.date.available: 2021-05-24T18:56:47Z
dc.date.issued: 2021
dc.identifier.uri: http://hdl.handle.net/20.500.12613/6539
dc.description.abstract: Our research deals with the problem of devising models for fitting non-identical dependent Bernoulli variables and using these models to predict future Bernoulli trials. We focus on modelling and predicting random Bernoulli response variables which meet all of the following conditions: 1. Each observed as well as future response corresponds to a Bernoulli trial. 2. The trials are non-identical, having possibly different probabilities of occurrence. 3. The trials are mutually correlated, with an underlying complex trial cluster correlation structure; trials within clusters may also be partitioned into groups, and within-cluster group-level correlation is reflected in the correlation structure. 4. The probability of occurrence and correlation structure for both observed and future trials can depend on a set of observed covariates. A number of proposed approaches meeting some of the above conditions are present in the current literature. Our research expands on existing statistical and machine learning methods. We propose three extensions to existing models that make use of the above conditions. Each proposed method brings specific advantages for dealing with correlated binary data. The proposed models allow for within-cluster trial grouping to be reflected in the correlation structure. We partition sets of trials into groups either explicitly estimated or implicitly inferred. Explicit groups arise from the determination of common covariates; inferred groups arise via imposing mixture models. The main motivation of our research is in modelling and further understanding the potential of introducing binary trial group-level correlations. In a number of applications, it can be beneficial to use models that allow for these types of trial groupings, both for improved predictions and better understanding of the behavior of trials. The first model extension builds on the Multivariate Probit model. This model makes use of covariates and other information from former trials to determine explicit trial groupings and predict the occurrence of future trials. We call this the Explicit Groups model. The second model extension uses mixtures of univariate Probit models. This model predicts the occurrence of current trials using estimators of parameters supporting mixture models for the observed trials. We call this the Inferred Groups model. Our third method extends a gradient descent based boosting algorithm, called WL2Boost, which allows for correlation of binary outcomes. We refer to our extension of this algorithm as GWL2Boost. Bernoulli trials are divided into observed and future trials, with all trials having associated known covariate information. We apply our methodology to the problem of predicting the set and total number of passengers who will not show up on commercial flights using covariate information and past passenger data. The models and algorithms are evaluated with regard to their capacity to predict future Bernoulli responses. We compare the proposed models against a set of competing existing models and algorithms using available airline passenger no-show data. We show that our proposed algorithm extension GWL2Boost outperforms, in various prediction metrics, top existing algorithms and models that assume independence of binary outcomes.
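The four conditions listed in the abstract can be illustrated with a small simulation. The sketch below is not the thesis's Explicit Groups, Inferred Groups, or GWL2Boost method. It assumes a probit-style latent variable with shared cluster and group effects, and the names simulate_cluster, rho_cluster, and rho_group are hypothetical, chosen only for this illustration.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_cluster(linear_predictors, groups, rho_cluster, rho_group, rng):
    """Draw one cluster of correlated, non-identical Bernoulli trials.

    linear_predictors: latent means mu_i (for example x_i . beta), one per trial.
    groups: group label per trial (hypothetical, e.g. a booking party).
    rho_cluster: latent correlation shared by all trials in the cluster.
    rho_group: extra latent correlation for trials in the same group.
    """
    if rho_cluster < 0 or rho_group < 0 or rho_cluster + rho_group >= 1:
        raise ValueError("need rho_cluster, rho_group >= 0 and their sum < 1")
    # Shared standard-normal effects induce the cluster and group correlation.
    cluster_effect = rng.gauss(0.0, 1.0)
    group_effects = {g: rng.gauss(0.0, 1.0) for g in sorted(set(groups))}
    # The residual scale keeps each latent variance equal to 1, so the
    # marginal success probability of trial i is normal_cdf(mu_i).
    resid_sd = math.sqrt(1.0 - rho_cluster - rho_group)
    outcomes = []
    for mu, g in zip(linear_predictors, groups):
        z = (mu
             + math.sqrt(rho_cluster) * cluster_effect
             + math.sqrt(rho_group) * group_effects[g]
             + resid_sd * rng.gauss(0.0, 1.0))
        outcomes.append(1 if z > 0 else 0)  # the trial occurs when z > 0
    return outcomes


# Illustrative inputs only: four trials in one cluster, split into two groups.
rng = random.Random(0)
mu = [-1.5, -1.5, -0.5, 0.2]
groups = ["A", "A", "B", "B"]
sample = simulate_cluster(mu, groups, rho_cluster=0.2, rho_group=0.5, rng=rng)
```

Under these assumptions, two trials in the same group have latent correlation rho_cluster + rho_group, two trials in different groups of the same cluster have latent correlation rho_cluster, and each trial keeps its own marginal probability normal_cdf(mu_i).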
dc.format.extent: 81 pages
dc.language.iso: eng
dc.publisher: Temple University. Libraries
dc.relation.ispartof: Theses and Dissertations
dc.rights: IN COPYRIGHT - This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available.
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Statistics
dc.subject: Applied case study
dc.subject: Binary group correlation
dc.subject: Correlated binary data
dc.subject: Gradient descent boosting
dc.subject: Machine learning
dc.subject: Multivariate probit model
dc.title: Models for fitting correlated non-identical bernoulli random variables with applications to an airline data problem
dc.type: Text
dc.type.genre: Thesis/Dissertation
dc.contributor.committeemember: Sobel, Marc J.
dc.contributor.committeemember: Airoldi, Edoardo
dc.contributor.committeemember: McAlinn, Kenichiro
dc.contributor.committeemember: Souvenir, Richard
dc.description.department: Statistics
dc.relation.doi: http://dx.doi.org/10.34944/dspace/6521
dc.ada.note: For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu
dc.description.degree: Ph.D.
dc.identifier.proqst: 14460
dc.date.updated: 2021-05-19T16:10:53Z
refterms.dateFOA: 2021-05-24T18:56:48Z
dc.identifier.filename: PerezRomoLeroux_temple_0225E_14460.pdf


Files in this item

Name: PerezRomoLeroux_temple_0225E_1 ...
Size: 363.4Kb
Format: PDF
