• Login
    View Item 
    •   Home
    • Theses and Dissertations
    • Theses and Dissertations
    • View Item
    •   Home
    • Theses and Dissertations
    • Theses and Dissertations
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Browse

    All of TUScholarShareCommunitiesDateAuthorsTitlesSubjectsGenresThis CollectionDateAuthorsTitlesSubjectsGenres

    My Account

    LoginRegister

    Help

    AboutPeoplePoliciesHelp for DepositorsData DepositFAQs

    Statistics

    Most Popular ItemsStatistics by CountryMost Popular Authors

    Models for fitting correlated non-identical bernoulli random variables with applications to an airline data problem

    • CSV
    • RefMan
    • EndNote
    • BibTex
    • RefWorks
    Thumbnail
    Name:
    PerezRomoLeroux_temple_0225E_1 ...
    Size:
    363.4Kb
    Format:
    PDF
    Download
    Genre
    Thesis/Dissertation
    Date
    2021
    Author
    Perez Romo Leroux, Andres
    Advisor
    Sobel, Marc J.
    Airoldi, Edoardo
    Committee member
    Sobel, Marc J.
    Airoldi, Edoardo
    McAlinn, Kenichiro
    Souvenir, Richard
    Department
    Statistics
    Subject
    Statistics
    Applied case study
    Binary group correlation
    Correlated binary data
    Gradient descent boosting
    Machine learning
    Multivariate probit model
    Permanent link to this record
    http://hdl.handle.net/20.500.12613/6539
    
    Metadata
    Show full item record
    DOI
    http://dx.doi.org/10.34944/dspace/6521
    Abstract
    Our research deals with the problem of devising models for fitting non- identical dependent Bernoulli variables and using these models to predict fu- ture Bernoulli trials.We focus on modelling and predicting random Bernoulli response variables which meet all of the following conditions: 1. Each observed as well as future response corresponds to a Bernoulli trial 2. The trials are non-identical, having possibly different probabilities of occurrence 3. The trials are mutually correlated, with an underlying complex trial cluster correlation structure. Also allowing for the possible partitioning of trials within clusters into groups. Within cluster - group level correlation is reflected in the correlation structure. 4. The probability of occurrence and correlation structure for both ob- served and future trials can depend on a set of observed covariates. A number of proposed approaches meeting some of the above conditions are present in the current literature. Our research expands on existing statistical and machine learning methods. We propose three extensions to existing models that make use of the above conditions. Each proposed method brings specific advantages for dealing with correlated binary data. The proposed models allow for within cluster trial grouping to be reflected in the correlation structure. We partition sets of trials into groups either explicitly estimated or implicitly inferred. Explicit groups arise from the determination of common covariates; inferred groups arise via imposing mixture models. The main motivation of our research is in modelling and further understanding the potential of introducing binary trial group level correlations. In a number of applications, it can be beneficial to use models that allow for these types of trial groupings, both for improved predictions and better understanding of behavior of trials. The first model extension builds on the Multivariate Probit model. This model makes use of covariates and other information from former trials to determine explicit trial groupings and predict the occurrence of future trials. We call this the Explicit Groups model. The second model extension uses mixtures of univariate Probit models. This model predicts the occurrence of current trials using estimators of pa- rameters supporting mixture models for the observed trials. We call this the Inferred Groups model. Our third methods extends on a gradient descent based boosting algorithm which allows for correlation of binary outcomes called WL2Boost. We refer to our extension of this algorithm as GWL2Boost. Bernoulli trials are divided into observed and future trials; with all trials having associated known covariate information. We apply our methodology to the problem of predicting the set and total number of passengers who will not show up on commercial flights using covariate information and past passenger data. The models and algorithms are evaluated with regards to their capac- ity to predict future Bernoulli responses. We compare the models proposed against a set of competing existing models and algorithms using available air- line passenger no-show data. We show that our proposed algorithm extension GWL2Boost outperforms top existing algorithms and models that assume in- dependence of binary outcomes in various prediction metrics.
    ADA compliance
    For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu
    Collections
    Theses and Dissertations

    entitlement

     
    DSpace software (copyright © 2002 - 2022)  DuraSpace
    Temple University Libraries | 1900 N. 13th Street | Philadelphia, PA 19122
    (215) 204-8212 | scholarshare@temple.edu
    Open Repository is a service operated by 
    Atmire NV
     

    Export search results

    The export option will allow you to export the current search results of the entered query to a file. Different formats are available for download. To export the items, click on the button corresponding with the preferred download format.

    By default, clicking on the export buttons will result in a download of the allowed maximum amount of items.

    To select a subset of the search results, click "Selective Export" button and make a selection of the items you want to export. The amount of items that can be exported at once is similarly restricted as the full export.

    After making a selection, click one of the export format buttons. The amount of items that will be exported is indicated in the bubble next to export format.