• Login
    View Item 
    •   Home
    • Theses and Dissertations
    • Theses and Dissertations
    • View Item
    •   Home
    • Theses and Dissertations
    • Theses and Dissertations
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Browse

    All of TUScholarShareCommunitiesDateAuthorsTitlesSubjectsGenresThis CollectionDateAuthorsTitlesSubjectsGenres

    My Account

    LoginRegister

    Help

    AboutPeoplePoliciesHelp for DepositorsData DepositFAQs

    Statistics

    Most Popular ItemsStatistics by CountryMost Popular Authors

    Learning from Multiple Knowledge Sources

    • CSV
    • RefMan
    • EndNote
    • BibTex
    • RefWorks
    Thumbnail
    Name:
    Zhang_temple_0225E_11302.pdf
    Size:
    1.776Mb
    Format:
    PDF
    Download
    Genre
    Thesis/Dissertation
    Date
    2013
    Author
    Zhang, Ping
    Advisor
    Obradovic, Zoran
    Committee member
    Vucetic, Slobodan
    Yates, Alexander
    McLaughlin Joseph P., Jr.
    Agarwal, Pankaj
    Department
    Computer and Information Science
    Subject
    Computer Science
    Information Science
    Bioinformatics
    Crowdsourcing
    Data-dependent Experts
    Data Mining
    Drug Repositioning
    Machine Learning
    Multiple Annotators
    Permanent link to this record
    http://hdl.handle.net/20.500.12613/3923
    
    Metadata
    Show full item record
    DOI
    http://dx.doi.org/10.34944/dspace/3905
    Abstract
    In supervised learning, it is usually assumed that true labels are readily available from a single annotator or source. However, recent advances in corroborative technology have given rise to situations where the true label of the target is unknown. In such problems, multiple sources or annotators are often available that provide noisy labels of the targets. In these multi-annotator problems, building a classifier in the traditional single-annotator manner, without regard for the annotator properties may not be effective in general. In recent years, how to make the best use of the labeling information provided by multiple annotators to approximate the hidden true concept has drawn the attention of researchers in machine learning and data mining. In our previous work, a probabilistic method (i.e., MAP-ML algorithm) of iteratively evaluating the different annotators and giving an estimate of the hidden true labels is developed. However, the method assumes the error rate of each annotator is consistent across all the input data. This is an impractical assumption in many cases since annotator knowledge can fluctuate considerably depending on the groups of input instances. In this dissertation, one of our proposed methods, GMM-MAPML algorithm, follows MAP-ML but relaxes the data-independent assumption, i.e., we assume an annotator may not be consistently accurate across the entire feature space. GMM-MAPML uses a Gaussian mixture model (GMM) and Bayesian information criterion (BIC) to find the fittest model to approximate the distribution of the instances. Then the maximum a posterior (MAP) estimation of the hidden true labels and the maximum-likelihood (ML) estimation of quality of multiple annotators at each Gaussian component are provided alternately. Recent studies show that it is not the case that employing more annotators regardless of their expertise will result in improved highest aggregating performance. In this dissertation, we also propose a novel algorithm to integrate multiple annotators by Aggregating Experts and Filtering Novices, which we call AEFN. AEFN iteratively evaluates annotators, filters the low-quality annotators, and re-estimates the labels based only on information obtained from the good annotators. The noisy annotations we integrate are from any combination of human and previously existing machine-based classifiers, and thus AEFN can be applied to many real-world problems. Emotional speech classification, CASP9 protein disorder prediction, and biomedical text annotation experiments show a significant performance improvement of the proposed methods (i.e., GMM-MAPML and AEFN) as compared to the majority voting baseline and the previous data-independent MAP-ML method. Recent experiments include predicting novel drug indications (i.e., drug repositioning) for both approved drugs and new molecules by integrating multiple chemical, biological or phenotypic data sources.
    ADA compliance
    For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu
    Collections
    Theses and Dissertations

    entitlement

     
    DSpace software (copyright © 2002 - 2023)  DuraSpace
    Temple University Libraries | 1900 N. 13th Street | Philadelphia, PA 19122
    (215) 204-8212 | scholarshare@temple.edu
    Open Repository is a service operated by 
    Atmire NV
     

    Export search results

    The export option will allow you to export the current search results of the entered query to a file. Different formats are available for download. To export the items, click on the button corresponding with the preferred download format.

    By default, clicking on the export buttons will result in a download of the allowed maximum amount of items.

    To select a subset of the search results, click "Selective Export" button and make a selection of the items you want to export. The amount of items that can be exported at once is similarly restricted as the full export.

    After making a selection, click one of the export format buttons. The amount of items that will be exported is indicated in the bubble next to export format.