Show simple item record

dc.contributor.advisorGuo, Yuhong
dc.creatorXIAO, MIN
dc.date.accessioned2020-11-05T16:15:47Z
dc.date.available2020-11-05T16:15:47Z
dc.date.issued2016
dc.identifier.other965642627
dc.identifier.urihttp://hdl.handle.net/20.500.12613/3862
dc.description.abstractSequence labeling tasks have been widely studied in the natural language processing area, such as part-of-speech tagging, syntactic chunking, dependency parsing, and etc. Most of those systems are developed on a large amount of labeled training data via supervised learning. However, manually collecting labeled training data is too time-consuming and expensive. As an alternative, to alleviate the issue of label scarcity, domain adaptation has recently been proposed to train a statistical machine learning model in a target domain where there is no enough labeled training data by exploiting existing free labeled training data in a different but related source domain. The natural language processing community has witnessed the success of domain adaptation in a variety of sequence labeling tasks. Though the labeled training data in the source domain are available and free, however, they are not exactly as and can be very different from the test data in the target domain. Thus, simply applying naive supervised machine learning algorithms without considering domain differences may not fulfill the purpose. In this dissertation, we developed several novel representation learning approaches to address domain adaptation for sequence labeling in natural language processing. Those representation learning techniques aim to induce latent generalizable features to bridge domain divergence to enable cross-domain prediction. We first tackle a semi-supervised domain adaptation scenario where the target domain has a small amount of labeled training data and propose a distributed representation learning approach based on a probabilistic neural language model. We then relax the assumption of the availability of labeled training data in the target domain and study an unsupervised domain adaptation scenario where the target domain has only unlabeled training data, and give a task-informative representation learning approach based on dynamic dependency networks. Both works are developed in the setting where different domains contain sentences in different genres. We then extend and generalize domain adaptation into a more challenging scenario where different domains contain sentences in different languages and propose two cross-lingual representation learning approaches, one is based on deep neural networks with auxiliary bilingual word pairs and the other is based on annotation projection with auxiliary parallel sentences. All four specific learning scenarios are extensively evaluated with different sequence labeling tasks. The empirical results demonstrate the effectiveness of those generalized domain adaptation techniques for sequence labeling in natural language processing.
dc.format.extent100 pages
dc.language.isoeng
dc.publisherTemple University. Libraries
dc.relation.ispartofTheses and Dissertations
dc.rightsIN COPYRIGHT- This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available.
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0/
dc.subjectComputer Science
dc.titleGeneralized Domain Adaptation for Sequence Labeling in Natural Language Processing
dc.typeText
dc.type.genreThesis/Dissertation
dc.contributor.committeememberGuo, Yuhong
dc.contributor.committeememberObradovic, Zoran
dc.contributor.committeememberVucetic, Slobodan
dc.contributor.committeememberChen, Sining, 1978-
dc.description.departmentComputer and Information Science
dc.relation.doihttp://dx.doi.org/10.34944/dspace/3844
dc.ada.noteFor Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu
dc.description.degreePh.D.
refterms.dateFOA2020-11-05T16:15:47Z


Files in this item

Thumbnail
Name:
XIAO_temple_0225E_12385.pdf
Size:
765.7Kb
Format:
PDF

This item appears in the following Collection(s)

Show simple item record