
dc.contributor.advisor: Obradovic, Zoran
dc.creator: Stanojevic, Marija
dc.date.accessioned: 2023-05-22T19:54:26Z
dc.date.available: 2023-05-22T19:54:26Z
dc.date.issued: 2023
dc.identifier.uri: http://hdl.handle.net/20.500.12613/8499
dc.description.abstract: In the last decade, machine learning models have grown in size and in the amount of data they use, which has led to improved performance on many tasks. Most notably, there has been significant development in end-to-end deep learning and reinforcement learning models, with new learning algorithms and architectures proposed frequently. Furthermore, while previous methods focused on supervised learning, many models proposed in the last five years learn in semi-supervised or self-supervised ways. Such a model is then fine-tuned to a specific task or a different data domain. Adapting machine learning models learned on one type of data to similar but different data is called domain adaptation. This thesis discusses various challenges in the domain adaptation of machine learning models to specific tasks and real-world applications and proposes solutions for those challenges. Data in real-world applications have different properties than the clean machine learning datasets commonly used for the experimental evaluation of proposed models. Learning appropriate representations from high-dimensional complex data with internal dependencies is arduous due to the curse of dimensionality and spurious correlations. However, most real-world data have these properties, and they also have few labeled samples, since labeling is expensive and tedious. Additionally, accuracy drops drastically when models are applied to domain-specific datasets and unbalanced problems. Moreover, state-of-the-art models are not able to handle missing data. In this thesis, I strive to create frameworks that can learn a good representation of high-dimensional small data with correlations between variables. The first chapter of this thesis describes the motivation, background, and research objectives. It also gives an overview of contributions and publications.
The background needed to understand this thesis is provided in the second chapter, and an introduction to domain adaptation is given in chapter three. The fourth chapter discusses domain adaptation with small target data. It describes an algorithm for semi-supervised learning over domain-specific short texts such as reviews or tweets. The proposed framework achieves up to 12.6% improvement when only 5000 labeled examples are available. The fifth chapter explores the influence of unanticipated bias in fine-tuning data. This chapter outlines how bias in news data influences the classification performance on domain-specific text, where the domain is U.S. politics. It is shown that fine-tuning with domain-specific data is not always beneficial, especially if bias towards one label is present. The sixth chapter examines domain adaptation on datasets with high missing rates. It reviews a system created to learn from high-dimensional small data from psychological studies, which have up to 70% missingness. The proposed framework achieves 9.3% lower imputation error and 33% lower prediction error. The seventh chapter discusses the curse of dimensionality problem in domain adaptation. It presents a methodology for discovering research articles containing evolutionary timetrees. That system can search for, download, and filter research articles in which timetrees are imported. It scans 5 million articles in a few days. The proposed method also decreases the error of finding research papers by 21% compared to the baseline, which cannot work properly with high-dimensional data. The eighth and final chapter summarizes the findings of this thesis and suggests directions for future work.
dc.language.iso: eng
dc.publisher: Temple University. Libraries
dc.relation.ispartof: Theses and Dissertations
dc.rights: IN COPYRIGHT - This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available.
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Information science
dc.subject: Artificial intelligence
dc.subject: Biased data
dc.subject: Domain adaptation
dc.subject: Machine learning
dc.subject: Missing data
dc.subject: Out-of-domain data
dc.subject: Sequential representation learning
dc.title: Domain Adaptation Applications to Complex High-dimensional Target Data
dc.type: Text
dc.type.genre: Thesis/Dissertation
dc.contributor.committeemember: Dragut, Eduard Constantin
dc.contributor.committeemember: Vucetic, Slobodan
dc.contributor.committeemember: Kumar, Sudhir
dc.description.department: Computer and Information Science
dc.relation.doi: http://dx.doi.org/10.34944/dspace/8463
dc.ada.note: For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu
dc.description.degree: Ph.D.
dc.identifier.proqst: 15229
dc.creator.orcid: 0000-0001-8227-6577
dc.date.updated: 2023-05-19T01:08:06Z
refterms.dateFOA: 2023-05-22T19:54:27Z
dc.identifier.filename: Stanojevic_temple_0225E_15229.pdf


Files in this item

Name: Stanojevic_temple_0225E_15229.pdf
Size: 6.340Mb
Format: PDF
