    Mining Heterogeneous Electronic Health Records Data

    Name: Bai_temple_0225E_13863.pdf
    Size: 1.511 MB
    Format: PDF
    Genre: Thesis/Dissertation
    Date: 2019
    Author: Bai, Tian
    Advisor: Vucetic, Slobodan
    Committee members: Dragut, Eduard Constantin; Ling, Haibin; Egleston, Brian; Zhou, Yan
    Department: Computer and Information Science
    Subjects: Computer Science; Artificial Intelligence; Attention Model; Deep Learning; Distributed Representation; Electronic Health Records; Healthcare; Natural Language Processing
    Permanent link to this record: http://hdl.handle.net/20.500.12613/722
    
    DOI: http://dx.doi.org/10.34944/dspace/704
    Abstract
Electronic health record (EHR) systems are used by medical providers to streamline workflows and to enable sharing of patient data among providers. Beyond that primary purpose, EHR data have been used in healthcare research for exploratory and predictive analytics. EHR data are heterogeneous collections of both structured and unstructured information. To store data in a structured way, several ontologies have been developed to describe diagnoses and treatments; the unstructured clinical notes, on the other hand, contain more nuanced information about patients. The multidimensionality and complexity of EHR data pose unique challenges for both the data mining and medical communities. In this thesis, we address several of these challenges and develop novel deep learning approaches to extract insightful knowledge from these data.

Representing words as low-dimensional vectors is useful in many natural language processing tasks. This idea has been extended to the medical domain, where medical codes listed in medical claims are represented as vectors to facilitate exploratory analysis and predictive modeling. However, depending on the type of medical provider, medical claims can use codes from different ontologies, or from a combination of ontologies, which complicates learning of the representations. To properly utilize such multi-source medical claim data, we propose an approach that represents medical codes from different ontologies in the same vector space. The new approach was evaluated on the code cross-reference problem, which aims to identify similar codes across different ontologies; in our experiments, the proposed approach provides superior cross-referencing compared to several existing approaches. Furthermore, because EHR data also contain unstructured clinical notes, we propose a method that jointly learns medical concept and word representations. The jointly learned representations of medical codes and words can be used to extract phenotypes of different diseases.

Various deep learning models have recently been applied to predictive modeling of EHR data. In these data, each patient is represented as a sequence of temporally ordered, irregularly sampled visits to health providers, where each visit is recorded as an unordered set of medical codes specifying the diagnosis and treatment provided during the visit. We propose a novel interpretable deep learning model called Timeline, whose main novelty is a mechanism that learns a time decay factor for every medical code. We evaluated Timeline on two large-scale real-world data sets, where the task was to predict the primary diagnosis category of the next hospital visit given the previous visits. Our results show that Timeline is more accurate than state-of-the-art deep learning models based on RNNs.
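To make the per-code time-decay idea concrete, the following is a minimal sketch in PyTorch. It is not the exact Timeline architecture from the thesis; the module name, shapes, and the exponential decay form are illustrative assumptions, showing only how a learnable decay rate for each medical code can down-weight codes from older visits before they are aggregated into a patient representation.

# Minimal sketch of a per-code time-decay mechanism (assumed PyTorch setup).
# NOT the exact Timeline architecture; it only illustrates learning one decay
# rate per medical code and down-weighting codes observed longer ago.
import torch
import torch.nn as nn

class TimeDecayCodeEncoder(nn.Module):
    def __init__(self, num_codes: int, dim: int):
        super().__init__()
        self.embed = nn.Embedding(num_codes, dim)               # one vector per medical code
        self.log_decay = nn.Parameter(torch.zeros(num_codes))   # one learnable decay rate per code

    def forward(self, codes: torch.Tensor, elapsed_days: torch.Tensor) -> torch.Tensor:
        # codes:        (batch, n_codes) integer ids of codes from past visits
        # elapsed_days: (batch, n_codes) time since each code was recorded
        # returns:      (batch, dim)     time-decayed patient representation
        decay = torch.exp(self.log_decay[codes])        # positive, code-specific decay rate
        weight = torch.exp(-decay * elapsed_days)       # older codes contribute less
        vecs = self.embed(codes)                        # (batch, n_codes, dim)
        return (weight.unsqueeze(-1) * vecs).sum(dim=1) # weighted sum over codes

# Toy usage: two patients, three past codes each
enc = TimeDecayCodeEncoder(num_codes=1000, dim=64)
codes = torch.randint(0, 1000, (2, 3))
elapsed = torch.tensor([[1.0, 30.0, 365.0], [7.0, 7.0, 90.0]])
patient_vec = enc(codes, elapsed)    # could feed into an RNN/classifier for next-visit prediction
print(patient_vec.shape)             # torch.Size([2, 64])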
Clinical notes contain detailed information about the health status of patients for each of their encounters with a health system. Developing effective models that automatically assign medical codes to clinical notes has been a long-standing, active research area. Considering the large number of online disease knowledge sources, which contain detailed information about the signs and symptoms of different diseases, their risk factors, and their epidemiology, we use Wikipedia as an external knowledge source and propose Knowledge Source Integration (KSI), a novel end-to-end code assignment framework that can integrate external knowledge during the training of any baseline deep learning model. To evaluate KSI, we experimented with the automatic assignment of ICD-9 diagnosis codes to clinical notes, aided by Wikipedia documents corresponding to the ICD-9 codes. The results show that KSI consistently improves the baseline models and is particularly successful at predicting rare codes.
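As a rough illustration of the knowledge-integration idea, the sketch below combines a baseline model's per-code score with a score computed from the overlap between a clinical note and each code's Wikipedia article. It is not the exact KSI formulation from the thesis; the bag-of-words representation, the module names, and the additive combination are assumptions made for this example.

# Rough sketch of augmenting a baseline code-assignment model with external
# knowledge, in the spirit of the KSI idea described above. Names, shapes, and
# the scoring function are illustrative assumptions, not the thesis formulation.
import torch
import torch.nn as nn

class KnowledgeAugmentedCoder(nn.Module):
    def __init__(self, vocab_size: int, num_codes: int, wiki_bow: torch.Tensor, hidden: int = 128):
        # wiki_bow: (num_codes, vocab_size) bag-of-words of the Wikipedia article
        # associated with each ICD-9 code, precomputed offline.
        super().__init__()
        self.register_buffer("wiki_bow", wiki_bow.float())
        self.baseline = nn.Sequential(                   # stand-in for any baseline note encoder
            nn.Linear(vocab_size, hidden), nn.ReLU(), nn.Linear(hidden, num_codes)
        )
        self.knowledge_score = nn.Linear(vocab_size, 1)  # scores note/article word overlap

    def forward(self, note_bow: torch.Tensor) -> torch.Tensor:
        # note_bow: (batch, vocab_size) bag-of-words of a clinical note
        # returns:  (batch, num_codes)  per-code logits
        base_logits = self.baseline(note_bow)
        # overlap between the note and each code's article: (batch, num_codes, vocab_size)
        overlap = note_bow.unsqueeze(1) * self.wiki_bow.unsqueeze(0)
        knowledge_logits = self.knowledge_score(overlap).squeeze(-1)
        return base_logits + knowledge_logits            # knowledge adjusts the baseline prediction

# Toy usage: vocabulary of 5000 words, 50 ICD-9 codes
wiki = (torch.rand(50, 5000) > 0.99).float()   # placeholder for precomputed Wikipedia bag-of-words
model = KnowledgeAugmentedCoder(vocab_size=5000, num_codes=50, wiki_bow=wiki)
logits = model(torch.rand(4, 5000))            # 4 notes -> (4, 50) per-code logits

Such a model would typically be trained with a multi-label objective over the ICD-9 codes, for example torch.nn.BCEWithLogitsLoss, since a note can carry several codes at once.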
    Collections: Theses and Dissertations
