    Intelligent Methods for Data Management: Prediction, Migration, and Generation

    Name:
    Pang_temple_0225E_15311.pdf
    Embargo:
    2025-05-18
    Size:
    6.264Mb
    Format:
    PDF
    Genre
    Thesis/Dissertation
    Date
    2023
    Author
    Pang, Lu
    Advisor
    Kant, Krishna
    Committee member
    Tan, Chiu C
    Vucetic, Slobodan
    Biswas, Saroj
    Department
    Computer and Information Science
    Subject
    Computer science
    Permanent link to this record
    http://hdl.handle.net/20.500.12613/8568
    
    DOI
    http://dx.doi.org/10.34944/dspace/8532
    Abstract
    Modern storage systems must handle an ever-increasing volume of data, yet only a very small fraction of that data is needed by the currently running applications. The growing availability of low-cost, high-capacity storage devices and the propensity to store data make it harder to separate the data that is needed from the cold data. Research into data management will require substantial investment to ensure that stored data provides value and that its sheer volume does not become an impediment. My research develops and leverages emerging intelligent methods to tackle the problem of data management in storage systems. In this dissertation, we first develop methods to better predict and understand workloads. The first method allows applications to predict the frequency of requests for data elements, the “heat” of the data. This allows storage systems to manage data better by supplying them with an accurate prediction of how many disk operations are expected on that data. The method is proactive and works efficiently in an online setting. It divides the workload into groups that behave similarly so that a model can be generated per group. Unique features of our approach are that it provides a range for the expected number of requests and allows predictions several time steps into the future, at the cost of accuracy. As part of the research into understanding workloads, we also develop a method to characterize and identify the workloads of HPC applications: a deep learning model designed to identify workload changes in the I/O stream as they occur. The research discussed thus far gives us an understanding of workload characteristics and provides reasonable approaches to heat prediction. To manage the storage hierarchy efficiently, we develop methods to migrate data based on expected data usage level.
In multi-tiered storage systems, data is migrated between lower and higher tiers to reduce data access latency. We develop an Adaptive Intelligent Tiering (AIT) mechanism that makes tiering decisions based on accesses to the data elements. The AIT mechanism generates a set of candidate movements and uses a control mechanism to further refine the candidates identified for data migration. The preceding methods, and storage system methods in general, are sensitive to workload characteristics. Identifying what makes one workload similar to another is not a straightforward problem. We define a three-part measure called the Similarity Index for Storage Traffic (SIST). Such a measure is essential for evaluating the similarity of traces. It can also be used to assess the quality of synthetic traces and to ascertain whether they are similar enough to the original traces to be used in storage-based applications. Finally, we address one of the biggest challenges in storage system research: the lack of large amounts of high-quality storage traces, which makes it hard to benchmark and compare storage system methods. To tackle this problem, we develop a method that generates realistic data requests, which can be used independently or to augment existing datasets. Since real data traces are difficult to obtain, a realistic data generation method will help alleviate the difficulty of researching data management issues. As part of this research, we explore generative models and adversarial methods. Part of the challenge of this work is handling the sparse and temporal nature of data requests in storage systems. We develop a deep learning model to generate realistic storage trace samples.
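The abstract describes tiering as a two-stage process: generate candidate movements, then refine them with a control mechanism. The sketch below only illustrates that general structure and is not the dissertation's AIT algorithm. The `Element` type, the `hot` and `cold` thresholds, and the per-interval migration `budget` used as the control step are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    tier: str    # "fast" or "slow" (hypothetical two-tier system)
    heat: float  # predicted accesses in the next interval (assumed input)


def candidate_moves(elements, hot=100.0, cold=10.0):
    """Stage 1: propose promotions for hot data on the slow tier
    and demotions for cold data on the fast tier."""
    moves = []
    for e in elements:
        if e.tier == "slow" and e.heat >= hot:
            moves.append((e, "fast"))
        elif e.tier == "fast" and e.heat <= cold:
            moves.append((e, "slow"))
    return moves


def refine(moves, budget=2):
    """Stage 2 (assumed control rule): cap the number of migrations per
    interval, taking the hottest promotions first, then the coldest demotions."""
    promotions = sorted((m for m in moves if m[1] == "fast"),
                        key=lambda m: -m[0].heat)
    demotions = sorted((m for m in moves if m[1] == "slow"),
                       key=lambda m: m[0].heat)
    return (promotions + demotions)[:budget]


elements = [
    Element("a", "slow", 250.0),
    Element("b", "slow", 120.0),
    Element("c", "fast", 3.0),
    Element("d", "fast", 50.0),
]
plan = refine(candidate_moves(elements), budget=2)
for element, target in plan:
    print(f"move {element.name} -> {target}")
```

A fixed budget is only one possible control rule; a real system might instead bound migration bandwidth or react to observed latency.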
    ADA compliance
    For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu
    Collections
    Theses and Dissertations

    Temple University Libraries | 1900 N. 13th Street | Philadelphia, PA 19122
    (215) 204-8212 | scholarshare@temple.edu