Intelligent Methods for Data Management: Prediction, Migration, and Generation

Pang, Lu

Genre

Thesis/Dissertation

Date

2023

Advisor

Kant, Krishna

Committee member

Tan, Chiu C.
Vucetic, Slobodan
Biswas, Saroj K.

Department

Computer and Information Science

DOI

http://dx.doi.org/10.34944/dspace/8532

Abstract

Modern storage systems must handle ever-growing data volumes, yet only a very small fraction of that data is needed by the applications currently running. The growing availability of low-cost, high-capacity storage devices and the propensity to store data make it harder to separate the data that is needed from cold data. Managing this data will require substantial research investment to ensure that stored data provides value rather than becoming an impediment through sheer volume. My research develops and leverages emerging intelligent methods to tackle the problem of data management in storage systems.
In this dissertation, we first develop methods to better predict and understand workloads. The first method allows applications to predict how frequently data elements are requested, that is, the “heat” of the data. Supplying a storage system with an accurate prediction of how many disk operations to expect on a data element lets it manage that data more effectively. The method is proactive and works efficiently in an online setting. It divides the workload into groups with similar behavior and builds one model per group. Two distinctive features of our approach are that it provides a range for the expected number of requests and that it can predict several time steps into the future, at some cost in accuracy. As part of the research into understanding workloads, we also develop a method to characterize and identify the workloads of HPC applications: a deep learning model designed to identify workload changes in the I/O stream as they occur. Together, these results give us an understanding of workload characteristics and provide reasonable approaches to heat prediction.
To manage the storage hierarchy efficiently, we develop methods to migrate data based on its expected usage level. In multi-tiered storage systems, data is migrated between lower and higher tiers to reduce data access latency. We develop an Adaptive Intelligent Tiering (AIT) mechanism that makes tiering decisions based on accesses to the data elements. AIT generates a set of candidate movements and uses a control mechanism to further refine the candidates identified for data migration.
These methods, like storage system methods in general, are sensitive to workload characteristics, and identifying what makes one workload similar to another is not a straightforward problem. We therefore define a three-part measure called the Similarity Index for Storage Traffic (SIST). Such a measure is essential for evaluating the similarity of traces. It can also be used to assess the quality of synthetic traces and to ascertain whether they are similar enough to the original traces to be used in storage-based applications.
Finally, we address one of the biggest challenges in storage system research: the lack of large amounts of high-quality storage traces, which makes it hard to benchmark and compare storage system methods. To tackle this problem, we develop a method that generates realistic data requests, which can be used independently or to augment existing datasets. Because real traces are difficult to obtain, a realistic generation method helps alleviate the difficulty of researching data management issues. As part of this research, we explore generative models and adversarial methods. Part of the challenge is handling the sparse and temporal nature of data requests in storage systems. We develop a deep learning model to generate realistic storage trace samples.

ADA compliance

For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu
