Achieving End-to-End Quality of Service for Remote Storage and Memory Traffic in Datacenters

Genre
Thesis/Dissertation
Date
2023-11
Department
Computer and Information Science
DOI
http://dx.doi.org/10.34944/dspace/9466
Abstract
Storage servers in today's datacenters comprise low-latency, high-throughput devices such as Solid State Drives (SSDs) and Persistent Memory (PM). These devices can be accessed over the network via storage access protocols such as NVMe-oF (Non-Volatile Memory Express over Fabrics) and PM-oF (Persistent Memory over Fabrics). Applications accessing these devices may have varying End-to-End Quality of Service (E2E QoS) requirements for throughput and latency. Providing differentiated treatment based on these requirements is challenging during congestion episodes, which may occur at various points in the E2E path, both as requests travel from the host to the storage (via the network) and as responses travel back to the host. In this work, we explore and propose new techniques that provide E2E QoS differentiation for requests at four major points: (1) the host end, where requests originate and where responses return, (2) the request and response path, (3) the storage access protocol, i.e., NVMe, and (4) the storage device. For the purposes of E2E QoS, we classify requests as either high priority or normal priority. Each category may contain multiple classes that are distinguished by a differentiated ratio of their target throughputs or latencies. For (1), caching in host memory is a technique that can improve the performance of specific QoS classes at the host end. We therefore propose a new caching algorithm, FussyCache, that takes the low latency of today's storage devices into account when making caching decisions. To achieve differentiation in (2), we propose two new mechanisms, QTCP and QRDMA, which modify the existing congestion control mechanisms in datacenter transport protocols (i.e., DCTCP and DCQCN) to accommodate the notion of QoS.
We use the ratio between target throughput and measured throughput (or between measured latency and target latency) to differentiate among the multiple sub-classes within a class, and we propose reservation-based prioritized treatment for high-priority traffic. For (3), the NVMe protocol uses the Weighted Round Robin queue arbitration mechanism to provide QoS differentiation among four QoS classes. We use this existing technique and extend it to queues inside the device (i.e., the SSD) to tackle the non-deterministic access latencies caused by the internal architecture of SSDs. The NVMe protocol also contains a feature for tackling this issue, called Predictable Latency Mode (PLM). We introduce two new mechanisms to address the deficiencies of this feature. First, we introduce PLMC, a coordinator module that arbitrates the use of PLM based on QoS classes. Because most current SSDs do not have the PLM feature, we also propose PLMLight, a PLMC-like solution that segregates writes to reduce the possibility of background activity. Finally, we explore the need for consistent treatment at (1), (2), (3), and (4) based on an application's QoS class by proposing a request tagging mechanism. We extensively evaluate a combination of our proposed techniques using a variety of workloads to show E2E QoS differentiation. To carry out these evaluations, we built NeSt, a QoS Differentiating E2E Networked Storage Simulator, which simulates a datacenter environment with tens of SSDs along with PMs that can be accessed over the network via switches. We plan to open-source this tool to enable further research on E2E QoS for datacenter storage and memory traffic by the research community.
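The Weighted Round Robin arbitration mentioned for (3) can be illustrated with a minimal sketch. This is a simplified model of the general idea only, not the thesis's in-device extension and not an NVMe driver API: the function name `wrr_fetch_order`, the queue names, and the weights are all hypothetical, and the urgent class is assumed to be drained first with strict priority before the three weighted classes take turns.

```python
from collections import deque


def wrr_fetch_order(urgent, weighted, weights):
    """Return the order in which queued commands would be fetched.

    urgent   -- list of commands in the (assumed) strict-priority class
    weighted -- dict: class name -> list of queued commands
    weights  -- dict: class name -> max commands fetched per round
    All names and values here are illustrative assumptions.
    """
    # Assumption: the urgent class is served before any weighted class.
    order = list(urgent)
    queues = {name: deque(cmds) for name, cmds in weighted.items()}

    # Visit the weighted classes in turn; each may fetch up to its
    # weight in commands per round, until every queue is empty.
    while any(queues.values()):
        for name, queue in queues.items():
            for _ in range(weights[name]):
                if not queue:
                    break
                order.append(queue.popleft())
    return order


if __name__ == "__main__":
    print(wrr_fetch_order(
        urgent=["u1"],
        weighted={"high": ["h1", "h2", "h3"],
                  "medium": ["m1", "m2"],
                  "low": ["l1", "l2"]},
        weights={"high": 2, "medium": 1, "low": 1},
    ))
```

With these hypothetical weights, the high class is fetched twice as often per round as the medium and low classes, which is the kind of proportional differentiation that weighted arbitration is meant to provide.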
ADA compliance
For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu