
SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing

Chen, Yiting
Luo, Lailong
Guo, Deke
Rottenstreich, Ori
DOI
https://doi.org/10.1109/tcc.2021.3119991
Abstract
For the efficient analysis of geo-distributed datasets, cloud providers implement data-parallel jobs across geo-distributed sites (e.g., datacenters and edge clusters), which are generally interconnected by wide-area network links. However, current state-of-the-art geo-distributed data analytic methods fail to make full use of the available network and computing resources. The main reason is that these methods must wait for the bottleneck sites to complete their transmission and computation in each phase before the next phase can begin. Furthermore, such methods may be impractical under dynamic network bandwidth and diverse job parallelism. To this end, we propose a Simultaneous Data Transfer and Processing (SDTP) mechanism that accelerates wide-area data analytics while jointly considering network bandwidth dynamics and job parallelism. In SDTP, a site can start computing as soon as it has obtained the required input data. As a result, the input data loading, map, shuffle, and reduce phases at each site need not wait for the previous phases of other sites to complete. We further improve SDTP by providing more accurate time estimation and by generalizing the mechanism to dynamic settings. Trace-driven results demonstrate that SDTP improves wide-area analytic job response time by 19% to 72% compared with other methods.
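To make the contrast between phase barriers and simultaneous transfer and processing concrete, the following is a minimal toy sketch. It is not the paper's scheduling algorithm or time-estimation model. It assumes fixed, known per-site durations for input loading, map, and reduce, plus fixed pairwise shuffle transfer times, so it ignores bandwidth dynamics and task-level parallelism. The function names `barrier_time` and `pipelined_time` and the example numbers are hypothetical. `barrier_time` models a method in which every phase waits for the slowest site; `pipelined_time` lets each shuffle transfer start once its source site's map finishes, and each reducer start once all of its inputs have arrived.

```python
from typing import Sequence


def barrier_time(load: Sequence[float], map_t: Sequence[float],
                 shuffle: Sequence[Sequence[float]],
                 reduce_t: Sequence[float]) -> float:
    """Toy job time when each phase waits for the slowest site (hypothetical model)."""
    n = len(load)
    t_load = max(load)                    # every site must finish loading input
    t_map = t_load + max(map_t)           # map begins only after the load barrier
    # shuffle begins only after the map barrier; the slowest transfer dominates
    t_shuffle = t_map + max(shuffle[i][j] for i in range(n) for j in range(n))
    return t_shuffle + max(reduce_t)      # reduce begins after the shuffle barrier


def pipelined_time(load: Sequence[float], map_t: Sequence[float],
                   shuffle: Sequence[Sequence[float]],
                   reduce_t: Sequence[float]) -> float:
    """Toy job time when each site proceeds as soon as its own inputs are ready."""
    n = len(load)
    # each site maps right after its own input has loaded
    map_done = [load[i] + map_t[i] for i in range(n)]
    reduce_done = []
    for j in range(n):
        # transfer i -> j starts as soon as site i's map output exists;
        # reducer j starts once its last incoming transfer completes
        inputs_ready = max(map_done[i] + shuffle[i][j] for i in range(n))
        reduce_done.append(inputs_ready + reduce_t[j])
    return max(reduce_done)


if __name__ == "__main__":
    load = [2, 8]                  # input loading time per site
    map_t = [6, 2]                 # map time per site
    shuffle = [[0, 5], [1, 0]]     # shuffle[i][j]: transfer time from site i to site j
    reduce_t = [3, 3]              # reduce time per site
    print(barrier_time(load, map_t, shuffle, reduce_t),
          pipelined_time(load, map_t, shuffle, reduce_t))  # prints: 22 16
```

In this example the barrier model finishes at 22 time units and the pipelined model at 16. Because each site's own map finish time is at most the global map barrier, and each transfer and reduce time is at most the corresponding phase maximum, the pipelined time can never exceed the barrier time in this toy model.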
Citation
Y. Chen, L. Luo, D. Guo, O. Rottenstreich and J. Wu, "SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing," in IEEE Transactions on Cloud Computing, vol. 11, no. 1, pp. 911-926, 1 Jan.-March 2023, doi: 10.1109/TCC.2021.3119991. keywords: {Bandwidth;Task analysis;Silicon;Time factors;Wide area networks;Data analysis;Parallel processing;Wide-area data analytics;task scheduling;job response time;dynamic network;job parallelism},
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Has part
IEEE Transactions on Cloud Computing, Vol. 11, Iss. 1
ADA compliance
For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu