• Login
    View Item 
    •   Home
    • Theses and Dissertations
    • Theses and Dissertations
    • View Item
    •   Home
    • Theses and Dissertations
    • Theses and Dissertations
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Browse

    All of TUScholarShareCommunitiesDateAuthorsTitlesSubjectsGenresThis CollectionDateAuthorsTitlesSubjectsGenres

    My Account

    LoginRegister

    Help

    AboutPeoplePoliciesHelp for DepositorsData DepositFAQs

    Statistics

    Most Popular ItemsStatistics by CountryMost Popular Authors

    Optimizing Memory Systems for High Efficiency in Computing Clusters

    • CSV
    • RefMan
    • EndNote
    • BibTex
    • RefWorks
    Thumbnail
    Name:
    Liu_temple_0225E_15100.pdf
    Size:
    5.868Mb
    Format:
    PDF
    Download
    Genre
    Thesis/Dissertation
    Date
    2022
    Author
    Liu, Wenjie
    Advisor
    He, Xubin
    Kant, Krishna
    Committee member
    Ji, Bo, 1982-
    Liu, Qing, 1979-
    Zhao, Zhigen
    Department
    Computer and Information Science
    Subject
    Computer science
    Computing cluster
    DRAM
    Memory access behavior
    Memory system
    Stacked DRAM
    Permanent link to this record
    http://hdl.handle.net/20.500.12613/8335
    
    Metadata
    Show full item record
    DOI
    http://dx.doi.org/10.34944/dspace/8306
    Abstract
    DRAM-based memory system suffers from increasing aggravating row buffer interference, which causes significant performance degradation and power consumption. With DRAM scaling, the overheads of row buffer interference become even worse due to higher row activation and precharge latency. Clusters have been a prevalent and successful computing framework for processing large amount of data due to their distributed and parallelized working paradigm. A task submitted to a cluster is typically divided into a number of subtasks which are designated to different work nodes running the same code but dealing with different equal portion of the dataset to be processed. Due to the existence of heterogeneity, it could easily result in stragglers unfairly slowing down the entire processing, because work nodes finish their subtasks at different rates. With the increasing problem complexity, more irregular applications are deployed on high-performance clusters due to the parallel working paradigm, and yield irregular memory access behaviors across nodes. However, the irregularity of memory access behaviors is not comprehensively studied, which results in low utilization of the integrated hybrid memory system compositing of stacked DRAM and off-chip DRAM. This dissertation lists our research results on the above three mentioned challenges in order to optimize the memory system for high efficiency in computing clusters. Details are as follows: To address low row buffer utilization caused by row buffer interference, we propose Row Buffer Cache (RBC) architecture to efficiently mitigate row buffer interference overheads. At the core of the RBC architecture, the DRAM pages with good locality are cached and escape from the row buffer interference.Such an RBC architecture significantly reduces the overheads caused by row activation and precharge, thus improves overall system performance and energy efficiency. We evaluate our RBC using SPEC CPU2006 on a DDR4 memory compared to the commodity baseline memory system along with the state-of-art methods, DICE and Bingo. Results show that RBC improves the memory performance by up to 2.24X (16.1% on average) and reduces the overall memory energy by up to 68.2% (23.6% on average) for single-core simulations. For multi-core simulations, RBC increases the performance by up to 1.55X (16.7% on average) and reduces the energy by up to 35.4% (21.3% on average). Comparing with the state-of-art methods, RBC outperforms DICE and Bingo by 8% and 5.1% on average for single-core scenario, and by 10.1% and 4.7% for multi-core scenario. To relax the straggling effect observed in clusters, we aim to speed up straggling work nodes to quicken the overall processing by leveraging exhibited performance variation, and propose StragglerHelper which conveys the memory access characteristics experienced by the forerunner to the stragglers such that stragglers can be sped up due to the accurately informed memory prefetching. A Progress Monitor is deployed to supervise the respective progresses of the work nodes and inform the memory access patterns of forerunner to straggling nodes. Our evaluation results with the SPEC MPI 2007 and BigDataBench on a cluster of 64 work nodes have shown that StragglerHelper is able to improve the execution time of stragglers by up to 99.5% with an average of 61.4%, contributing to an overall improvement of the entire cohort of the cluster by up to 46.7% with an average of 9.9% compared to the baseline cluster. To address the performance difference in the irregular application, we devise a novel method called Similarity-Managed Hybrid Memory System (SM-HMS) to improve the hybrid memory system performance by leveraging the memory access similarity among nodes in a cluster. Within SM-HMS, two techniques are proposed, Memory Access Similarity Measuring and Similarity-based Memory Access Behavior Sharing. To quantify the memory access similarity, memory access behaviors of each node are vectorized, and the distance between two vectors is used as the memory access similarity. The calculated memory access similarity is used to share memory access behaviors precisely across nodes. With the shared memory access behaviors, SM-HMS divides the stacked DRAM into two sections, the sliding window section and the outlier section. The shared memory access behaviors guide the replacement of the sliding window section while the outlier section is managed in the LRU manner. Our evaluation results with a set of irregular applications on various clusters consisting of up to 256 nodes have shown that SM-HMS outperforms the state-of-the-art approaches, Cameo, Chameleon, and Hyrbid2, on job finish time reduction by up to 58.6%, 56.7%, and 31.3%, with 46.1%, 41.6%, and 19.3% on average, respectively. SM-HMS can also achieve up to 98.6% (91.9% on average) of the ideal hybrid memory system performance.
    ADA compliance
    For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu
    Collections
    Theses and Dissertations

    entitlement

     
    DSpace software (copyright © 2002 - 2023)  DuraSpace
    Temple University Libraries | 1900 N. 13th Street | Philadelphia, PA 19122
    (215) 204-8212 | scholarshare@temple.edu
    Open Repository is a service operated by 
    Atmire NV
     

    Export search results

    The export option will allow you to export the current search results of the entered query to a file. Different formats are available for download. To export the items, click on the button corresponding with the preferred download format.

    By default, clicking on the export buttons will result in a download of the allowed maximum amount of items.

    To select a subset of the search results, click "Selective Export" button and make a selection of the items you want to export. The amount of items that can be exported at once is similarly restricted as the full export.

    After making a selection, click one of the export format buttons. The amount of items that will be exported is indicated in the bubble next to export format.