    Stateless Parallel Processing Architecture for Extreme Scale HPC and Auction-based Clouds

    Name:
    TETDEDXTaifi-temple-0225E-11609.pdf
    Size:
    3.964Mb
    Format:
    PDF
    Genre
    Thesis/Dissertation
    Date
    2013
    Author
    Taifi, Moussa
    Advisor
    Shi, Justin Y.
    Committee member
    Wu, Jie, 1961-
    Tan, Chiu C.
    Khreishah, Abdallah
    Szymanski, Boleslaw
    Department
    Computer and Information Science
    Subject
    Computer Science
    Fault Tolerance
    Performance of Systems
    Scalability
    Sustainable Extreme Scale HPC Architecture
    Permanent link to this record
    http://hdl.handle.net/20.500.12613/3629
    
    DOI
    http://dx.doi.org/10.34944/dspace/3611
    Abstract
    Extreme scale HPC (high performance computing) applications require very large numbers of nodes. At these scales, transient hardware and software failures, as well as network congestion and disconnections, increase linearly with the number of components. This volatility contributes to a dramatic decrease in applications' MTBF (mean time between failures). The semantics of traditional point-to-point transmission APIs are ill-suited to applications at extreme scale. In this thesis, we investigate an application-dependent network design that focuses on the sustainability of extreme scale high performance computing applications, using packet-switching-inspired statistical multiplexing of semantic data tuples and decoupled computations. We report the design and implementation of a distributed tuple space built on Cassandra and ZooKeeper that provides tunable spatial and temporal redundancy without a negative impact on application performance. We detail the failure scenarios that our system handles seamlessly and describe the advantages of Stateless Parallel Processing for HPC applications. We report our results on performance, reliability and overall application sustainability. In preliminary tests on the most common HPC application categories, the prototype demonstrated sustained performance while providing a reliable computing architecture that withstands multiple failure types without manual checkpoint-restart (CPR). The feasibility of efficient non-stop HPC makes auction-based clouds usable for more cost-efficient HPC applications. For all HPC application categories, we first report a novel method for determining bid-aware checkpointing intervals from cloud providers' fluctuating pricing histories. We then explore the effects of bidding on virtual HPC clusters composed of EC2 Spot Instances.
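The decoupling of computation from communication described above can be illustrated with a minimal, single-process sketch. It assumes a hypothetical in-memory `TupleSpace` class with `put`, `take`, `complete` and `read_all` operations and a lease timeout. This is not the thesis's Cassandra/ZooKeeper implementation. It only shows why a stateless worker that crashes mid-task loses no work: the task tuple it was holding becomes visible again once its lease expires.

```python
import time


class TupleSpace:
    """Hypothetical in-memory stand-in for a distributed tuple space.

    The thesis builds its tuple space on Cassandra and ZooKeeper; this sketch
    only models the take-with-lease behaviour that lets tasks survive worker
    failures.
    """

    def __init__(self, lease_seconds=5.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self._visible = {}  # key -> value, available to take()
        self._leased = {}   # key -> (value, lease expiry time)

    def put(self, key, value):
        self._visible[key] = value

    def _expire_leases(self):
        now = self.clock()
        for key, (value, expiry) in list(self._leased.items()):
            if expiry <= now:
                # The worker holding this tuple is presumed dead: re-publish it.
                del self._leased[key]
                self._visible[key] = value

    def take(self, prefix):
        """Remove one visible tuple whose key starts with prefix and lease it."""
        self._expire_leases()
        for key in sorted(self._visible):
            if key.startswith(prefix):
                value = self._visible.pop(key)
                self._leased[key] = (value, self.clock() + self.lease_seconds)
                return key, value
        return None

    def complete(self, key):
        """Acknowledge a leased tuple so it is never re-issued."""
        self._leased.pop(key, None)

    def read_all(self, prefix):
        """Return all visible tuples whose key starts with prefix."""
        return {k: v for k, v in self._visible.items() if k.startswith(prefix)}


class FakeClock:
    """Manually advanced clock so the demo is deterministic."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


def worker_step(space, crash=False):
    """One stateless worker iteration: take a task, square it, publish the result."""
    item = space.take("task/")
    if item is None:
        return False
    key, x = item
    if crash:
        return True  # simulated failure: no result and no acknowledgement
    space.put("result/" + key.split("/", 1)[1], x * x)
    space.complete(key)
    return True


clock = FakeClock()
space = TupleSpace(lease_seconds=5.0, clock=clock)
for i in range(4):
    space.put(f"task/{i}", i)

worker_step(space, crash=True)  # a worker takes task/0 and dies
while worker_step(space):       # surviving workers finish task/1..task/3
    pass
clock.t += 10.0                 # task/0's lease expires
while worker_step(space):       # task/0 is re-issued and completed
    pass

results = space.read_all("result/")
print(results)
```

Result keys are derived from task keys. A task that happens to run twice therefore overwrites its result with the same value instead of producing a conflicting one.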
    We expose the counter-intuitive effects of uniform versus non-uniform bidding, especially in terms of failure rate and failure model, and we propose a method for predicting the runtime of parallel applications under various bidding strategies. We then show that CPR-free HPC applications require a new optimization strategy. Because extreme scale HPC and auction-based cloud computing offer the ultimate in computational scale and resource efficiency, they challenge the very foundations of computer science research and development. This thesis answers some critical questions about these challenges, and we hope it paves the way for future improvements in the HPC field under increasingly harsh and volatile conditions.
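The following is a sketch of one way to make a checkpoint interval bid-aware, under stated assumptions. Each time the spot price rises from at or below the bid to above it is treated as an out-of-bid failure. The MTBF is estimated as in-bid time divided by the number of such failures, and the result goes into Young's first-order approximation T = sqrt(2 * C * MTBF), where C is the checkpoint cost. Young's formula is a substitute technique chosen for illustration, not necessarily the method developed in the thesis. The function names and the hourly price series are hypothetical.

```python
import math


def out_of_bid_mtbf(prices, bid, dt_hours=1.0):
    """Estimate mean time between out-of-bid terminations (hypothetical helper).

    prices: spot prices sampled every dt_hours.
    A failure is counted each time the price moves from <= bid to > bid.
    Returns math.inf if no failure occurs in the history.
    """
    in_bid_time = 0.0
    failures = 0
    prev_ok = None
    for p in prices:
        ok = p <= bid
        if ok:
            in_bid_time += dt_hours
        elif prev_ok:
            failures += 1
        prev_ok = ok
    if failures == 0:
        return math.inf
    return in_bid_time / failures


def young_interval(checkpoint_cost_hours, mtbf_hours):
    """Young's approximation of the optimal checkpoint interval, in hours."""
    if math.isinf(mtbf_hours):
        return math.inf  # no observed failures: checkpointing is never needed
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


# Hypothetical hourly spot-price history (not data from the thesis).
prices = [0.10, 0.12, 0.30, 0.11, 0.10, 0.10, 0.35, 0.12, 0.10, 0.10]

for bid in (0.11, 0.20, 0.40):
    mtbf = out_of_bid_mtbf(prices, bid)
    print(bid, mtbf, young_interval(0.125, mtbf))
```

With this sample history, a bid of 0.20 gives an estimated MTBF of 4 hours. With a 7.5-minute (0.125 h) checkpoint cost, that yields a one-hour checkpoint interval. Lowering the bid to 0.11 shortens the estimated MTBF to 3 hours, and a bid above every observed price yields no out-of-bid failures at all.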
    ADA compliance
    For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu
    Collections
    Theses and Dissertations

    DSpace software (copyright © 2002 - 2023)  DuraSpace
    Temple University Libraries | 1900 N. 13th Street | Philadelphia, PA 19122
    (215) 204-8212 | scholarshare@temple.edu
    Open Repository is a service operated by 
    Atmire NV
     
