
Better Selection of Virtual Machines for a MapReduce Environment

Blaisse, Adam Pasqua
Genre
Thesis/Dissertation
Date
2015
Department
Computer and Information Science
DOI
http://dx.doi.org/10.34944/dspace/2587
Abstract
With the increasing availability of large data sets comes the need for better and more sophisticated methods of handling and processing them. Because of the size and complexity of these data sets, many users have moved to distributed systems for storage and processing. In a distributed system, many tasks become much more complex, and there are many more opportunities for problems to arise. Out of this rose the MapReduce paradigm. The basic idea of MapReduce is to minimize the work of the programmer and to remove many of the opportunities for error that distributed computation introduces. To do this, all work is done either in the Map phase by the Map tasks or in the Reduce phase by the Reduce tasks. Communication and synchronization are handled by MapReduce, so users are protected from misusing them. Users may also want to use MapReduce along with cloud computing. The most common resource rented from Amazon EC2 is the virtual machine (VM). Amazon offers many different sizes with different types of configuration. Some machines may be specialized for CPU-bound jobs, while others may be optimized for memory- or disk-bound jobs. Each type of VM comes with a different number of CPU cores and a different amount of RAM and storage capacity to match its use. Each of these virtual machines also has its own cost per hour. This means that simply selecting the largest or most powerful machine may not be the best option if one is trying to get the most for one's money. The other resource that Amazon offers is Elastic Block Store (EBS) volumes. The basic idea of EBS volumes is to offer users more storage capacity to add to their virtual machines. Like the virtual machines, storage space has a per-hour cost that depends on the amount of space used as well as the type of storage requested. Once purchased, these storage volumes can be attached to a virtual machine and used as extended storage capacity.
For this thesis, we will look at how users can best select the type of virtual machine to fit a given MapReduce job.
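The point that the largest machine is not necessarily the most economical can be illustrated with a minimal sketch. This is not the method developed in the thesis. The VM names, hourly prices, and runtime estimates below are hypothetical, and the sketch assumes that every started hour is billed in full.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VmOption:
    name: str                 # hypothetical label, not a real EC2 instance type
    price_per_hour: float     # assumed rental price in dollars per hour
    est_runtime_hours: float  # assumed runtime of the job on this VM type


def job_cost(vm: VmOption, nodes: int = 1) -> float:
    """Rental cost of one job, assuming each started hour is billed in full."""
    billed_hours = math.ceil(vm.est_runtime_hours)
    return billed_hours * vm.price_per_hour * nodes


def cheapest(options, nodes: int = 1) -> VmOption:
    """Return the option with the lowest total cost for the job."""
    return min(options, key=lambda vm: job_cost(vm, nodes))


# Hypothetical figures chosen only to illustrate the trade-off.
options = [
    VmOption("small", 0.10, 12.5),   # cheap per hour, but slow
    VmOption("medium", 0.25, 3.2),
    VmOption("large", 1.00, 1.5),    # fastest, but costly per hour
]

best = cheapest(options)
print(f"{best.name}: ${job_cost(best):.2f}")  # medium: $1.00
```

With these assumed numbers the mid-sized option is the cheapest, even though the large one finishes first. In practice the runtime estimates would have to come from profiling or from a performance model of the job.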
ADA compliance
For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu