MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
jeff@google.com, sanjay@google.com
Google, Inc.
Abstract

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model, as shown in the paper.

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
Source: Agrawal, Gagan - Department of Computer Science and Engineering, Ohio State University
Brown, Angela Demke - Department of Computer Science, University of Toronto
Cafarella, Michael J. - Department of Electrical Engineering and Computer Science, University of Michigan
Culpepper, J. Shane - School of Computer Science and Information Technology, RMIT University
Francalanza, Adrian - Department of Computer Science, University of Malta
Collections: Computer Technologies and Information Sciences; Mathematics