Analyzing petabytes of data with Hadoop

Hammerbacher, Jeff


Abstract

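The open source Apache Hadoop project provides a powerful suite of tools for storing and analyzing petabytes of data using commodity hardware. After several years of production use inside web companies like Yahoo! and Facebook and nearly a year of commercial support and development by Cloudera, the technology is spreading rapidly through other disciplines, from financial services and government to life sciences and high energy physics. The talk will motivate the design of Hadoop and discuss some key implementation details in depth. It will also cover the major subprojects in the Hadoop ecosystem, go over some example applications, highlight best practices for deploying Hadoop in your environment, discuss plans for the future of the technology, and provide pointers to the many resources available for learning more. In addition to providing more information about the Hadoop platform, a major goal of this talk is to begin a dialogue with the ATLAS research team on how the tools commonly used in their environment compare to Hadoop, and how Hadoop could be improved to better serve the high energy physics community.

Short Biography: Jeff Hammerbacher is Vice President of Products and Chief Scientist at Cloudera. Jeff was an Entrepreneur in Residence at Accel Partners immediately prior to founding Cloudera. Before Accel, he conceived, built, and led the Data team at Facebook. The Data team was responsible for driving many of the applications of statistics and machine learning at Facebook, as well as building out the infrastructure to support these tasks for massive data sets. The team produced two open source projects: Hive, a system for offline analysis built above Hadoop, and Cassandra, a structured storage system on a P2P network. Before joining Facebook, Jeff was a quantitative analyst on Wall Street. Jeff earned his Bachelor's Degree in Mathematics from Harvard University and recently served as contributing editor to the book "Beautiful Data", published by O'Reilly in July 2009.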

Authors: Hammerbacher, Jeff

Publication Date: 2009-08-21

OSTI Identifier: 1026345

Resource Type: Multimedia

Country of Publication: CERN

Language: English

Subject: 97 MATHEMATICS AND COMPUTING; OPEN SOURCE; APACHE HADOOP; DATA STORAGE

Citation Formats


Hammerbacher, Jeff. Analyzing petabytes of data with Hadoop. CERN: N. p., 2009. Web.


Hammerbacher, Jeff. Analyzing petabytes of data with Hadoop. CERN.


Hammerbacher, Jeff. 2009. "Analyzing petabytes of data with Hadoop". CERN. https://www.osti.gov/servlets/purl/1026345.


                    
@article{osti_1026345,
  title        = {Analyzing petabytes of data with Hadoop},
  author       = {Hammerbacher, Jeff},
  abstractNote = {The open source Apache Hadoop project provides a powerful suite of tools for storing and analyzing petabytes of data using commodity hardware. After several years of production use inside web companies like Yahoo! and Facebook and nearly a year of commercial support and development by Cloudera, the technology is spreading rapidly through other disciplines, from financial services and government to life sciences and high energy physics. The talk will motivate the design of Hadoop and discuss some key implementation details in depth. It will also cover the major subprojects in the Hadoop ecosystem, go over some example applications, highlight best practices for deploying Hadoop in your environment, discuss plans for the future of the technology, and provide pointers to the many resources available for learning more. In addition to providing more information about the Hadoop platform, a major goal of this talk is to begin a dialogue with the ATLAS research team on how the tools commonly used in their environment compare to Hadoop, and how Hadoop could be improved to better serve the high energy physics community. Short Biography: Jeff Hammerbacher is Vice President of Products and Chief Scientist at Cloudera. Jeff was an Entrepreneur in Residence at Accel Partners immediately prior to founding Cloudera. Before Accel, he conceived, built, and led the Data team at Facebook. The Data team was responsible for driving many of the applications of statistics and machine learning at Facebook, as well as building out the infrastructure to support these tasks for massive data sets. The team produced two open source projects: Hive, a system for offline analysis built above Hadoop, and Cassandra, a structured storage system on a P2P network. Before joining Facebook, Jeff was a quantitative analyst on Wall Street. Jeff earned his Bachelor's Degree in Mathematics from Harvard University and recently served as contributing editor to the book "Beautiful Data", published by O'Reilly in July 2009.},
  place        = {CERN},
  year         = {2009},
  month        = {8}
}

