Data shuffling with hierarchical tuple spaces
Abstract
Methods and systems for shuffling data to generate a dataset are described. A first map module may generate first pair data, and a second map module may generate second pair data, from source data. The first map module may insert the first pair data into a first local tuple space accessible to the first map module. The second map module may insert the second pair data into a second local tuple space accessible to the second map module. A shuffle module may request pair data that includes a particular key. The first and second pair data may be inserted into a global tuple space accessible by the first and second map modules. The shuffle module may identify the requested pair data in the global tuple space, and may fetch the identified pair data from a memory. The shuffle module may shuffle the fetched pair data to generate the dataset.
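The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class and function names (`LocalTupleSpace`, `GlobalTupleSpace`, `map_words`, `shuffle`) are hypothetical, the word-count style map function is chosen only for illustration, and the global tuple space is assumed here to be an index that records which local space holds each key, so that the shuffle step can fetch the matching pairs from each mapper's memory.

```python
from collections import defaultdict


class LocalTupleSpace:
    """Hypothetical per-mapper store of (key, value) pairs."""

    def __init__(self, name):
        self.name = name
        self._pairs = defaultdict(list)

    def insert(self, key, value):
        self._pairs[key].append(value)

    def keys(self):
        return list(self._pairs.keys())

    def read(self, key):
        # Return a copy so callers cannot mutate the mapper's storage.
        return list(self._pairs.get(key, []))


class GlobalTupleSpace:
    """Hypothetical global view: maps each key to the local spaces holding it."""

    def __init__(self):
        self._locations = defaultdict(list)

    def publish(self, local_space):
        # Make the pairs of one local space visible through the global space.
        for key in local_space.keys():
            if local_space not in self._locations[key]:
                self._locations[key].append(local_space)

    def fetch(self, key):
        # Identify which local spaces hold the key, then read from their memory.
        values = []
        for space in self._locations.get(key, []):
            values.extend(space.read(key))
        return values


def map_words(source, local_space):
    """Illustrative map module: emit a (word, 1) pair for every word."""
    for word in source.split():
        local_space.insert(word, 1)


def shuffle(global_space, key):
    """Illustrative shuffle module: gather every value recorded for one key."""
    return key, global_space.fetch(key)


# Two map modules, each writing to its own local tuple space.
first_space = LocalTupleSpace("mapper-1")
second_space = LocalTupleSpace("mapper-2")
map_words("a b a", first_space)
map_words("b c", second_space)

# Both local spaces are exposed through one global tuple space.
global_space = GlobalTupleSpace()
global_space.publish(first_space)
global_space.publish(second_space)

grouped = shuffle(global_space, "b")
```

In this sketch the shuffle module never scans the mappers directly. It asks the global space for a key, and the global space resolves that request to the local spaces that actually hold matching pairs.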
- Inventors:
- Kayi, Abdullah; Andrade Costa, Carlos Henrique; Park, Yoonho; Johns, Charles Ray
- Issue Date:
- Research Org.:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1805382
- Patent Number(s):
- 10891274
- Application Number:
- 15/851,511
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- AC02-05CH11231
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 12/21/2017
- Country of Publication:
- United States
- Language:
- English
Citation Formats
Kayi, Abdullah, Andrade Costa, Carlos Henrique, Park, Yoonho, and Johns, Charles Ray. Data shuffling with hierarchical tuple spaces. United States: N. p., 2021. Web.
Kayi, Abdullah, Andrade Costa, Carlos Henrique, Park, Yoonho, & Johns, Charles Ray. Data shuffling with hierarchical tuple spaces. United States.
Kayi, Abdullah, Andrade Costa, Carlos Henrique, Park, Yoonho, and Johns, Charles Ray. 2021. "Data shuffling with hierarchical tuple spaces". United States. https://www.osti.gov/servlets/purl/1805382.
@article{osti_1805382,
title = {Data shuffling with hierarchical tuple spaces},
author = {Kayi, Abdullah and Andrade Costa, Carlos Henrique and Park, Yoonho and Johns, Charles Ray},
place = {United States},
year = {2021},
month = {1}
}