Title: BigData Express: Toward Predictable, Schedulable, and High-Performance Data Transfer

Abstract

Big data has emerged as a driving force for scientific discoveries. Large scientific instruments (e.g., colliders and telescopes) generate exponentially increasing volumes of data. To enable scientific discovery, science data must be collected, indexed, archived, shared, and analyzed, typically in a widely distributed and highly collaborative manner. Data transfer is now an essential function for science discoveries, particularly within big data environments. To support data transfer for big data science, there is a need for predictable, high-performance, scalable, end-to-end, programmable networks that enable science applications to use network resources most efficiently. Software-Defined Networking (SDN) is a novel network architecture and technological approach that separates the network control and data planes. By logically centralizing the network control in SDN controllers and implementing the traffic forwarding in switching hardware, SDN provides network programmability. The promise of SDN is that it allows network resources to be customized, orchestrated, and managed in a dynamic and automated manner. This offers the potential for optimizing use of network resources, resulting in immense performance advantages for both network operations and for the large-scale science applications that run on those networks. DOE’s Office of Science Advanced Scientific Computing Research (ASCR) office has funded work on a number of SDN research projects, including BigData Express, SENSE, and SDN NGenIA, to develop advanced network capabilities and features in support of big data science. The BigData Express project research team has prototyped an SDN-enabled network service called AmoebaNet that provides “application-aware” network service in campus or local area networks. AmoebaNet takes an application-centric approach that allows science applications to program networks at run-time to suit their needs.
The SENSE team adopts a resource-centric approach to provide end-to-end orchestration of automated and dynamic circuit services across multiple WAN domains, with flow steering and shaping on Data Transfer Nodes (DTNs) located at the campus border. BigData Express and SENSE both aim to provide SDN-enabled network services for big data science, but they differ significantly in design principles (application-centric vs. resource-centric) and in targeted areas (LAN/campus networks vs. WAN/campus border). SDN NGenIA, by contrast, focuses on consistent operations across both the application and network domains, with end-to-end orchestration including negotiation between the science programs’ workflow management systems and the network orchestration services. We propose an “SDN-enabled network services for big data science” session at the Internet2 2018 Global Summit to present these SDN R&D activities. The following topics will be covered: (1) BigData Express and AmoebaNet’s design principles, requirements, architectures, and implementation. (2) How BigData Express orchestrates and manages network resources using AmoebaNet. (3) How AmoebaNet addresses the last-mile problem and the network scalability problem in conjunction with WAN circuit/path reservation such as ESnet OSCARS and Internet2 AL2S. (4) SENSE’s design principles, requirements, architectures, and implementation. (5) SDN NGenIA’s focal areas, design principles, and key components, as well as its role relative to SENSE and its large-scale areas of application. (6) Requirements and limitations of requests and negotiations between resource domains and application-centric workflow orchestrators.
This session is intended to achieve the following goals: (1) keep the R&E community informed of the latest progress on BigData Express, SDN NGenIA, and SENSE; (2) collect feedback from the R&E community on our work; and (3) find potential collaborators to deploy and evaluate these SDN tools, as well as individual state-of-the-art open-source tools such as SDN NGenIA’s Fast Data Transfer (FDT) and its extensible agent-based architecture.

Authors:
Wu, Wenji [1]
  1. Fermilab
Publication Date:
Research Org.:
Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), High Energy Physics (HEP) (SC-25)
OSTI Identifier:
1460784
Report Number(s):
FERMILAB-SLIDES-18-060-CD
oai:inspirehep.net:1682973
DOE Contract Number:  
AC02-07CH11359
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English

Citation Formats

Wu, Wenji. BigData Express: Toward Predictable, Schedulable, and High-Performance Data Transfer. United States: N. p., 2018. Web. doi:10.2172/1460784.
Wu, Wenji. BigData Express: Toward Predictable, Schedulable, and High-Performance Data Transfer. United States. https://doi.org/10.2172/1460784
Wu, Wenji. Tue . "BigData Express: Toward Predictable, Schedulable, and High-Performance Data Transfer". United States. https://doi.org/10.2172/1460784. https://www.osti.gov/servlets/purl/1460784.
@article{osti_1460784,
title = {BigData Express: Toward Predictable, Schedulable, and High-Performance Data Transfer},
author = {Wu, Wenji},
abstractNote = {Big data has emerged as a driving force for scientific discoveries. Large scientific instruments (e.g., colliders and telescopes) generate exponentially increasing volumes of data. To enable scientific discovery, science data must be collected, indexed, archived, shared, and analyzed, typically in a widely distributed and highly collaborative manner. Data transfer is now an essential function for science discoveries, particularly within big data environments. To support data transfer for big data science, there is a need for predictable, high-performance, scalable, end-to-end, programmable networks that enable science applications to use network resources most efficiently. Software-Defined Networking (SDN) is a novel network architecture and technological approach that separates the network control and data planes. By logically centralizing the network control in SDN controllers and implementing the traffic forwarding in switching hardware, SDN provides network programmability. The promise of SDN is that it allows network resources to be customized, orchestrated, and managed in a dynamic and automated manner. This offers the potential for optimizing use of network resources, resulting in immense performance advantages for both network operations and for the large-scale science applications that run on those networks. DOE’s Office of Science Advanced Scientific Computing Research (ASCR) office has funded work on a number of SDN research projects, including BigData Express, SENSE, and SDN NGenIA, to develop advanced network capabilities and features in support of big data science. The BigData Express project research team has prototyped an SDN-enabled network service called AmoebaNet that provides “application-aware” network service in campus or local area networks. AmoebaNet takes an application-centric approach that allows science applications to program networks at run-time to suit their needs.
The SENSE team adopts a resource-centric approach to provide end-to-end orchestration of automated and dynamic circuit services across multiple WAN domains, with flow steering and shaping on Data Transfer Nodes (DTNs) located at the campus border. BigData Express and SENSE both aim to provide SDN-enabled network services for big data science, but they differ significantly in design principles (application-centric vs. resource-centric) and in targeted areas (LAN/campus networks vs. WAN/campus border). SDN NGenIA, by contrast, focuses on consistent operations across both the application and network domains, with end-to-end orchestration including negotiation between the science programs’ workflow management systems and the network orchestration services. We propose an “SDN-enabled network services for big data science” session at the Internet2 2018 Global Summit to present these SDN R&D activities. The following topics will be covered: (1) BigData Express and AmoebaNet’s design principles, requirements, architectures, and implementation. (2) How BigData Express orchestrates and manages network resources using AmoebaNet. (3) How AmoebaNet addresses the last-mile problem and the network scalability problem in conjunction with WAN circuit/path reservation such as ESnet OSCARS and Internet2 AL2S. (4) SENSE’s design principles, requirements, architectures, and implementation. (5) SDN NGenIA’s focal areas, design principles, and key components, as well as its role relative to SENSE and its large-scale areas of application. (6) Requirements and limitations of requests and negotiations between resource domains and application-centric workflow orchestrators.
This session is intended to achieve the following goals: (1) keep the R&E community informed of the latest progress on BigData Express, SDN NGenIA, and SENSE; (2) collect feedback from the R&E community on our work; and (3) find potential collaborators to deploy and evaluate these SDN tools, as well as individual state-of-the-art open-source tools such as SDN NGenIA’s Fast Data Transfer (FDT) and its extensible agent-based architecture.},
doi = {10.2172/1460784},
url = {https://www.osti.gov/biblio/1460784},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2018},
month = {5}
}