skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information
  1. Here we describe best practices for providing convenient, high-speed, secure access to large data via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance Science DMZs and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe howmore » to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site,, provides application skeletons that readers can adapt to realize their own research data portals.« less
  2. In February 2015 the third workshop in the CrossConnects series, with a focus on Improving Data Mobility & Management for International Cosmology, was held at Lawrence Berkeley National Laboratory. Scientists from fields including astrophysics, cosmology, and astronomy collaborated with experts in computing and networking to outline strategic opportunities for enhancing scientific productivity and effectively managing the ever-increasing scale of scientific data. While each field has unique details which depend on the instruments employed, the type and scale of the data, and the structure of scientific collaborations, several important themes emerged from the workshop discussions. Findings, as well as a setmore » of recommendations, are contained in their respective sections in this report.« less
  3. The mission of the U.S. Department of Energy Office of Science (DOE SC) is the delivery of scientific discoveries and major scientific tools to transform our understanding of nature and to advance the energy, economic, and national security missions of the United States. To achieve these goals in today’s world requires investments in not only the traditional scientific endeavors of theory and experiment, but also in computational science and the facilities that support large-scale simulation and data analysis. The Advanced Scientific Computing Research (ASCR) program addresses these challenges in the Office of Science. ASCR’s mission is to discover, develop, andmore » deploy computational and networking capabilities to analyze, model, simulate, and predict complex phenomena important to DOE. ASCR supports research in computational science, three high-performance computing (HPC) facilities — the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory and Leadership Computing Facilities at Argonne (ALCF) and Oak Ridge (OLCF) National Laboratories — and the Energy Sciences Network (ESnet) at Berkeley Lab. ASCR is guided by science needs as it develops research programs, computers, and networks at the leading edge of technologies. As we approach the era of exascale computing, technology changes are creating challenges for science programs in SC for those who need to use high performance computing and data systems effectively. Numerous significant modifications to today’s tools and techniques will be needed to realize the full potential of emerging computing systems and other novel computing architectures. To assess these needs and challenges, ASCR held a series of Exascale Requirements Reviews in 2015–2017, one with each of the six SC program offices,1 and a subsequent Crosscut Review that sought to integrate the findings from each. Participants at the reviews were drawn from the communities of leading domain scientists, experts in computer science and applied mathematics, ASCR facility staff, and DOE program managers in ASCR and the respective program offices. The purpose of these reviews was to identify mission-critical scientific problems within the DOE Office of Science (including experimental facilities) and determine the requirements for the exascale ecosystem that would be needed to address those challenges. The exascale ecosystem includes exascale computing systems, high-end data capabilities, efficient software at scale, libraries, tools, and other capabilities. This effort will contribute to the development of a strategic roadmap for ASCR compute and data facility investments and will help the ASCR Facility Division establish partnerships with Office of Science stakeholders. It will also inform the Office of Science research needs and agenda. The results of the six reviews have been published in reports available on the web at This report presents a summary of the individual reports and of common and crosscutting findings, and it identifies opportunities for productive collaborations among the DOE SC program offices.« less
  4. We describe a detailed solution for maintaining high-capacity, data-intensive network flows (eg, 10, 40, 100 Gbps+) in a scientific, medical context while still adhering to security and privacy laws and regulations.High-end networking, packet-filter firewalls, network intrusion-detection systems.We describe a "Medical Science DMZ" concept as an option for secure, high-volume transport of large, sensitive datasets between research institutions over national research networks, and give 3 detailed descriptions of implemented Medical Science DMZs.The exponentially increasing amounts of "omics" data, high-quality imaging, and other rapidly growing clinical datasets have resulted in the rise of biomedical research "Big Data." The storage, analysis, and networkmore » resources required to process these data and integrate them into patient diagnoses and treatments have grown to scales that strain the capabilities of academic health centers. Some data are not generated locally and cannot be sustained locally, and shared data repositories such as those provided by the National Library of Medicine, the National Cancer Institute, and international partners such as the European Bioinformatics Institute are rapidly growing. The ability to store and compute using these data must therefore be addressed by a combination of local, national, and industry resources that exchange large datasets. Maintaining data-intensive flows that comply with the Health Insurance Portability and Accountability Act (HIPAA) and other regulations presents a new challenge for biomedical research. We describe a strategy that marries performance and security by borrowing from and redefining the concept of a Science DMZ, a framework that is used in physical sciences and engineering research to manage high-capacity data flows.By implementing a Medical Science DMZ architecture, biomedical researchers can leverage the scale provided by high-performance computer and cloud storage facilities and national high-speed research networks while preserving privacy and meeting regulatory requirements.« less
  5. The ever-increasing scale of scientific data has become a significant challenge for researchers that rely on networks to interact with remote computing systems and transfer results to collaborators worldwide. Despite the availability of high-capacity connections, scientists struggle with inadequate cyberinfrastructure that cripples data transfer performance, and impedes scientific progress. The Science DMZ paradigm comprises a proven set of network design patterns that collectively address these problems for scientists. We explain the Science DMZ model, including network architecture, system configuration, cybersecurity, and performance tools, that creates an optimized network environment for science. We describe use cases from universities, supercomputing centers andmore » research laboratories, highlighting the effectiveness of the Science DMZ model in diverse operational settings. In all, the Science DMZ model is a solid platform that supports any science workflow, and flexibly accommodates emerging network technologies. As a result, the Science DMZ vastly improves collaboration, accelerating scientific discovery.« less
  6. The Energy Sciences Network (ESnet) is the primary provider of network connectivity for the US Department of Energy Office of Science, the single largest supporter of basic research in the physical sciences in the United States. In support of the Office of Science programs, ESnet regularly updates and refreshes its understanding of the networking requirements of the instruments, facilities, scientists, and science programs that it serves. This focus has helped ESnet to be a highly successful enabler of scientific discovery for over 20 years. In April 2009 ESnet and the Office of Advanced Scientific Computing Research (ASCR), of the DOEmore » Office of Science, organized a workshop to characterize the networking requirements of the programs funded by ASCR. The ASCR facilities anticipate significant increases in wide area bandwidth utilization, driven largely by the increased capabilities of computational resources and the wide scope of collaboration that is a hallmark of modern science. Many scientists move data sets between facilities for analysis, and in some cases (for example the Earth System Grid and the Open Science Grid), data distribution is an essential component of the use of ASCR facilities by scientists. Due to the projected growth in wide area data transfer needs, the ASCR supercomputer centers all expect to deploy and use 100 Gigabit per second networking technology for wide area connectivity as soon as that deployment is financially feasible. In addition to the network connectivity that ESnet provides, the ESnet Collaboration Services (ECS) are critical to several science communities. ESnet identity and trust services, such as the DOEGrids certificate authority, are widely used both by the supercomputer centers and by collaborations such as Open Science Grid (OSG) and the Earth System Grid (ESG). Ease of use is a key determinant of the scientific utility of network-based services. Therefore, a key enabling aspect for scientists beneficial use of high performance networks is a consistent, widely deployed, well-maintained toolset that is optimized for wide area, high-speed data transfer (e.g. GridFTP) that allows scientists to easily utilize the services and capabilities that the network provides. Network test and measurement is an important part of ensuring that these tools and network services are functioning correctly. One example of a tool in this area is the recently developed perfSONAR, which has already shown its usefulness in fault diagnosis during the recent deployment of high-performance data movers at NERSC and ORNL. On the other hand, it is clear that there is significant work to be done in the area of authentication and access control - there are currently compatibility problems and differing requirements between the authentication systems in use at different facilities, and the policies and mechanisms in use at different facilities are sometimes in conflict. Finally, long-term software maintenance was of concern for many attendees. Scientists rely heavily on a large deployed base of software that does not have secure programmatic funding. Software packages for which this is true include data transfer tools such as GridFTP as well as identity management and other software infrastructure that forms a critical part of the Open Science Grid and the Earth System Grid.« less
  7. Today users visit synchrotrons as sources of understanding and discovery—not as sources of just light, and not as sources of data. To achieve this, the synchrotron facilities frequently provide not just light but often the entire end station and increasingly, advanced computational facilities that can reduce terabytes of data into a form that can reveal a new key insight. The Advanced Light Source (ALS) has partnered with high performance computing, fast networking, and applied mathematics groups to create a “super-facility”, giving users simultaneous access to the experimental, computational, and algorithmic resources to make this possible. This combination forms an efficientmore » closed loop, where data—despite its high rate and volume—is transferred and processed immediately and automatically on appropriate computing resources, and results are extracted, visualized, and presented to users or to the experimental control system, both to provide immediate insight and to guide decisions about subsequent experiments during beamtime. We will describe our work at the ALS ptychography, scattering, micro-diffraction, and micro-tomography beamlines.« less
  8. The widespread use of computing in the American economy would not be possible without a thoughtful, exploratory research and development (R&D) community pushing the performance edge of operating systems, computer languages, and software libraries. These are the tools and building blocks — the hammers, chisels, bricks, and mortar — of the smartphone, the cloud, and the computing services on which we rely. Engineers and scientists need ever-more specialized computing tools to discover new material properties for manufacturing, make energy generation safer and more efficient, and provide insight into the fundamentals of the universe, for example. The research division of themore » U.S. Department of Energy’s (DOE’s) Office of Advanced Scientific Computing and Research (ASCR Research) ensures that these tools and building blocks are being developed and honed to meet the extreme needs of modern science. See also for additional information.« less
  9. The Energy Sciences Network (ESnet) is the primary provider of network connectivity for the U.S. Department of Energy Office of Science, the single largest supporter of basic research in the physical sciences in the United States. In support of the Office of Science programs, ESnet regularly updates and refreshes its understanding of the networking requirements of the instruments, facilities, scientists, and science programs that it serves. This focus has helped ESnet to be a highly successful enabler of scientific discovery for over 25 years. In December 2011, ESnet and the Office of Fusion Energy Sciences (FES), of the DOE Officemore » of Science (SC), organized a workshop to characterize the networking requirements of the programs funded by FES. The requirements identified at the workshop are summarized in the Findings section, and are described in more detail in the body of the report.« less
  10. U.S. Department of Energy (DOE) High Performance Computing (HPC) facilities are on the verge of a paradigm shift in the way they deliver systems and services to science and engineering teams. Research projects are producing a wide variety of data at unprecedented scale and level of complexity, with community-specific services that are part of the data collection and analysis workflow. On June 18-19, 2014 representatives from six DOE HPC centers met in Oakland, CA at the DOE High Performance Operational Review (HPCOR) to discuss how they can best provide facilities and services to enable large-scale data-driven scientific discovery at themore » DOE national laboratories. The report contains findings from that review.« less

Search for:
All Records
Creator / Author
"Dart, Eli"

Refine by:
Resource Type
Publication Date
Creator / Author
Research Organization