OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information
  1. The Modern Research Data Portal: a design pattern for networked, data-intensive science

    We describe best practices for providing convenient, high-speed, secure access to large data via research data portals. Here, we capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance data enclaves and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.
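
    As a flavor of the Python APIs the abstract refers to, here is a minimal sketch of a native-app login and an asynchronous transfer using the Globus SDK (globus_sdk). The client ID, endpoint IDs, and paths are placeholders; the authoritative application skeletons live at https://docs.globus.org/mrdp.

        import globus_sdk

        CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"  # placeholder: register with Globus Auth

        # Native-app OAuth2 flow: the user logs in via a browser and pastes
        # back an authorization code.
        auth = globus_sdk.NativeAppAuthClient(CLIENT_ID)
        auth.oauth2_start_flow()
        print("Please log in at:", auth.oauth2_get_authorize_url())
        tokens = auth.oauth2_exchange_code_for_tokens(input("Code: "))
        token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

        # Drive the Globus Transfer service with the resulting access token.
        tc = globus_sdk.TransferClient(
            authorizer=globus_sdk.AccessTokenAuthorizer(token))

        # Submit an asynchronous transfer between two endpoints (placeholder IDs).
        tdata = globus_sdk.TransferData(tc, "SRC-ENDPOINT-ID", "DST-ENDPOINT-ID",
                                        label="portal download")
        tdata.add_item("/datasets/run42/", "/scratch/run42/", recursive=True)
        print("Task:", tc.submit_transfer(tdata)["task_id"])
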
  2. Improving Data Mobility & Management for International Cosmology

    In February 2015 the third workshop in the CrossConnects series, with a focus on Improving Data Mobility & Management for International Cosmology, was held at Lawrence Berkeley National Laboratory. Scientists from fields including astrophysics, cosmology, and astronomy collaborated with experts in computing and networking to outline strategic opportunities for enhancing scientific productivity and effectively managing the ever-increasing scale of scientific data. While each field has unique details that depend on the instruments employed, the type and scale of the data, and the structure of scientific collaborations, several important themes emerged from the workshop discussions. Findings, as well as a set of recommendations, are contained in their respective sections in this report.
  3. The Modern Research Data Portal: A Design Pattern for Networked, Data-Intensive Science

    Here we describe best practices for providing convenient, high-speed, secure access to large data via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance Science DMZs and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.
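
    The data-sharing capability mentioned above corresponds, in Globus terms, to adding an access-control rule on a shared endpoint. A minimal sketch, assuming the authenticated TransferClient (tc) from the earlier example; the endpoint ID, identity ID, and path are placeholders:

        # Grant one user read-only access to a folder on a shared endpoint.
        rule = {
            "DATA_TYPE": "access",
            "principal_type": "identity",
            "principal": "USER-IDENTITY-UUID",  # placeholder Globus Auth identity
            "path": "/shared/dataset1/",
            "permissions": "r",                 # read-only
        }
        tc.add_endpoint_acl_rule("SHARED-ENDPOINT-ID", rule)
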
  4. Crosscut report: Exascale Requirements Reviews, March 9–10, 2017 – Tysons Corner, Virginia. An Office of Science review sponsored by: Advanced Scientific Computing Research, Basic Energy Sciences, Biological and Environmental Research, Fusion Energy Sciences, High Energy Physics, Nuclear Physics

    The mission of the U.S. Department of Energy Office of Science (DOE SC) is the delivery of scientific discoveries and major scientific tools to transform our understanding of nature and to advance the energy, economic, and national security missions of the United States. To achieve these goals in today's world requires investments not only in the traditional scientific endeavors of theory and experiment, but also in computational science and the facilities that support large-scale simulation and data analysis. The Advanced Scientific Computing Research (ASCR) program addresses these challenges in the Office of Science. ASCR's mission is to discover, develop, and deploy computational and networking capabilities to analyze, model, simulate, and predict complex phenomena important to DOE. ASCR supports research in computational science, three high-performance computing (HPC) facilities — the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory and Leadership Computing Facilities at Argonne (ALCF) and Oak Ridge (OLCF) National Laboratories — and the Energy Sciences Network (ESnet) at Berkeley Lab. ASCR is guided by science needs as it develops research programs, computers, and networks at the leading edge of technologies. As we approach the era of exascale computing, technology changes are creating challenges for the SC science programs whose researchers need to use high-performance computing and data systems effectively. Numerous significant modifications to today's tools and techniques will be needed to realize the full potential of emerging computing systems and other novel computing architectures. To assess these needs and challenges, ASCR held a series of Exascale Requirements Reviews in 2015–2017, one with each of the six SC program offices, and a subsequent Crosscut Review that sought to integrate the findings from each. Participants at the reviews were drawn from the communities of leading domain scientists, experts in computer science and applied mathematics, ASCR facility staff, and DOE program managers in ASCR and the respective program offices. The purpose of these reviews was to identify mission-critical scientific problems within the DOE Office of Science (including experimental facilities) and determine the requirements for the exascale ecosystem that would be needed to address those challenges. The exascale ecosystem includes exascale computing systems, high-end data capabilities, efficient software at scale, libraries, tools, and other capabilities. This effort will contribute to the development of a strategic roadmap for ASCR compute and data facility investments and will help the ASCR Facility Division establish partnerships with Office of Science stakeholders. It will also inform the Office of Science research needs and agenda. The results of the six reviews have been published in reports available on the web at http://exascaleage.org/. This report presents a summary of the individual reports and of common and crosscutting findings, and it identifies opportunities for productive collaborations among the DOE SC program offices.
  5. The medical science DMZ: a network design pattern for data-intensive medical science

    Objective: We describe a detailed solution for maintaining high-capacity, data-intensive network flows (e.g., 10, 40, 100+ Gbps) in a scientific, medical context while still adhering to security and privacy laws and regulations. Materials and Methods: High-end networking, packet-filter firewalls, and network intrusion-detection systems. Results: We describe a "Medical Science DMZ" concept as an option for secure, high-volume transport of large, sensitive datasets between research institutions over national research networks, and give 3 detailed descriptions of implemented Medical Science DMZs. Discussion: The exponentially increasing amounts of "omics" data, high-quality imaging, and other rapidly growing clinical datasets have resulted in the rise of biomedical research "Big Data." The storage, analysis, and network resources required to process these data and integrate them into patient diagnoses and treatments have grown to scales that strain the capabilities of academic health centers. Some data are not generated locally and cannot be sustained locally, and shared data repositories such as those provided by the National Library of Medicine, the National Cancer Institute, and international partners such as the European Bioinformatics Institute are rapidly growing. The ability to store and compute using these data must therefore be addressed by a combination of local, national, and industry resources that exchange large datasets. Maintaining data-intensive flows that comply with the Health Insurance Portability and Accountability Act (HIPAA) and other regulations presents a new challenge for biomedical research. We describe a strategy that marries performance and security by borrowing from and redefining the concept of a Science DMZ, a framework that is used in physical sciences and engineering research to manage high-capacity data flows. Conclusion: By implementing a Medical Science DMZ architecture, biomedical researchers can leverage the scale provided by high-performance computer and cloud storage facilities and national high-speed research networks while preserving privacy and meeting regulatory requirements.
  6. The Science DMZ: A Network Design Pattern for Data-Intensive Science

    The ever-increasing scale of scientific data has become a significant challenge for researchers who rely on networks to interact with remote computing systems and transfer results to collaborators worldwide. Despite the availability of high-capacity connections, scientists struggle with inadequate cyberinfrastructure that cripples data transfer performance and impedes scientific progress. The Science DMZ paradigm comprises a proven set of network design patterns that collectively address these problems for scientists. We explain the Science DMZ model, including network architecture, system configuration, cybersecurity, and performance tools, which together create an optimized network environment for science. We describe use cases from universities, supercomputing centers, and research laboratories, highlighting the effectiveness of the Science DMZ model in diverse operational settings. In all, the Science DMZ model is a solid platform that supports any science workflow and flexibly accommodates emerging network technologies. As a result, the Science DMZ vastly improves collaboration, accelerating scientific discovery.
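
    Simple arithmetic shows what is at stake when a high-capacity connection is undermined by inadequate cyberinfrastructure; the figures below assume an ideal network, which real transfers never see:

        # Ideal-network transfer time for a 10 TB dataset at common line rates.
        # Real transfers are slower: disk I/O, TCP tuning, and packet loss all
        # take their toll, which is what the Science DMZ design addresses.
        dataset_bits = 10e12 * 8  # 10 TB expressed in bits
        for gbps in (1, 10, 100):
            hours = dataset_bits / (gbps * 1e9) / 3600
            print(f"{gbps:>3} Gbps -> {hours:5.1f} hours")

    At 1 Gbps the ideal case is already almost a full day; a tuned 100 Gbps path brings the same dataset down to minutes.
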
  7. Development of a fast framing detector for electron microscopy

    A high frame rate detector system is described that enables fast real-time data analysis of scanning diffraction experiments in scanning transmission electron microscopy (STEM). This is an end-to-end development that encompasses the data-producing detector, data transportation, and real-time processing of data. The detector will consist of a central pixel sensor that is surrounded by annular silicon diodes. Both components of the detector system will synchronously capture data at a frame rate of almost 100 kHz, which produces an approximately 400 Gb/s data stream. Low-level preprocessing will be implemented in firmware before the data is streamed from the National Center for Electron Microscopy (NCEM) to the National Energy Research Scientific Computing Center (NERSC). Live data processing, before it lands on disk, will happen on the Cori supercomputer and aims to present scientists with prompt experimental feedback. This online analysis will provide rough information about the sample that can be used for sample alignment, sample monitoring, and verification that the experiment is set up correctly. Only a compressed version of the relevant data is then selected for more in-depth processing.
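
    As a quick consistency check on the quoted figures (illustrative arithmetic only, not taken from the paper):

        # ~400 Gb/s at ~100 kHz implies roughly 4 Mb (about 0.5 MB) per frame,
        # combining the central pixel sensor and the annular diodes.
        bits_per_frame = 400e9 / 100e3
        print(f"{bits_per_frame / 1e6:.0f} Mb/frame = "
              f"{bits_per_frame / 8 / 1e6:.1f} MB/frame")
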
  8. ASCR Science Network Requirements

    The Energy Sciences Network (ESnet) is the primary provider of network connectivity for the US Department of Energy Office of Science, the single largest supporter of basic research in the physical sciences in the United States. In support of the Office of Science programs, ESnet regularly updates and refreshes its understanding of the networking requirements of the instruments, facilities, scientists, and science programs that it serves. This focus has helped ESnet to be a highly successful enabler of scientific discovery for over 20 years. In April 2009 ESnet and the Office of Advanced Scientific Computing Research (ASCR), of the DOE Office of Science, organized a workshop to characterize the networking requirements of the programs funded by ASCR. The ASCR facilities anticipate significant increases in wide area bandwidth utilization, driven largely by the increased capabilities of computational resources and the wide scope of collaboration that is a hallmark of modern science. Many scientists move data sets between facilities for analysis, and in some cases (for example the Earth System Grid and the Open Science Grid), data distribution is an essential component of the use of ASCR facilities by scientists. Due to the projected growth in wide area data transfer needs, the ASCR supercomputer centers all expect to deploy and use 100 Gigabit per second networking technology for wide area connectivity as soon as that deployment is financially feasible. In addition to the network connectivity that ESnet provides, the ESnet Collaboration Services (ECS) are critical to several science communities. ESnet identity and trust services, such as the DOEGrids certificate authority, are widely used both by the supercomputer centers and by collaborations such as the Open Science Grid (OSG) and the Earth System Grid (ESG). Ease of use is a key determinant of the scientific utility of network-based services. Therefore, a key enabler of scientists' productive use of high-performance networks is a consistent, widely deployed, well-maintained toolset, optimized for wide-area, high-speed data transfer (e.g., GridFTP), that allows scientists to easily use the services and capabilities that the network provides. Network test and measurement is an important part of ensuring that these tools and network services are functioning correctly. One example of a tool in this area is the recently developed perfSONAR, which has already shown its usefulness in fault diagnosis during the recent deployment of high-performance data movers at NERSC and ORNL. On the other hand, it is clear that there is significant work to be done in the area of authentication and access control: there are currently compatibility problems and differing requirements between the authentication systems in use at different facilities, and the policies and mechanisms in use at different facilities are sometimes in conflict. Finally, long-term software maintenance was of concern for many attendees. Scientists rely heavily on a large deployed base of software that does not have secure programmatic funding. Software packages for which this is true include data transfer tools such as GridFTP, as well as identity management and other software infrastructure that forms a critical part of the Open Science Grid and the Earth System Grid.
  9. Making Advanced Scientific Algorithms and Big Scientific Data Management More Accessible

    Synchrotrons such as the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory are known as user facilities. They are sources of extremely bright X-ray beams, and scientists come from all over the world to perform experiments that require these beams. As the complexity of experiments has increased, and the size and rates of data sets have exploded, managing, analyzing, and presenting the data collected at synchrotrons has been an increasing challenge. The ALS has partnered with high-performance computing, fast networking, and applied mathematics groups to create a "super-facility", giving users simultaneous access to the experimental, computational, and algorithmic resources to overcome this challenge. This combination forms an efficient closed loop, where data, despite its high rate and volume, is transferred and processed, in many cases immediately and automatically, on appropriate compute resources, and results are extracted, visualized, and presented to users or to the experimental control system, both to provide immediate insight and to guide decisions about subsequent experiments during beam time. In this paper, we present work done on advanced tomographic reconstruction algorithms to support users of the 3D micron-scale imaging instrument (Beamline 8.3.2, hard X-ray micro-tomography).
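
    The abstract does not name the reconstruction software, so the following is only an illustrative sketch using TomoPy, a widely used open-source package for synchrotron tomography, with stand-in arrays in place of real Beamline 8.3.2 data:

        import numpy as np
        import tomopy

        # Stand-in projection data: (angles, detector rows, detector columns).
        proj = np.random.rand(180, 64, 64).astype(np.float32)
        flat = np.ones((1, 64, 64), dtype=np.float32)   # flat-field reference
        dark = np.zeros((1, 64, 64), dtype=np.float32)  # dark-field reference
        theta = tomopy.angles(proj.shape[0])            # projection angles (radians)

        proj = tomopy.normalize(proj, flat, dark)  # flat/dark-field correction
        proj = tomopy.minus_log(proj)              # linearize via Beer-Lambert
        center = tomopy.find_center(proj, theta)   # estimate rotation center
        recon = tomopy.recon(proj, theta, center=center, algorithm="gridrec")
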
  10. Real-time data-intensive computing

    Today users visit synchrotrons as sources of understanding and discovery—not as sources of just light, and not as sources of data. To achieve this, the synchrotron facilities frequently provide not just light but often the entire end station and, increasingly, advanced computational facilities that can reduce terabytes of data into a form that can reveal a new key insight. The Advanced Light Source (ALS) has partnered with high-performance computing, fast networking, and applied mathematics groups to create a "super-facility", giving users simultaneous access to the experimental, computational, and algorithmic resources to make this possible. This combination forms an efficient closed loop, where data—despite its high rate and volume—is transferred and processed immediately and automatically on appropriate computing resources, and results are extracted, visualized, and presented to users or to the experimental control system, both to provide immediate insight and to guide decisions about subsequent experiments during beamtime. We will describe our work at the ALS ptychography, scattering, micro-diffraction, and micro-tomography beamlines.
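
    Conceptually, the closed loop reduces to: detect newly landed scans, process them on appropriate compute resources, and feed a quick-look result back to the beamline. The deliberately naive sketch below illustrates that shape; every path and name is hypothetical, and the real ALS/NERSC pipelines are far more elaborate:

        import time
        from pathlib import Path

        INCOMING = Path("/data/incoming")  # hypothetical landing directory
        seen = set()

        def quick_look(scan: Path) -> str:
            """Stand-in for reduction/reconstruction on the compute resource."""
            return f"quick-look result for {scan.name}"

        while True:  # poll for scans as they arrive from the beamline
            for scan in sorted(INCOMING.glob("*.h5")):
                if scan not in seen:
                    seen.add(scan)
                    print(quick_look(scan))  # would go to the control system
            time.sleep(5)
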
...

Search for: All Records
Creator / Author: "Dart, Eli"
