Title: Development of a SPARK Training Dataset

In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. The NGSI program carries out activities not only across multiple disciplines but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no integrated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed to be a knowledge storage, retrieval, and analysis capability that captures safeguards knowledge so that it persists beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability.
The analysis demonstration sought to answer the question, “Who leads research and development at PNNL, scientists or policy researchers?” The analysis was inconclusive as to whether policy researchers or scientists are the primary drivers of research at PNNL. However, the dataset development and analysis activity did demonstrate the utility and usability of the SPARK dataset. After the initiation of the NGSI program, there is a clear increase in the number of published safeguards products. Employing the natural language analysis tool IN-SPIRE™ showed the presence of vocation- and topic-specific vernacular within NGSI sub-topics. The methodology developed to define the scope of the dataset was useful in describing safeguards applications, and may also be applicable to research on topics beyond safeguards. The analysis emphasized the need for an expanded dataset to fully understand the scope of safeguards publications and research both nationally and internationally. As the SPARK dataset grows to include publications outside PNNL, topics that cut across disciplines and DOE/NNSA locations should become more apparent. NGSI was established in 2008 to cultivate the next generation of safeguards professionals and support the development of core safeguards capabilities (NNSA 2012). Now a robust system for preserving and sharing institutional memory, such as SPARK, is needed to inspire and equip the next generation of safeguards experts, technologies, and policies.
  1. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Publication Date:
OSTI Identifier:
Report Number(s): NN4009010; TRN: US1600028
DOE Contract Number:
Resource Type: Technical Report
Research Org: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org:
Country of Publication: United States