OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: A High-Throughput Computing Infrastructure to Generate Custom, Open Community Geothermal Datasets

Technical Report
DOI: https://doi.org/10.2172/1856435 · OSTI ID: 1856435

The most significant challenge facing geothermal research, development, and deployment is a lack of comprehensive datasets describing the geological and economic properties of North America. Automated knowledge base construction, the process of designing algorithms that analyze text and images to programmatically build new datasets, is one possible solution to this problem. The xDD library of full-text scientific articles (https://xdd.wisc.edu) is one of the world's largest collections of open and controlled-access scientific documents available for knowledge base construction, but it has been underutilized by experts in geothermal research. The xDD development team attributed the lack of engagement by software developers and geothermal researchers to two perceived shortcomings of the system. First, the workflow for obtaining data from xDD for local development and testing of data mining applications was unnecessarily abstruse and required significant manual intervention by xDD systems administrators. Second, although xDD already held articles from a broad cross-section of scientific literature with an emphasis on the geosciences, it did not have an explicit set of geothermal research documents that could serve as the nucleus of a geothermal data mining application. To address these issues, the Automated Data Extraction PlaTform (ADEPT) was proposed to extend the data distribution capabilities of the xDD system.
The ADEPT extension added the following four key features to xDD: 1) integration of National Geothermal Data System (NGDS) documents into the xDD library to provide an explicitly geothermally themed collection; 2) improved RESTful (i.e., HTTPS-protocol driven) web services for external partners to access xDD data for machine learning application development; 3) a web platform for end users and xDD administrators to coordinate the development of data mining applications, from the initial step of browsing available documents to the final stage of deploying a production-quality machine learning application on high-throughput computing infrastructure; and 4) the development of demonstration data mining applications to illustrate the new workflow to potential collaborators. A total of 21,674 geothermal documents from NGDS were fully ingested into the xDD library, and the associated metadata is publicly available through the xDD web services; furthermore, the ADEPT web platform is now live and publicly accessible at https://xdd.wisc.edu/adept/.
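
As a rough illustration of how an external partner might query RESTful web services of the kind described above, the following Python sketch builds a request URL and reads document titles from a JSON response. It is not taken from the report: the `/api` base path, the `articles` route, the `term` and `max` parameters, and the `{"success": {"data": [...]}}` response shape are all assumptions made for illustration, and should be checked against the xDD service documentation before use.

```python
import json
from urllib.parse import urlencode

# Assumption: illustrative base path, not confirmed by the report.
XDD_API_BASE = "https://xdd.wisc.edu/api"


def build_query_url(route, **params):
    """Return a GET URL for a (hypothetical) xDD-style web service route."""
    # Sort parameters so the same query always yields the same URL.
    query = urlencode(sorted(params.items()))
    url = f"{XDD_API_BASE}/{route}"
    return f"{url}?{query}" if query else url


def extract_titles(payload):
    """Collect titles from a JSON payload.

    Assumes a response shaped like {"success": {"data": [{"title": ...}]}};
    records without a "title" key are skipped.
    """
    body = json.loads(payload)
    records = body.get("success", {}).get("data", [])
    return [record["title"] for record in records if "title" in record]
```

Under these assumptions, `build_query_url("articles", term="geothermal", max=5)` yields a URL that could be fetched with `urllib.request.urlopen`, and the response body could then be passed to `extract_titles`. The sketch deliberately performs no network calls itself.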

Research Organization:
Univ. of Arizona, Tucson, AZ (United States)
Sponsoring Organization:
USDOE Office of Artificial Intelligence and Technology (AITO); USDOE Office of Energy Efficiency and Renewable Energy (EERE), Renewable Power Office. Geothermal Technologies Office
DOE Contract Number:
EE0008761
OSTI ID:
1856435
Report Number(s):
Final-report-DOE-UofA-EE8761
Country of Publication:
United States
Language:
English