NEPATEC2.0: NEPA Text Corpus v2.0
Abstract
The National Environmental Policy Act of 1969, as amended (NEPA), is a major environmental law in the United States, requiring Federal agencies to consider and document potential environmental impacts before deciding on a proposed action. Modernization of NEPA and permitting processes faces significant challenges due to the lack of standardized formats and interoperable systems for organizing and sharing NEPA-related information across agencies. Much of the information gathered during NEPA reviews is written into documents such as categorical exclusions, environmental assessments, and environmental impact statements, then filed in predominately independent agency file stores that may or may not be publicly accessible. The application of metadata and data standards, such as those recommended by the Council on Environmental Quality (CEQ), to NEPA documents offers a shared vocabulary and structure for key entities like projects, processes, and documents that can streamline information exchange and enhance collaboration across systems. In this work, we publicly release NEPATEC2.0, an expanded corpus of NEPA documents with associated metadata. NEPATEC2.0 encompasses approximately 120,000 documents from 60,000 projects prepared by more than 60 different agencies. Modeled to align with CEQ metadata standards, NEPATEC2.0 promotes consistency in environmental reviews and supports the ongoing effort to modernize permitting technologies by facilitating more transparent, efficient, and data-driven decision-making. Importantly, NEPATEC2.0 demonstrates the possibilities and limitations of large language model-based prompting to extract information from NEPA documents at scale.
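- Authors: Munikoti, Sai; Nally, Daniel M.; Koneru, Sai D.; Das, Siddhartha Shankar; Bhattacharjee, Kaustav; Buchko, Alexander C.; Edwards, Taylor C.; Nwe, Kathy; Raskar, Siddhisanket; Rigor, Paul M.; Taylor, Micah S.; Spare, Scott T.; Lilienthal, Derek B.; Halappanavar, Mahantesh M.; Acharya, Anurag; Vega, Timothy J.; Parker, Michael J.; Horawalavithana, Yasanka S.
- Publication Date: 2025-09-30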
- Other Number(s): PNNL-SA-216398
- DOE Contract Number: AC05-76RL01830
- Research Org.: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Org.: USDOE
- Subject: Large Language Models (LLMs); NEPA; environmental review; Artificial intelligence (AI) / machine learning (ML)
- OSTI Identifier: 2997034
- DOI: https://doi.org/10.11578/2997034
Citation Formats
Munikoti, Sai, Nally, Daniel M., Koneru, Sai D., Das, Siddhartha Shankar, Bhattacharjee, Kaustav, Buchko, Alexander C., Edwards, Taylor C., Nwe, Kathy, Raskar, Siddhisanket, Rigor, Paul M., Taylor, Micah S., Spare, Scott T., Lilienthal, Derek B., Halappanavar, Mahantesh M., Acharya, Anurag, Vega, Timothy J., Parker, Michael J., and Horawalavithana, Yasanka S. NEPATEC2.0: NEPA Text Corpus v2.0. United States: N. p., 2025. Web. doi:10.11578/2997034.
Munikoti, Sai, Nally, Daniel M., Koneru, Sai D., Das, Siddhartha Shankar, Bhattacharjee, Kaustav, Buchko, Alexander C., Edwards, Taylor C., Nwe, Kathy, Raskar, Siddhisanket, Rigor, Paul M., Taylor, Micah S., Spare, Scott T., Lilienthal, Derek B., Halappanavar, Mahantesh M., Acharya, Anurag, Vega, Timothy J., Parker, Michael J., & Horawalavithana, Yasanka S. NEPATEC2.0: NEPA Text Corpus v2.0. United States. doi:https://doi.org/10.11578/2997034
Munikoti, Sai, Nally, Daniel M., Koneru, Sai D., Das, Siddhartha Shankar, Bhattacharjee, Kaustav, Buchko, Alexander C., Edwards, Taylor C., Nwe, Kathy, Raskar, Siddhisanket, Rigor, Paul M., Taylor, Micah S., Spare, Scott T., Lilienthal, Derek B., Halappanavar, Mahantesh M., Acharya, Anurag, Vega, Timothy J., Parker, Michael J., and Horawalavithana, Yasanka S. 2025. "NEPATEC2.0: NEPA Text Corpus v2.0". United States. doi:https://doi.org/10.11578/2997034. https://www.osti.gov/servlets/purl/2997034. Pub date: September 30, 2025.
@article{osti_2997034,
title = {NEPATEC2.0: NEPA Text Corpus v2.0},
author = {Munikoti, Sai and Nally, Daniel M. and Koneru, Sai D. and Das, Siddhartha Shankar and Bhattacharjee, Kaustav and Buchko, Alexander C. and Edwards, Taylor C. and Nwe, Kathy and Raskar, Siddhisanket and Rigor, Paul M. and Taylor, Micah S. and Spare, Scott T. and Lilienthal, Derek B. and Halappanavar, Mahantesh M. and Acharya, Anurag and Vega, Timothy J. and Parker, Michael J. and Horawalavithana, Yasanka S.},
abstractNote = {The National Environmental Policy Act of 1969, as amended (NEPA), is a major environmental law in the United States, requiring Federal agencies to consider and document potential environmental impacts before deciding on a proposed action. Modernization of NEPA and permitting processes faces significant challenges due to the lack of standardized formats and interoperable systems for organizing and sharing NEPA-related information across agencies. Much of the information gathered during NEPA reviews is written into documents such as categorical exclusions, environmental assessments, and environmental impact statements, then filed in predominately independent agency file stores that may or may not be publicly accessible. The application of metadata and data standards, such as those recommended by the Council on Environmental Quality (CEQ), to NEPA documents offers a shared vocabulary and structure for key entities like projects, processes, and documents that can streamline information exchange and enhance collaboration across systems. In this work, we publicly release NEPATEC2.0, an expanded corpus of NEPA documents with associated metadata. NEPATEC2.0 encompasses approximately 120,000 documents from 60,000 projects prepared by more than 60 different agencies. Modeled to align with CEQ metadata standards, NEPATEC2.0 promotes consistency in environmental reviews and supports the ongoing effort to modernize permitting technologies by facilitating more transparent, efficient, and data-driven decision-making. Importantly, NEPATEC2.0 demonstrates the possibilities and limitations of large language model-based prompting to extract information from NEPA documents at scale.},
doi = {10.11578/2997034},
journal = {},
place = {United States},
year = {2025},
month = sep
}
