A Data Processing Pipeline To Extract A Knowledge Graph From SEC Documents For Socio-technical Analysis Of Critical Infrastructure Influence
- Idaho National Laboratory (INL), Idaho Falls, ID (United States)
The code is written in Python and consists of a pipeline implemented in Apache Airflow. The pipeline aims to identify the companies that are directly or indirectly involved with a type of critical infrastructure system at some point in that system's lifecycle. It takes a configuration file that specifies a list of initial companies to consider, a geographic region of interest (a disk) expressed as a latitude/longitude point and a distance, and a set of SEC form types from which to extract entities and relations. There are three main components to the pipeline as currently implemented: Social Network Extraction, Critical Infrastructure Network Extraction, and Inference and Fusion. First, Social Network Extraction, implemented as the `organizations_sec` component of the workflow graph, queries the SEC EDGAR web service using the list of initial companies from the configuration file. From the results, it extracts metadata that documents the number of forms of each type for the given set of companies and their location. This form metadata represents a catalog of data sources for the extracted social network knowledge graph. The pipeline then downloads these forms from EDGAR and saves them in a build directory for further processing. These documents are then parsed for entities and relations. Second, the Critical Infrastructure Network Extraction component extracts entities and relations for a critical infrastructure sector. Currently, we focus on electric vehicle (EV) charging stations; this information is available via the Department of Energy (DOE) database on fueling stations maintained by NREL. Third, the Inference and Fusion component relates the social network graph to the critical infrastructure graph in order to understand the impact of a company within a geographic region. Relations include ownership of the EV charging station asset as well as maintenance/ownership of the EV payment networks. The fused network can be represented in many ways; currently we emit a knowledge graph.
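The region filter and the fusion step described above can be illustrated with a minimal sketch. It is not the project's code: the configuration field names, the toy station records, the `on_network` relation, and the helper functions `haversine_km`, `stations_in_region`, and `fuse` are all hypothetical, and the great-circle (haversine) distance is only one plausible way to test whether a station lies inside the disk.

```python
import math

# Hypothetical configuration; field names are illustrative, not the project's schema.
CONFIG = {
    "companies": ["Company A", "Company B"],
    "region": {"lat": 43.5, "lon": -112.0, "radius_km": 50.0},
    "form_types": ["10-K", "8-K"],
}

# Toy placeholder records standing in for entries from the fueling station database.
STATIONS = [
    {"id": "station-1", "lat": 43.5, "lon": -112.0, "owner": "Company A", "network": "Network X"},
    {"id": "station-2", "lat": 43.6, "lon": -112.1, "owner": "Company C", "network": "Network X"},
    {"id": "station-3", "lat": 40.0, "lon": -105.0, "owner": "Company B", "network": "Network Y"},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def stations_in_region(stations, region):
    """Keep only the stations that fall inside the disk (center point plus radius)."""
    return [
        s for s in stations
        if haversine_km(region["lat"], region["lon"], s["lat"], s["lon"]) <= region["radius_km"]
    ]


def fuse(companies, stations):
    """Emit (subject, relation, object) triples linking companies to infrastructure."""
    triples = set()
    for s in stations:
        # Every in-region station is linked to its payment network (assumed relation name).
        triples.add((s["id"], "on_network", s["network"]))
        # Ownership is emitted only for companies named in the configuration.
        if s["owner"] in companies:
            triples.add((s["owner"], "owns", s["id"]))
    return triples


nearby = stations_in_region(STATIONS, CONFIG["region"])
graph = fuse(set(CONFIG["companies"]), nearby)
for triple in sorted(graph):
    print(triple)
```

A set of triples is used here only because it is the simplest knowledge graph representation; a real pipeline would more likely write to a graph store or a serialization format, and would derive ownership relations from the parsed SEC forms rather than from a field on the station record.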
- Short Name / Acronym:
- Adversarial Socio-Technical Network Analysis (ASTN)
- Project Type:
- Closed Source
- Software Type:
- Scientific
- Programming Language(s):
- Python
- Research Organization:
- Idaho National Laboratory (INL), Idaho Falls, ID (United States)
- Sponsoring Organization:
- USDOE Office of Nuclear Energy (NE)
- Primary Award/Contract Number:
- AC07-05ID14517
- DOE Contract Number:
- AC07-05ID14517
- Code ID:
- 145418
- OSTI ID:
- code-145418
- Country of Origin:
- United States
Similar Records
A Data Processing Pipeline To Extract A Knowledge Graph From Heterogeneous Data For Socio-technical Analysis Of Critical Infrastructure Influence
Software · Tue Oct 28 20:00:00 EDT 2025 · OSTI ID: code-178006
A Data Processing Pipeline for Adversarial Socio-Technical Network Analysis
Conference · Mon Jun 12 20:00:00 EDT 2023 · OSTI ID: 2006801
A Data Processing Pipeline for Socio-Technical Network Analysis [Slides]
Technical Report · Sun Apr 30 20:00:00 EDT 2023 · OSTI ID: 2007807