---
code_id: 178006
site_ownership_code: "INL"
open_source: false
landing_contact: "agradmin@inl.gov"
project_type: "CS"
software_type: "S"
official_use_only: {}
developers:
- email: ""
  orcid: ""
  first_name: "Gabriel"
  last_name: "Weaver"
  middle_name: ""
  affiliations:
  - "Idaho National Laboratory (INL), Idaho Falls, ID (United States)"
- email: ""
  orcid: ""
  first_name: "Mary"
  last_name: "Klet"
  middle_name: ""
  affiliations:
  - "Idaho National Laboratory (INL), Idaho Falls, ID (United States)"
- email: ""
  orcid: ""
  first_name: "Steven"
  last_name: "Hall"
  middle_name: ""
  affiliations:
  - "Idaho National Laboratory (INL), Idaho Falls, ID (United States)"
contributors: []
sponsoring_organizations:
- organization_name: "USDOE Office of Nuclear Energy (NE)"
  funding_identifiers: []
  primary_award: "AC07-05ID14517"
  DOE: true
contributing_organizations: []
research_organizations:
- organization_name: "Idaho National Laboratory (INL), Idaho Falls, ID (United States)"
  DOE: true
related_identifiers: []
award_dois: []
release_date: "2025-10-29"
software_title: "A Data Processing Pipeline To Extract A Knowledge Graph From Heterogeneous\
  \ Data For Socio-technical Analysis Of Critical Infrastructure Influence"
acronym: "TOPGEAR: Technology, Organization, or Person of i"
doi: "https://doi.org/10.11578/dc.20260324.2"
description: "The code is written in Python and implements the following pipeline\
  \ in Apache Airflow. The pipeline is intended to identify the companies that are\
  \ directly or indirectly involved with a given type of critical infrastructure (CI)\
  \ system at some point in that system's lifecycle. The pipeline takes a configuration\
  \ file that specifies a list of initial companies to consider, a geographic region\
  \ of interest, and a set of SEC form types and other data sources (e.g., Crunchbase)\
  \ from which to extract entities and relations. As currently implemented, the pipeline\
  \ has four main components: Entity Extraction, Network Construction, Analysis, and\
  \ Visualization.\n\nFirst, Entity Extraction is implemented as\
  \ the `topgear-extract_organizations` Apache Airflow workflow. Given an initial\
  \ query that specifies a geographic region of interest and a time interval, the\
  \ software extracts CI facilities of interest and the organizations that have a\
  \ direct influence relationship with those facilities (e.g., ownership). During\
  \ the Laboratory Directed Research and Development (LDRD) project, we focused on\
  \ electric vehicle (EV) charging stations; data on these facilities are available\
  \ from the Department of Energy (DOE) database of fueling stations maintained by\
  \ NREL. Within the context of the DOE CESER project, we have focused on Battery\
  \ Energy Storage Systems (BESS).\n\nSecond, the Network\
  \ Construction component iteratively constructs a social network graph from the\
  \ set of organizations and people extracted in the previous step. Organizations\
  \ (and eventually people, if desired) are fed as a query to the `topgear-construct_social_network`\
  \ Apache Airflow workflow, which takes a set of initial companies and data sets\
  \ (e.g., SEC EDGAR form types, OpenCorporates, Crunchbase). This workflow iteratively\
  \ queries these data sources to discover relationships with new organizations and\
  \ people. For example, it can iteratively query SEC EDGAR for metadata that records\
  \ the number of forms of each type for the given set of companies and their locations.\
  \ This form metadata serves as a catalog of SEC EDGAR data sources for the extracted\
  \ social network knowledge graph. The pipeline then downloads these forms from the\
  \ SEC EDGAR website, saves them in a build directory, and parses them for entities\
  \ and relations. In addition to SEC data sources, this step can pull in information\
  \ on organizations from API services such as Crunchbase and OpenCorporates, or from\
  \ bulk data sources. At the end of this step, the resulting social network, the\
  \ critical infrastructure network, and the edges that encode relationships between\
  \ organizations and CI facilities together form the Adversarial Socio-Technical\
  \ Network (ASTN) that informs the analysis.\n\nThird, the Analysis component processes\
  \ the\
  \ generated ASTN. Previous analyses have included comparing the prevalence of\
  \ different vendors for a given infrastructure component type across regions and\
  \ identifying public and private investors common to those vendors. This was\
  \ demonstrated for EV charging stations across several metropolitan areas in an\
  \ IEEE PES GridEdge publication. More recently, we have explored ways to identify\
  \ the BESS owners and operators with the most nameplate capacity in different states,\
  \ as well as other indicators of risk resulting from changes in ownership over\
  \ time.\n\nFinally, the Visualization\
  \ component consists of an HTML/CSS/JS framework through which users can interact\
  \ with geospatial, operational, and organizational relationships across a given\
  \ portfolio of critical infrastructure facilities. The objective is to provide a\
  \ library of UI/UX modules that can be repurposed for stakeholder-specific dashboards.\
  \ All of the modules share a common event model that propagates UI actions in one\
  \ view to the other views."
programming_languages: []
country_of_origin: "United States"
keywords: "socio-technical network analysis (STNA); Electric Vehicles; multilayer\
  \ networks"
project_keywords: []
licenses: []
recipient_org: "INL"
file_name: "TopGear.inl.zip"
date_record_added: "2026-03-24"
date_record_updated: "2026-03-24"
is_file_certified: true
last_editor: "ariauna.talamelli@inl.gov"
is_limited: false
links:
- rel: "citation"
  href: "https://www.osti.gov/doecode/biblio/178006"
