---
code_id: 145418
site_ownership_code: "INL"
open_source: false
landing_contact: "agradmin@inl.gov"
project_type: "CS"
software_type: "S"
official_use_only: {}
developers:
- email: "Gabriel.Weaver@inl.gov"
  orcid: ""
  first_name: "Gabriel"
  last_name: "Weaver"
  middle_name: "A."
  affiliations:
  - "Idaho National Laboratory (INL), Idaho Falls, ID (United States)"
contributors: []
sponsoring_organizations:
- organization_name: "USDOE Office of Nuclear Energy (NE)"
  funding_identifiers: []
  primary_award: "AC07-05ID14517"
  DOE: true
contributing_organizations: []
research_organizations:
- organization_name: "Idaho National Laboratory (INL), Idaho Falls, ID (United States)"
  DOE: true
related_identifiers: []
award_dois: []
release_date: "2024-05-07"
software_title: "A Data Processing Pipeline To Extract A Knowledge Graph From SEC\
  \ Documents For Socio-technical Analysis Of Critical Infrastructure Influence"
acronym: "Adversarial Socio-Technical Network Analysis (ASTN)"
doi: "https://doi.org/10.11578/dc.20241010.1"
description: "The code is written in Python and consists of a pipeline implemented\
  \ in Apache Airflow.  The pipeline is intended to identify the\
  \ companies that are directly or indirectly involved with a type of critical infrastructure\
  \ system at some point in that system's lifecycle.  The pipeline takes a configuration\
  \ file that specifies a list of initial companies to consider, a geographic region\
  \ of interest (disk) expressed as a latitude/longitude point and distance, and a\
  \ set of SEC form types from which to extract entities and relations.  There are\
  \ three main components to this pipeline as currently implemented:  Social Network\
  \ Extraction, Critical Infrastructure Network Extraction, and Inference and Fusion.\
  \   \n\nFirst, Social Network Extraction, implemented as the `organizations_sec`\
  \ component of the workflow graph, queries the SEC EDGAR web service using the list\
  \ of initial companies from the configuration file.  From the results, it extracts metadata\
  \ that documents the number of each type of form for the given set of companies\
  \ and their location.  This form metadata represents a catalog of data sources\
  \ for the extracted social network knowledge graph.  The pipeline then downloads\
  \ these forms from the website and saves them in a build directory for further processing.\
  \  These documents are then parsed for entities and relations.\n\nSecond, the Critical\
  \ Infrastructure Network Extraction component extracts entities and relations for a critical infrastructure\
  \ sector.  Currently, we focus on electric vehicle (EV) charging stations; this information\
  \ is available via the Department of Energy (DOE) database on fueling stations maintained\
  \ by NREL.\n\nThird, the Inference and Fusion component relates the social network\
  \ graph to the critical infrastructure graph in order to understand the impact of\
  \ a company within a geographic region.  Relations include ownership of EV charging\
  \ station assets as well as maintenance or ownership of EV payment networks.  The\
  \ fused network can be represented in many ways; currently, we emit a knowledge\
  \ graph."
programming_languages:
- "Python"
country_of_origin: "United States"
keywords: "socio-technical network analysis (STNA); Electric Vehicles; multilayer\
  \ networks"
project_keywords: []
licenses: []
recipient_org: "Idaho National Laboratory"
file_name: "topgear-main.zip"
date_record_added: "2024-10-10"
date_record_updated: "2024-10-10"
is_file_certified: true
last_editor: "autumn.willard@inl.gov"
is_limited: false
links:
- rel: "citation"
  href: "https://www.osti.gov/doecode/biblio/145418"
