OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Phasor-Measurement-Unit-Based Data Analytics Using Digital Twin and PhasorAnalytics Software

Technical Report
DOI: https://doi.org/10.2172/1828164 · OSTI ID: 1828164

A major objective of this project was to apply GE’s commercial machine learning and data analytics toolsets to large-scale, real-world, anonymized Phasor Measurement Unit (PMU) datasets to extract signatures, correlated and/or causal factors, and precursor patterns associated with significant power system phenomena. The project placed particular emphasis on extracting insights relevant to asset health monitoring, real-time load modeling, and cybersecurity monitoring. Additionally, the team was directed to undertake a comprehensive data quality analysis for the provided datasets and encouraged to estimate the ‘machine-learning readiness’ of the datasets by documenting any major obstacles to the application of commercial machine learning algorithms. To accomplish these objectives, the project team’s work centered on identifying key event signatures and applying them to event detection and event classification. The industry-validated, semi-supervised machine learning strategy employed for event signature identification involved several major tasks, including data preprocessing, generation of an overabundance of features, normal data identification, normality modeling, and event signature identification through a methodical, quantitative ranking of features in order of relevance to each studied event type. Throughout the project, data quality issues and mitigation techniques were investigated. This report provides insights into the readiness of the provided synchrophasor datasets for the application of machine learning and data analytics. It also summarizes the methodologies employed for this technical strategy. With regard to data preprocessing and feature generation, the provided Training and Test Datasets were ingested into GE’s big data environment.
Subsequently, the team applied bad-data cleansing and data imputation scripts and event detection scripts to the datasets, along with application programming interfaces (APIs) for convenient data access. The project team completed development and validation of dozens of physics-based, statistics-based, and transformation-based feature functions used to extract over 60 synchrophasor features. Using a new parallel feature generation technology developed on this project, over 60 features were rapidly generated for the full two years’ worth of Training and Test Dataset data associated with both the Eastern and Western Interconnects. Even accounting for the temporal down-sampling inherent to the feature extraction procedure, this parallel feature generation activity resulted in a massive feature set with a storage requirement approximately equal to that of the raw training dataset itself. With regard to normal data identification and normality modeling, a normality model was built using the feature data extracted from the Training Dataset and iteratively refined following incremental adjustments and expansions of the Training Dataset feature data. With respect to event characterization and signature identification, an event signature identification pipeline was developed and used in conjunction with the normality model to identify over 15 event signatures for key event categories within the Training Dataset. The identified event signatures were used to characterize hundreds of key events in terms of relative severity, duration, and location. An investigation was undertaken to identify correlated and causal factors involved in transformer events. A separate investigation into temporal trends in ring-down analysis results was undertaken to determine possible associations between system dynamics and various other factors such as loading, season, or year.
To validate the identified event signatures, additional work was undertaken to develop signature-based anomaly detection and classification tools suitable for convenient application to the synchrophasor datasets. The anomaly detection and classification tools, suitable for online application, were then applied to the entirety of the Eastern Interconnect Training and Test Datasets. Performance of the event detection and classification tools was evaluated upon receipt of the Test Dataset event logs (i.e., the labels for events contained in the Test Dataset), and promising results were obtained despite several challenges (documented herein) associated with application of supervised or semi-supervised machine learning methods to large-scale, anonymized datasets. Finally, the detection and classification tools were used to detect, classify, and characterize thousands of new events, within both the Training and Test Datasets, that were not included in the original event logs provided by the DOE.
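
The general idea of ranking features against a normality model, as summarized in the abstract above, can be illustrated with a minimal sketch. This is not the project's method or GE's software: the per-feature z-score used here is a simple stand-in chosen for illustration, and the function names, the feature names, and the synthetic data are all hypothetical.

```python
import random
import statistics

def fit_normality_model(normal_rows):
    """Estimate a per-feature mean and standard deviation from normal-operation rows."""
    columns = list(zip(*normal_rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 so a constant feature does not cause division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds

def rank_features(event_rows, means, stds, names):
    """Rank features by mean absolute z-score over an event window, largest first."""
    columns = list(zip(*event_rows))
    scores = [
        statistics.fmean([abs((x - m) / s) for x in col])
        for col, m, s in zip(columns, means, stds)
    ]
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)

# Hypothetical feature names and synthetic data (assumptions, not from the report).
random.seed(0)
names = ["freq_deviation", "voltage_mag_std", "angle_rate"]
normal = [[random.gauss(0.0, 1.0) for _ in names] for _ in range(1000)]
# Synthetic "event": one feature is shifted far from its normal range.
event = [
    [random.gauss(0.0, 1.0) + (8.0 if j == 1 else 0.0) for j in range(len(names))]
    for _ in range(50)
]

means, stds = fit_normality_model(normal)
ranking = rank_features(event, means, stds, names)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy setting the shifted feature ranks first, which is the sense in which a ranked feature list can act as a candidate signature for an event type. A real pipeline would need a richer normality model and validation against labeled events.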

Research Organization:
GE Research, Niskayuna, NY (United States)
Sponsoring Organization:
USDOE Office of Electricity (OE)
Contributing Organization:
GE Digital
DOE Contract Number:
OE0000915
OSTI ID:
1828164
Report Number(s):
DOE-GE-0000915-1
Country of Publication:
United States
Language:
English