U.S. Department of Energy
Office of Scientific and Technical Information

Detecting anomalous packets in network transfers: investigations using PCA, autoencoder and isolation forest in TCP

Journal Article · Machine Learning
 [1];  [2];  [3];  [2];  [3]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); University of Southern California
  2. Univ. of North Carolina, Chapel Hill, NC (United States)
  3. Univ. of Southern California, Los Angeles, CA (United States). Information Sciences Institute
Large-scale scientific workflows rely heavily on high-performance file transfers. These transfers require strict quality parameters such as guaranteed bandwidth and no packet loss or data duplication. To ensure successful file transfers, abnormal patterns must be detected, typically with methods such as predetermined thresholds and statistical analysis. Network administrators routinely monitor and analyze network data to diagnose and alleviate such problems, making decisions based on their experience. However, as networks grow larger and more complex, the volume of monitoring data that must be processed quickly makes it increasingly difficult to identify and rectify errors. Abnormal file transfers have so far been classified simply by setting alert thresholds in tools such as perfSONAR and TCP statistics (Tstat). This paper investigates the feasibility of unsupervised feature extraction for identifying network anomaly patterns, using three unsupervised classification methods: principal component analysis (PCA), an autoencoder and isolation forest. We collect file transfer statistics from two experiment sets, synthetic iPerf-generated traffic and 1000 Genome workflow runs, with synthetically introduced anomalies. Our results show that while PCA and a simple autoencoder find it difficult to detect clusters, the tree-based isolation forest is able to identify anomalous packets by breaking TCP traces down into tree classes early.
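This record does not say how the authors configured isolation forest, so the following is only a hedged, minimal pure-Python sketch of the standard algorithm (Liu, Ting and Zhou, 2008). Each tree recursively splits a random subsample on a random feature at a random value, and a point's anomaly score is s(x) = 2^(-E[h(x)]/c(ψ)), where E[h(x)] is its mean path length over all trees and c(ψ) is the expected path length for a subsample of size ψ. Points that are isolated after few splits get short paths and scores near 1. The two per-packet features (RTT in ms, throughput in Gbit/s), the Gaussian "normal" traffic and the three injected anomalies are hypothetical stand-ins, not the paper's iPerf or 1000 Genome data; the defaults of 100 trees and a subsample of 256 follow the original isolation forest paper, not this one.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n points,
    used to normalise isolation depths (Liu et al., 2008)."""
    if n > 2:
        harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
        return 2.0 * harmonic - 2.0 * (n - 1) / n
    if n == 2:
        return 1.0
    return 0.0


def build_tree(points, depth, max_depth, rng):
    """Recursively split a subsample on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}  # external node
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # chosen feature is constant in this node: stop here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x lands, plus c(size) for the unexpanded part of a leaf."""
    if "size" in node:
        return depth + c(node["size"])
    branch = "left" if x[node["dim"]] < node["split"] else "right"
    return path_length(x, node[branch], depth + 1)


class IsolationForest:
    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        self.trees = []
        self.psi = 0

    def fit(self, data):
        if len(data) < 2:
            raise ValueError("need at least two points")
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))  # height limit ceil(log2 psi)
        self.trees = [
            build_tree(self.rng.sample(data, self.psi), 0, max_depth, self.rng)
            for _ in range(self.n_trees)
        ]
        return self

    def score(self, x):
        """Anomaly score in (0, 1]: close to 1 means easy to isolate."""
        mean_depth = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / c(self.psi))


if __name__ == "__main__":
    gen = random.Random(42)
    # Hypothetical per-packet features: (RTT in ms, throughput in Gbit/s).
    normal = [(gen.gauss(20.0, 1.0), gen.gauss(9.0, 0.3)) for _ in range(500)]
    injected = [(80.0, 2.0), (5.0, 0.5), (45.0, 4.0)]  # synthetic anomalies
    forest = IsolationForest(n_trees=100, sample_size=256, seed=1).fit(normal + injected)
    for x in injected + normal[:3]:
        print(f"rtt={x[0]:5.1f} ms  tput={x[1]:4.2f} Gbit/s  score={forest.score(x):.3f}")
```

In this sketch the injected points should score noticeably higher than the bulk of the Gaussian traffic. Turning scores into alerts still needs a cut-off, which is a design choice the sketch deliberately leaves open.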
Research Organization:
Univ. of Southern California, Los Angeles, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
SC0012636
OSTI ID:
1787027
Journal Information:
Machine Learning, Vol. 109, Issue 5; ISSN 0885-6125
Publisher:
Springer Nature
Country of Publication:
United States
Language:
English

References (28)

Principal Component Analysis book January 2011
Passive analysis of TCP anomalies journal October 2008
Network anomaly detection through nonlinear analysis journal October 2010
A transform domain-based anomaly detection approach to network-wide traffic journal April 2014
A global reference for human genetic variation journal January 2015
Mining and modeling web trajectories from passive traces conference December 2017
Flow Monitoring Explained: From Packet Capture to Data Analysis With NetFlow and IPFIX journal January 2014
Unusual internet traffic detection at network edge conference December 2015
User patience and the Web: a hands-on investigation conference January 2003
Log summarization and anomaly detection for troubleshooting distributed systems conference September 2007
Anomaly detection for scientific workflow applications on networked clouds conference July 2016
Anomaly Based Network Intrusion Detection with Unsupervised Outlier Detection conference June 2006
Isolation Forest conference December 2008
Detection Network Anomalies Based on Packet and Flow Analysis conference April 2008
  • Wang, Hong; Gong, Zhenghu; Guan, Qing
  • 2008 Seventh International Conference on Networking (ICN '08) https://doi.org/10.1109/ICN.2008.83
Passive TCP stream estimation of RTT and jitter parameters conference January 2005
Automated traffic classification and application identification using machine learning conference January 2005
Unveiling network and service performance degradation in the wild with mplane journal March 2016
Traffic Analysis with Off-the-Shelf Hardware: Challenges and Lessons Learned journal March 2017
Experiences of Internet traffic monitoring with tstat journal May 2011
A Machine Learning Approach to TCP Throughput Prediction journal August 2010
Diagnosing network-wide traffic anomalies conference January 2004
  • Lakhina, Anukool; Crovella, Mark; Diot, Christophe
  • Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications - SIGCOMM '04 https://doi.org/10.1145/1015467.1015492
Mining anomalies using traffic feature distributions conference January 2005
  • Lakhina, Anukool; Crovella, Mark; Diot, Christophe
  • Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications - SIGCOMM '05 https://doi.org/10.1145/1080091.1080118
Boosting for transfer learning conference January 2007
Self-taught learning: transfer learning from unlabeled data conference January 2007
Measuring Web Speed From Passive Traces conference July 2018
A signal analysis of network traffic anomalies conference January 2002
Learning Deep Architectures for AI journal January 2009
Measurement Analysis of TCP Congestion Control Algorithms in LTE Uplink conference June 2018

Similar Records

Detecting Outliers in Network Transfers with Feature Extraction
Conference · Jul 1, 2018 · OSTI ID:1468101

An investigation of packet reordering in TCP traces (extended abstract)
Conference · Dec 31, 2003 · OSTI ID:977651

Experiences with TCP/IP over an ATM OC12 WAN
Technical Report · Dec 22, 1999 · OSTI ID:764365