U.S. Department of Energy
Office of Scientific and Technical Information

Using advanced data structures to enable responsive security monitoring

Journal Article · Cluster Computing
 [1];  [2];  [2];  [3];  [4];  [1];  [5];  [1];  [1]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  2. Stony Brook Univ., NY (United States)
  3. Rutgers Univ., New Brunswick, NJ (United States)
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  5. Williams College, Williamstown, MA (United States)

Write-optimized data structures (WODS) offer the potential to keep up with cyberstream event rates and give sub-second query response for key items such as IP addresses. These data structures organize logs as the events are observed. To work in a real-world environment without filling up the disk, WODS must efficiently expire older events. As the basis for our research into organizing security monitoring data, we implemented a tool, called Diventi, to index IP addresses in connection logs using RocksDB (a write-optimized LSM tree). In this work, we extended Diventi to automatically expire data as part of the data structures’ normal operations. We guarantee that Diventi always tracks the N most recent events and tracks no more than N + k events for a parameter k < N, while ensuring the index is opportunistically pruned. To test Diventi at scale in a controlled environment, we used anonymized traces of IP communications collected at SuperComputing 2019, synthetically extending the 2.4 billion connection events to 100 billion events. We compared Diventi against Elasticsearch, a common log indexing tool. In our test environment, Elasticsearch reached an ingestion rate of at best 37,000 events/s, while Diventi sustained ingestion rates greater than 171,000 events/s. Diventi’s query response times were as much as 100 times faster than Elasticsearch’s, with queries typically answered in under 80 ms. Furthermore, we saw no noticeable degradation in Diventi from expiration. Diventi has been deployed for many months, during which it has performed well and supported new security analysis capabilities.
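
The expiration guarantee stated in the abstract (always retain the N most recent events, never hold more than N + k) can be illustrated with a small in-memory sketch. This is not Diventi's implementation, which is built on RocksDB and expires data during the data structures' normal operations; it is a toy model under simple assumptions. The class name `BoundedEventIndex`, its methods, and the tuple layout of an event are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class BoundedEventIndex:
    """Toy index (hypothetical, not Diventi): keeps the n most recent
    events and never holds more than n + k events."""

    def __init__(self, n, k):
        if not 0 < k < n:
            raise ValueError("expected 0 < k < n")
        self.n = n
        self.k = k
        self.events = deque()            # (seq, src_ip, dst_ip), oldest first
        self.by_ip = defaultdict(deque)  # ip -> events touching it, oldest first
        self.next_seq = 0

    def insert(self, src_ip, dst_ip):
        # Expire a batch of k events only once the upper bound n + k is
        # reached, so the pruning cost is amortized over k inserts.
        if len(self.events) == self.n + self.k:
            self._expire(self.k)
        event = (self.next_seq, src_ip, dst_ip)
        self.next_seq += 1
        self.events.append(event)
        for ip in {src_ip, dst_ip}:
            self.by_ip[ip].append(event)

    def _expire(self, count):
        for _ in range(count):
            event = self.events.popleft()
            for ip in {event[1], event[2]}:
                bucket = self.by_ip[ip]
                # Events enter every bucket in arrival order, so the event
                # being expired is always the oldest entry in its buckets.
                bucket.popleft()
                if not bucket:
                    del self.by_ip[ip]

    def query(self, ip):
        """Return all retained events involving ip, oldest first."""
        return list(self.by_ip.get(ip, ()))
```

After each insert the index holds between min(inserted, N + 1) and N + k events, so the N most recent events are always present. Pruning in batches of k rather than one event at a time is one simple way to trade a bounded amount of extra space for cheaper expiration; the paper's opportunistic pruning inside an LSM tree is a different and more involved mechanism.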

Research Organization:
Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); USDOE Laboratory Directed Research and Development (LDRD) Program; National Science Foundation (NSF)
Grant/Contract Number:
NA0003525
OSTI ID:
1883172
Report Number(s):
SAND2021-15479J; 702298
Journal Information:
Cluster Computing, Vol. 25, Issue 4; ISSN 1386-7857
Publisher:
Springer
Country of Publication:
United States
Language:
English
