Ten million and one penguins, or, lessons learned from booting millions of virtual machines on HPC systems.

Minnich, Ronald G; Rudish, Donald W

Conference · Thu Jan 01 00:00:00 EST 2009

OSTI ID:984168

In this paper we describe Megatux, a set of tools we are developing for rapidly provisioning, controlling, and monitoring millions of virtual machines, as well as what we have learned from booting one million Linux virtual machines on the Thunderbird (4660 nodes) cluster and 550,000 Linux virtual machines on the Hyperion (1024 nodes) cluster. As might be expected, our tools use hierarchical structures. In contrast to existing HPC systems, our tools do not require perfect hardware, do not require that all systems be booted at the same time, and do not require static configuration files that define the role of each node. While we believe these tools will be useful for future HPC systems, we are using them today to construct botnets. Botnets have been in the news recently, as discoveries of their scale (millions of infected machines for even a single botnet), their reach (global), and their impact on organizations (devastating in financial costs and time lost to recovery) have become more apparent. A distinguishing feature of botnets is their emergent behavior: fairly simple operational rule sets can result in behavior that cannot be predicted. In general, there is no reducible understanding of how a large network will behave ahead of 'running it', where 'running it' means observing the actual network in operation or simulating/emulating it. Unfortunately, this behavior is only seen at scale, i.e., when at minimum tens of thousands of machines are infected. To add to the problem, botnets typically change at least 11% of the machines they are using in any given week, and this changing population is an integral part of their behavior. The use of virtual machines to assist in the forensics of malware is not new to the cyber security world. Reverse engineering techniques often use virtual machines in combination with code debuggers. Nevertheless, this task largely remains a manual process to get past code obfuscation and is inherently slow.
As part of our cyber security work at Sandia National Laboratories, we are striving to understand the global network behavior of botnets. We are planning to take existing botnets, as found in the wild, and run them on HPC systems. We have turned to HPC systems to support the creation and operation of millions of Linux virtual machines as a means of observing the interaction of the botnet and other noninfected hosts. We started out using traditional HPC tools, but these tools are designed for a much smaller scale, typically topping out at one to ten thousand machines. HPC programming libraries and tools also assume complete connectivity between all nodes, with the attendant configuration files and data structures to match; this assumption holds up very poorly on systems with millions of nodes.

OSTI does not have a digital full text copy available. For more information, please see document availability, search WorldCat, or search Google Scholar.

Research Organization: Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC04-94AL85000

OSTI ID: 984168

Report Number(s): SAND2010-1125C; TRN: US201015%%851

Resource Relation: Conference: Proposed for presentation at the Fourth Workshop on System-level Virtualization for High Performance Computing (HPCVirt 2010), held April 13, 2010 in Paris, France.

Country of Publication: United States

Language: English

Similar Records

Peer-to-peer architectures for exascale computing : LDRD final report.

Technical Report · Wed Sep 01 00:00:00 EDT 2010 · OSTI ID:984168

Vorobeychik, Yevgeniy; Mayo, Jackson R; Minnich, Ronald G; +2 more

Review of Enabling Technologies to Facilitate Secure Compute Customization

Technical Report · Mon Dec 01 00:00:00 EST 2014 · OSTI ID:984168

Aderholdt, Ferrol; Caldwell, Blake A; Hicks, Susan Elaine; +6 more

A case for Virtual Machine based Fault Injection in a High-Performance Computing Environment

Conference · Sat Jan 01 00:00:00 EST 2011 · OSTI ID:984168

Vallee, Geoffroy R; Engelmann, Christian; Scott, Stephen L

Related Subjects

99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
CONFIGURATION
MONITORING
PERFORMANCE
PLANNING
PROGRAMMING
SANDIA NATIONAL LABORATORIES
SECURITY
COMPUTERS
