U.S. Department of Energy
Office of Scientific and Technical Information

Beyond the Hype: An Evaluation of Commercially Available Machine-Learning-Based Malware Detectors

Journal Article · Digital Threats: Research and Practice
DOI: https://doi.org/10.1145/3567432 · OSTI ID: 1965262

There is a lack of scientific testing of commercially available malware detectors, especially those that boast accurate classification of never-before-seen (i.e., zero-day) files using machine learning (ML). Consequently, efficacy of malware detectors is opaque, inhibiting end users from making informed decisions and researchers from targeting gaps in current detectors. In this paper, we present a scientific evaluation of four prominent commercial malware detection tools to assist an organization with two primary questions: To what extent do ML-based tools accurately classify previously and never-before-seen files? Is purchasing a network-level malware detector worth the cost? To investigate, we tested each tool against 3,536 total files (2,554 or 72% malicious, 982 or 28% benign) of a variety of file types, including hundreds of malicious zero-days, polyglots, and APT-style files, delivered on multiple protocols. We present statistical results on detection time and accuracy, consider complementary analysis (using multiple tools together), and provide two novel applications of the recent cost-benefit evaluation procedure of Iannacone & Bridges. Although the ML-based tools are more effective at detecting zero-day files and executables, the signature-based tool might still be an overall better option. Both network-based tools provide substantial (simulated) savings when paired with either host tool, yet both show poor detection rates on protocols other than HTTP or SMTP. Our results show that all four tools have near-perfect precision but alarmingly low recall, especially on file types other than executables and office files—37% of malware, including all polyglot files, were undetected. Priorities for researchers and takeaways for end users are given. Code for future use of the cost model is provided.
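The abstract's headline recall figure can be sanity-checked with simple arithmetic: a minimal sketch in Python, assuming the 37% "undetected" share applies to the 2,554 malicious files in the corpus (the per-tool confusion matrices are in the paper itself, not reproduced here).

```python
# Back-of-envelope check of the abstract's aggregate figures.
# Assumption: the 37% undetected rate is taken over all 2,554 malicious files.
malicious = 2554          # malicious files in the test corpus
benign = 982              # benign files (unused here; precision needs FP counts)
undetected_frac = 0.37    # share of malware missed, per the abstract

true_pos = round(malicious * (1 - undetected_frac))  # detected malware
false_neg = malicious - true_pos                     # missed malware

def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that were detected."""
    return tp / (tp + fn)

print(f"recall ~ {recall(true_pos, false_neg):.2f}")  # roughly 0.63
```

Precision cannot be recovered the same way, since the abstract reports it only qualitatively ("near-perfect") without false-positive counts.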

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1965262
Journal Information:
Digital Threats: Research and Practice; ISSN 2692-1626
Publisher:
Association for Computing Machinery (ACM)
Country of Publication:
United States
Language:
English

References (30)

Machine Learning Aided Static Malware Analysis: A Survey and Tutorial (book, January 2018)
Instrumenting Competition-Based Exercises to Evaluate Cyber Defender Situation Awareness (book, January 2013)
Return on security investment – proving it's worth it (journal, November 2005)
ISRAM: information security risk analysis method (journal, March 2005)
A concise cost analysis of Internet malware (journal, October 2009)
An empirical comparison of botnet detection methods (journal, September 2014)
Detection of malicious PDF files and directions for enhancements: A state-of-the-art survey (journal, February 2015)
Quantifiable & comparable evaluations of cyber defensive capabilities: A survey & novel, unified approach (journal, September 2020)
Adversarial attacks against Windows PE malware detection: A survey of the state-of-the-art (journal, May 2023)
Performance comparison of intrusion detection systems and application of machine learning to Snort system (journal, March 2018)
Mobile malware attacks: Review, taxonomy & future directions (journal, August 2019)
The rise of machine learning for detection and classification of malware: Research developments, trends and challenges (journal, March 2020)
A Comprehensive Review on Malware Detection Approaches (journal, January 2020)
LARIAT: Lincoln adaptable real-time information assurance testbed (conference, January 2002)
Investigation of Possibilities to Detect Malware Using Existing Tools (conference, October 2017)
Static Malware Detection & Subterfuge: Quantifying the Robustness of Machine Learning and Current Anti-Virus (conference, October 2018)
National Cyber Range Overview (conference, October 2014)
How the Cyber Defense Exercise Shaped an Information-Assurance Curriculum (journal, September 2007)
Evaluation studies of three intrusion detection systems under various attacks and rule sets (conference, October 2013)
An Assessment of the Usability of Machine Learning Based Tools for the Security Operations Center (conference, November 2020)
Testing malware detectors (journal, July 2004)
Hit 'em where it hurts (conference, December 2011)
Computing legacy software behavior to understand functionality and security properties (conference, January 2013)
Polyglots (conference, January 2013)
Large-Scale Identification of Malicious Singleton Files (conference, March 2017)
A Survey on Malware Detection Using Data Mining Techniques (journal, June 2017)
Dynamic Malware Analysis in the Modern Era—A State of the Art Survey (journal, September 2020)
Security attribute evaluation method (conference, January 2002)
A state-of-the-art survey of malware detection approaches using data mining techniques (journal, January 2018)
Extract Me If You Can: Abusing PDF Parsers in Malware Detectors (conference, January 2016)

Similar Records

AI ATAC 1: An Evaluation of Prominent Commercial Malware Detectors
Conference · November 2023 · OSTI ID: 2301624

Toward the Detection of Polyglot Files
Conference · August 2022 · OSTI ID: 1885926

Toward the Detection of Polyglot Files
Conference · August 2022 · OSTI ID: 3002965