skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Automatic Labeling for Entity Extraction in Cyber Security

Conference ·
OSTI ID:1143555

Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is generally unavailable for specialized applications, such as detecting security related entities; moreover, manual annotation of corpora is very costly and often not a viable solution. In response, we develop a very precise method to automatically label text from several data sources by leveraging related, domain-specific, structured data and provide public access to a corpus annotated with cyber-security entities. Next, we implement a Maximum Entropy Model trained with the average perceptron on a portion of our corpus (~750,000 words) and achieve near perfect precision, recall, and accuracy, with training times under 17 seconds.

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
Work for Others (WFO)
DOE Contract Number:
DE-AC05-00OR22725
OSTI ID:
1143555
Resource Relation:
Conference: 2014 ASE International Conference on Cyber Security, Stanford, CA, USA, 20140527, 20140331
Country of Publication:
United States
Language:
English