Automatic Keyword Extraction from Individual Documents
Abstract
This paper introduces a novel and domain-independent method for automatically extracting keywords, as sequences of one or more words, from individual documents. We describe the method’s configuration parameters and algorithm, and present an evaluation on a benchmark corpus of technical abstracts. We also present a method for generating lists of stop words for specific corpora and domains, and evaluate its ability to improve keyword extraction on the benchmark corpus. Finally, we apply our method of automatic keyword extraction to a corpus of news articles and define metrics for characterizing the exclusivity, essentiality, and generality of extracted keywords within a corpus.
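The abstract describes extracting keywords as sequences of one or more words, and it treats stop-word lists as a key input. As a rough illustration only, the sketch below splits text into candidate phrases at stop words and punctuation, then ranks the candidates. Several choices here are assumptions made for this example and are not taken from this record: the tiny stop-word list, the set of punctuation delimiters, and the scoring rule (each word scores its co-occurrence degree divided by its frequency, and a phrase scores the sum of its words). The function names `candidate_keywords` and `score_keywords` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list for illustration only; the paper describes
# generating corpus-specific lists, which this sketch does not attempt.
STOP_WORDS = {"a", "an", "and", "are", "as", "for", "from", "in",
              "is", "of", "on", "the", "to", "with"}


def candidate_keywords(text, stop_words=STOP_WORDS):
    """Return candidate keywords as tuples of words: maximal runs of
    words that contain no stop word and cross no punctuation delimiter."""
    candidates = []
    # Assumed delimiter set: punctuation and newlines end a phrase.
    for fragment in re.split(r"[.,;:!?()\"\n]+", text.lower()):
        current = []
        # ASCII-only tokenizer, kept simple for the sketch.
        for word in re.findall(r"[a-z0-9'-]+", fragment):
            if word in stop_words:
                # A stop word closes the phrase being built.
                if current:
                    candidates.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            candidates.append(tuple(current))
    return candidates


def score_keywords(candidates):
    """Rank distinct candidates by the sum of their word scores, where a
    word's score is degree / frequency (an assumed scoring choice)."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in candidates:
        for word in phrase:
            freq[word] += 1
            # Degree: number of words co-occurring with this word inside
            # candidate phrases, counting the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in set(candidates)}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, on the sentence "Automatic keyword extraction from individual documents is a method for extracting keywords." this sketch yields four candidates and ranks "automatic keyword extraction" first with a score of 9.0, because its three words each co-occur with two others.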
- Authors: Rose, Stuart J; Engel, David W; Cramer, Nicholas O; Cowley, Wendy E
- Publication Date: May 3, 2010
- Research Org.: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Org.: USDOE
- OSTI Identifier: 978967
- Report Number(s): PNNL-SA-67401; 400470000; TRN: US201010%%235
- DOE Contract Number: AC05-76RL01830
- Resource Type: Book
- Resource Relation: Related Information: Text Mining: Applications and Theory, 1:3-20
- Country of Publication: United States
- Language: English
- Subject: 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; AUTOMATION; DOCUMENT TYPES; INFORMATION RETRIEVAL; information extraction, information analytics, automatic keyword extraction, digital libraries, text analysis
Citation Formats
MLA: Rose, Stuart J, Engel, David W, Cramer, Nicholas O, and Cowley, Wendy E. Automatic Keyword Extraction from Individual Documents. United States: N. p., 2010. Web.
APA: Rose, Stuart J, Engel, David W, Cramer, Nicholas O, & Cowley, Wendy E. (2010). Automatic Keyword Extraction from Individual Documents. United States.
Chicago: Rose, Stuart J, Engel, David W, Cramer, Nicholas O, and Cowley, Wendy E. 2010. "Automatic Keyword Extraction from Individual Documents." United States.
@incollection{osti_978967,
title = {Automatic Keyword Extraction from Individual Documents},
author = {Rose, Stuart J and Engel, David W and Cramer, Nicholas O and Cowley, Wendy E},
abstractNote = {This paper introduces a novel and domain-independent method for automatically extracting keywords, as sequences of one or more words, from individual documents. We describe the method’s configuration parameters and algorithm, and present an evaluation on a benchmark corpus of technical abstracts. We also present a method for generating lists of stop words for specific corpora and domains, and evaluate its ability to improve keyword extraction on the benchmark corpus. Finally, we apply our method of automatic keyword extraction to a corpus of news articles and define metrics for characterizing the exclusivity, essentiality, and generality of extracted keywords within a corpus.},
booktitle = {Text Mining: Applications and Theory},
url = {https://www.osti.gov/biblio/978967},
place = {United States},
year = {2010},
month = may
}