Automatic generation of stop word lists for information retrieval and analysis
Patent · OSTI ID: 1082869
Methods and systems for automatically generating lists of stop words for information retrieval and analysis. Generation of the stop words can include providing a corpus of documents and a plurality of keywords. From the corpus of documents, a term list of all terms is constructed and both a keyword adjacency frequency and a keyword frequency are determined. If a ratio of the keyword adjacency frequency to the keyword frequency for a particular term on the term list is less than a predetermined value, then that term is excluded from the term list. The resulting term list is truncated based on predetermined criteria to form a stop word list.
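The steps in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not define its terms precisely, so the sketch assumes that the keyword adjacency frequency counts occurrences of a term directly next to a keyword, that the keyword frequency counts occurrences of a term inside a keyword, that a term never seen inside a keyword is kept because its ratio is undefined, and that truncation keeps the top-ranked terms by adjacency frequency. The names `tokenize`, `generate_stop_words`, `min_ratio`, and `max_size` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def generate_stop_words(documents, keywords, min_ratio=1.0, max_size=10):
    """Sketch of the abstract's steps; all parameter names are illustrative.

    documents: list of strings (the corpus)
    keywords:  list of keyword phrases
    min_ratio: stands in for the "predetermined value" of the ratio test
    max_size:  assumed truncation criterion (keep the top-ranked terms)
    """
    keyword_tokens = [tokenize(k) for k in keywords]
    keyword_tokens = [k for k in keyword_tokens if k]

    term_freq = Counter()       # occurrences of each term in the corpus
    adjacency_freq = Counter()  # occurrences directly next to a keyword
    keyword_freq = Counter()    # occurrences inside a keyword

    for doc in documents:
        tokens = tokenize(doc)
        term_freq.update(tokens)

        # Mark every token position covered by a keyword occurrence.
        inside = [False] * len(tokens)
        for kw in keyword_tokens:
            n = len(kw)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == kw:
                    for j in range(i, i + n):
                        inside[j] = True

        for i, tok in enumerate(tokens):
            if inside[i]:
                keyword_freq[tok] += 1
            else:
                left = i > 0 and inside[i - 1]
                right = i + 1 < len(tokens) and inside[i + 1]
                if left or right:
                    # Counted once per occurrence, even with keywords on both sides.
                    adjacency_freq[tok] += 1

    # Exclude terms whose adjacency/keyword ratio is below the threshold.
    # Assumption: a term never seen inside a keyword has no ratio and is kept.
    candidates = []
    for term in term_freq:
        kf = keyword_freq[term]
        if kf > 0 and adjacency_freq[term] / kf < min_ratio:
            continue
        candidates.append(term)

    # Assumed truncation rule: rank by adjacency frequency, then by term
    # frequency, then alphabetically, and keep the first max_size terms.
    candidates.sort(key=lambda t: (-adjacency_freq[t], -term_freq[t], t))
    return candidates[:max_size]


documents = [
    "the stop word list is used for keyword extraction of the corpus",
    "keyword extraction of the documents and the stop word list",
]
keywords = ["stop word list", "keyword extraction", "corpus", "documents"]
print(generate_stop_words(documents, keywords, min_ratio=1.0, max_size=3))
# prints ['the', 'of', 'and']
```

In this toy corpus, terms such as "stop" and "extraction" appear only inside keywords, so their ratio is 0 and they are excluded, while function words that border keywords rank highest. A real corpus would need a different threshold and truncation size.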
- Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC0576RL01830
- Assignee: Battelle Memorial Institute (Richland, WA)
- Patent Number(s): 8,352,469
- Application Number: 12/555,962
- OSTI ID: 1082869
- Country of Publication: United States
- Language: English
Similar Records
Experiments in automatic word class and word sense identification for information retrieval
Technical Report · Sat Dec 31 1994 · OSTI ID:1082869
Rapid automatic keyword extraction for information retrieval and analysis
Patent · Tue Mar 06 2012 · OSTI ID:1082869
Automatic Keyword Extraction from Individual Documents
Book · Mon May 03 2010 · OSTI ID:1082869