Method and system of filtering and recommending documents
Abstract
Disclosed is a method and system for discovering documents using a computer and providing a small set of the most relevant documents to the attention of a human observer. Using the method, the computer obtains a seed document from the user and generates a seed document vector using term frequency-inverse corpus frequency weighting. A keyword index for a plurality of source documents can be compared with the weighted terms of the seed document vector. The comparison is then filtered to reduce the number of documents, which define an initial subset of the source documents. Initial subset vectors are generated and compared to the seed document vector to obtain a similarity value for each comparison. Based on the similarity value, the method then recommends one or more of the source documents.
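The steps named in the abstract (seed vector, keyword-index comparison, filtering, similarity scoring, recommendation) can be illustrated with a minimal Python sketch. It is not the patented implementation. The weighting formula log(1 + tf) * log((N + 1) / (n + 1)) is one common statement of TF-ICF and is assumed here, as are cosine similarity as the similarity value, the filtering rule (keep documents matching at least `min_hits` of the seed's `top_terms` highest-weighted terms), and every function name, parameter, sample document, and corpus statistic shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tficf_vector(text, corpus_df, corpus_size):
    """Weight each term by log(1 + tf) * log((N + 1) / (n + 1)).

    corpus_df and corpus_size describe a fixed reference corpus (assumed),
    so a vector can be built without scanning the source documents.
    """
    counts = Counter(tokenize(text))
    return {
        term: math.log(1 + tf)
        * math.log((corpus_size + 1) / (corpus_df.get(term, 0) + 1))
        for term, tf in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_keyword_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(tokenize(text)):
            index[term].add(doc_id)
    return index


def recommend(seed_text, docs, corpus_df, corpus_size,
              top_terms=3, min_hits=1, top_n=2):
    """Return up to top_n (doc_id, similarity) pairs, best first."""
    seed_vec = tficf_vector(seed_text, corpus_df, corpus_size)
    # Compare the highest-weighted seed terms against the keyword index.
    keywords = sorted(seed_vec, key=seed_vec.get, reverse=True)[:top_terms]
    index = build_keyword_index(docs)
    hits = Counter(d for term in keywords for d in index.get(term, ()))
    # Filter: only documents with enough keyword matches form the subset.
    subset = [d for d, n in hits.items() if n >= min_hits]
    # Build vectors only for the subset and score them against the seed.
    scored = [
        (cosine(seed_vec, tficf_vector(docs[d], corpus_df, corpus_size)), d)
        for d in subset
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(d, s) for s, d in scored[:top_n]]


# Hypothetical data for illustration only.
corpus_size = 1000
corpus_df = {"a": 950, "with": 900, "by": 800, "neural": 20,
             "network": 50, "gradient": 10, "descent": 10, "training": 40}
docs = {
    "a": "neural network training with gradient descent",
    "b": "cooking pasta with tomato sauce",
    "c": "gradient descent optimizes neural network weights",
}
results = recommend("training a neural network by gradient descent",
                    docs, corpus_df, corpus_size)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

In this sketch the keyword index does the cheap first pass, so full vectors are generated only for the documents that survive the filter, which mirrors the ordering of steps in the abstract.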
- Inventors:
- Patton, Robert M.; Potok, Thomas E.
- Issue Date:
- Research Org.:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1237854
- Patent Number(s):
- 9256649
- Application Number:
- 13/920,803
- Assignee:
- UT-Battelle LLC (Oak Ridge, TN)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- AC05-00OR22725
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 2013 Jun 18
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; 99 GENERAL AND MISCELLANEOUS
Citation Formats
Patton, Robert M., and Potok, Thomas E. Method and system of filtering and recommending documents. United States: N. p., 2016. Web.
Patton, Robert M., & Potok, Thomas E. Method and system of filtering and recommending documents. United States.
Patton, Robert M., and Potok, Thomas E. "Method and system of filtering and recommending documents". United States. https://www.osti.gov/servlets/purl/1237854.
@article{osti_1237854,
title = {Method and system of filtering and recommending documents},
author = {Patton, Robert M. and Potok, Thomas E.},
abstractNote = {Disclosed is a method and system for discovering documents using a computer and providing a small set of the most relevant documents to the attention of a human observer. Using the method, the computer obtains a seed document from the user and generates a seed document vector using term frequency-inverse corpus frequency weighting. A keyword index for a plurality of source documents can be compared with the weighted terms of the seed document vector. The comparison is then filtered to reduce the number of documents, which define an initial subset of the source documents. Initial subset vectors are generated and compared to the seed document vector to obtain a similarity value for each comparison. Based on the similarity value, the method then recommends one or more of the source documents.},
place = {United States},
year = {2016},
month = {2}
}
Works referenced in this record:
TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams
conference, December 2006
- Reed, Joel; Jiao, Yu; Potok, Thomas
- 2006 5th International Conference on Machine Learning and Applications (ICMLA'06)
An implementation of a knowledge recommendation system based on similarity among users' profiles
conference, January 2002
- Nakagawa, A.; Ito, T.
- SICE 2002. 41st SICE Annual Conference, Proceedings of the 41st SICE Annual Conference. SICE 2002.
A vector space model for automatic indexing
journal, November 1975
- Salton, G.; Wong, A.; Yang, C. S.
- Communications of the ACM, Vol. 18, Issue 11
Engene: A genetic algorithm classifier for content-based recommender systems that does not require continuous user feedback
conference, September 2010
- Pagonis, John; Clark, Adrian F.
- 2010 UK Workshop on Computational Intelligence (UKCI)
A statistical interpretation of term specificity and its application in retrieval
journal, October 2004
- Spärck Jones, Karen
- Journal of Documentation, Vol. 60, Issue 5
Method and system for optimally searching a document database using a representative semantic space
patent, January 2009
- Sommer, Matthew S.; Thompson, Kevin B.
- US Patent Document 7,483,892
Classification of clustered documents based on similarity scores
patent, September 2013
- Buryak, Kirill; Peng, Jun; Lewis, Glenn M.
- US Patent Document 8,543,576