OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Measuring the Interestingness of Articles in a Limited User Environment Prospectus

Abstract

Search engines, such as Google, assign scores to news articles based on their relevance to a query. However, not every article relevant to a query is interesting to a user. For example, an article that is old or yields little new information is uninteresting. Relevance scores do not account for what makes an article interesting, and that varies from user to user. Although methods such as collaborative filtering have been shown to be effective in recommendation systems, a limited user environment has too few users for collaborative filtering to be effective. I present a general framework for defining and measuring the "interestingness" of articles, called iScore, which incorporates user feedback and includes tracking multiple topics of interest as well as finding interesting entities or phrases in a complex relationship network. I propose, and have shown the validity of, the following: 1. Filtering based only on topic relevance is insufficient for identifying interesting articles. 2. No single feature can characterize the interestingness of an article for a user. It is the combination of multiple features that yields higher-quality results, and for each user these features differ in how useful they are for predicting interestingness. 3. Through user feedback, a classifier can combine features to predict interestingness for the user. 4. Current evaluation corpora, such as TREC, do not capture all aspects of personalized news filtering systems that are necessary for system evaluation. 5. Focusing only on specific, evolving user interests instead of all topics allows more efficient resource utilization while still yielding high-quality recommendations. 6. Multiple profile vectors yield significantly better results than traditional methods, such as the Rocchio algorithm, for identifying interesting articles.
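Claim 6 compares multiple profile vectors against the Rocchio algorithm. As a point of reference, here is a minimal sketch of the textbook Rocchio relevance-feedback update over sparse term-weight vectors, plus a hypothetical `score_multi` helper that scores an article by its best cosine match against several profiles. This is an illustration under stated assumptions, not the method of this prospectus: the sparse-dictionary representation, the weights alpha = 1.0, beta = 0.75, gamma = 0.15 (commonly used defaults), the clipping of negative weights, and all function names are assumptions.

```python
import math
from typing import Dict, List

# Sparse vector: term -> weight (e.g., TF-IDF). The representation is an assumption.
Vector = Dict[str, float]


def add_scaled(acc: Vector, vec: Vector, scale: float) -> None:
    """Accumulate scale * vec into acc, in place."""
    for term, weight in vec.items():
        acc[term] = acc.get(term, 0.0) + scale * weight


def rocchio_update(profile: Vector, relevant: List[Vector], nonrelevant: List[Vector],
                   alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Textbook Rocchio: alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
    updated: Vector = {}
    add_scaled(updated, profile, alpha)
    for doc in relevant:
        add_scaled(updated, doc, beta / len(relevant))
    for doc in nonrelevant:
        add_scaled(updated, doc, -gamma / len(nonrelevant))
    # Common convention (assumed here): drop terms whose weight is no longer positive.
    return {t: w for t, w in updated.items() if w > 0}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_multi(doc: Vector, profiles: List[Vector]) -> float:
    """Hypothetical multi-profile score: the best cosine match over all profiles."""
    return max((cosine(doc, p) for p in profiles), default=0.0)
```

For example, with two profiles `{"a": 1.0}` and `{"b": 1.0}`, an article containing only term `b` scores 1.0 under `score_multi`, whereas its cosine against the single merged profile `{"a": 1.0, "b": 1.0}` is about 0.71. This gives one intuition for why a single averaged profile can dilute distinct interests; it does not reproduce the reported results.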
Additionally, adding multiple topic tracking (MTT) as a new feature in iScore can improve iScore's classification performance. 7. Multiple topic tracking yields better results than the best results from the last TREC adaptive filtering run. As future work, I will address the following hypothesis: entities, and the relationships among them, extracted with current information extraction technology can be used to identify entities of interest and relationships of interest, using a scheme such as PageRank. I will also address one of the following two hypotheses: 1. Classification results can be improved by addressing the multiple reading roles that a single user may have. 2. Better classification results can be achieved by tailoring the operating parameters of MTT.
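The future-work hypothesis proposes ranking extracted entities and relationships with a scheme such as PageRank. Here is a minimal sketch, assuming the extracted entities form a directed graph in which an edge u -> v records a relationship from entity u to entity v. The function is standard PageRank computed by power iteration, with the commonly used damping factor 0.85 and uniform redistribution of rank from entities that have no outgoing edges. The graph encoding, parameter values, and function name are assumptions for illustration. How the prospectus will build or weight the entity graph is not specified here.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> Dict[str, float]:
    """Standard PageRank by power iteration over an adjacency-list graph.

    graph maps each entity to the entities it links to (hypothetical encoding).
    """
    # Collect every entity, including ones that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    ordered = sorted(nodes)
    n = len(ordered)
    if n == 0:
        return {}

    rank = {v: 1.0 / n for v in ordered}
    for _ in range(max_iter):
        # Rank held by dangling entities (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in ordered if not graph.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in ordered}
        for v in ordered:
            targets = graph.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in ordered)
        rank = new
        if delta < tol:
            break
    return rank
```

For instance, `pagerank({"b": ["a"], "c": ["a"]})` ranks `"a"` highest because both other entities point to it, and the scores sum to 1. The single-letter names stand in for whatever entities an extraction system would produce.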

Authors:
Pon, Raymond K. [1]
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Publication Date:
April 18, 2007
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
908108
Report Number(s):
UCRL-TH-230629
TRN: US200722%%434
DOE Contract Number:  
W-7405-ENG-48
Resource Type:
Thesis/Dissertation
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; CLASSIFICATION; EVALUATION; FOCUSING; HYPOTHESIS; PERFORMANCE; RECOMMENDATIONS; VECTORS; ALGORITHMS

Citation Formats

Pon, Raymond K. Measuring the Interestingness of Articles in a Limited User Environment Prospectus. United States: N. p., 2007. Web. doi:10.2172/908108.
Pon, Raymond K. Measuring the Interestingness of Articles in a Limited User Environment Prospectus. United States. doi:10.2172/908108.
Pon, Raymond K. 2007. "Measuring the Interestingness of Articles in a Limited User Environment Prospectus". United States. doi:10.2172/908108. https://www.osti.gov/servlets/purl/908108.
@article{osti_908108,
title = {Measuring the Interestingness of Articles in a Limited User Environment Prospectus},
author = {Pon, Raymond K.},
doi = {10.2172/908108},
journal = {},
place = {United States},
year = {2007},
month = apr
}

Thesis/Dissertation:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this thesis or dissertation.
