Title: Measuring the Interestingness of Articles in a Limited User Environment Prospectus

Abstract

Search engines, such as Google, assign scores to news articles based on their relevancy to a query. However, not all articles relevant to the query may be interesting to a user. For example, an article that is old or yields little new information would be uninteresting. Relevancy scores do not take into account what makes an article interesting, which varies from user to user. Although methods such as collaborative filtering have been shown to be effective in recommendation systems, a limited user environment does not have enough users to make collaborative filtering effective. I present a general framework for defining and measuring the "interestingness" of articles, called iScore, that incorporates user feedback, including tracking multiple topics of interest and finding interesting entities or phrases in a complex relationship network.

I propose and have shown the validity of the following:

1. Filtering based only on topic relevancy is insufficient for identifying interesting articles.
2. No single feature can characterize the interestingness of an article for a user. It is the combination of multiple features that yields higher quality results. For each user, these features have different degrees of usefulness for predicting interestingness.
3. Through user feedback, a classifier can combine features to predict interestingness for the user.
4. Current evaluation corpora, such as TREC, do not capture all aspects of personalized news filtering systems necessary for system evaluation.
5. Focusing only on specific evolving user interests instead of all topics allows for more efficient resource utilization while yielding high quality recommendation results.
6. Multiple profile vectors yield significantly better results than traditional methods, such as the Rocchio algorithm, for identifying interesting articles. Additionally, tracking multiple topics as a new feature in iScore can improve iScore's classification performance.
7. Multiple topic tracking (MTT) yields better results than the best results from the last TREC adaptive filtering run.

As future work, I will address the following hypothesis: Entities and the relationships among them, extracted using current information extraction technology, can be used to identify entities of interest and relationships of interest, using a scheme such as PageRank. I will also address one of the following two hypotheses:

1. By addressing the multiple reading roles that a single user may have, classification results can be improved.
2. By tailoring the operating parameters of MTT, better classification results can be achieved.
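For readers unfamiliar with the Rocchio algorithm that the abstract names as the traditional baseline, the following is a minimal sketch of its incremental, single-profile form. It is not code from the thesis and does not implement iScore, MTT, or multiple profile vectors. The function names, the sparse dict representation, and the weights alpha, beta, and gamma are illustrative assumptions.

```python
import math


def rocchio_update(profile, doc, relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Apply one incremental Rocchio step and return the new profile.

    profile and doc are sparse term -> weight dicts. The default weights are
    common textbook values, not parameters taken from the thesis.
    """
    # Move toward documents marked interesting, away from the others.
    step = beta if relevant else -gamma
    updated = {term: alpha * weight for term, weight in profile.items()}
    for term, weight in doc.items():
        updated[term] = updated.get(term, 0.0) + step * weight
    # Drop terms whose weight is no longer positive (a common convention).
    return {term: weight for term, weight in updated.items() if weight > 0.0}


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical feedback: one interesting article, then one uninteresting one.
profile = {}
profile = rocchio_update(profile, {"energy": 1.0, "policy": 1.0}, relevant=True)
profile = rocchio_update(profile, {"policy": 1.0, "sports": 1.0}, relevant=False)

# Score a new article against the single learned profile vector.
score = cosine(profile, {"energy": 1.0})
```

In a filtering setting, the score would typically be compared against a threshold to decide whether to recommend the article. A single profile vector of this kind is the baseline that item 6 above contrasts with multiple profile vectors.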

Authors:
 Pon, Raymond K. [1]
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Publication Date:
2007-04-18
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
908108
Report Number(s):
UCRL-TH-230629
TRN: US200722%%434
DOE Contract Number:
W-7405-ENG-48
Resource Type:
Thesis/Dissertation
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; CLASSIFICATION; EVALUATION; FOCUSING; HYPOTHESIS; PERFORMANCE; RECOMMENDATIONS; VECTORS; ALGORITHMS

Citation Formats

Pon, Raymond K. Measuring the Interestingness of Articles in a Limited User Environment Prospectus. United States: N. p., 2007. Web. doi:10.2172/908108.
Pon, Raymond K. Measuring the Interestingness of Articles in a Limited User Environment Prospectus. United States. doi:10.2172/908108.
Pon, Raymond K. 2007. "Measuring the Interestingness of Articles in a Limited User Environment Prospectus". United States. doi:10.2172/908108. https://www.osti.gov/servlets/purl/908108.
@article{osti_908108,
title = {Measuring the Interestingness of Articles in a Limited User Environment Prospectus},
author = {Pon, Raymond K.},
abstractNote = {Search engines, such as Google, assign scores to news articles based on their relevancy to a query. However, not all articles relevant to the query may be interesting to a user. For example, an article that is old or yields little new information would be uninteresting. Relevancy scores do not take into account what makes an article interesting, which varies from user to user. Although methods such as collaborative filtering have been shown to be effective in recommendation systems, a limited user environment does not have enough users to make collaborative filtering effective. I present a general framework for defining and measuring the ``interestingness'' of articles, called iScore, that incorporates user feedback, including tracking multiple topics of interest and finding interesting entities or phrases in a complex relationship network. I propose and have shown the validity of the following: 1. Filtering based only on topic relevancy is insufficient for identifying interesting articles. 2. No single feature can characterize the interestingness of an article for a user. It is the combination of multiple features that yields higher quality results. For each user, these features have different degrees of usefulness for predicting interestingness. 3. Through user feedback, a classifier can combine features to predict interestingness for the user. 4. Current evaluation corpora, such as TREC, do not capture all aspects of personalized news filtering systems necessary for system evaluation. 5. Focusing only on specific evolving user interests instead of all topics allows for more efficient resource utilization while yielding high quality recommendation results. 6. Multiple profile vectors yield significantly better results than traditional methods, such as the Rocchio algorithm, for identifying interesting articles. Additionally, tracking multiple topics as a new feature in iScore can improve iScore's classification performance. 7. Multiple topic tracking (MTT) yields better results than the best results from the last TREC adaptive filtering run. As future work, I will address the following hypothesis: Entities and the relationships among them, extracted using current information extraction technology, can be used to identify entities of interest and relationships of interest, using a scheme such as PageRank. I will also address one of the following two hypotheses: 1. By addressing the multiple reading roles that a single user may have, classification results can be improved. 2. By tailoring the operating parameters of MTT, better classification results can be achieved.},
doi = {10.2172/908108},
place = {United States},
year = {2007},
month = {4}
}

Thesis/Dissertation:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this thesis or dissertation.

Similar Records:
  • Search engines, such as Google, assign scores to news articles based on their relevancy to a query. However, not all relevant articles for the query may be interesting to a user. For example, if the article is old or yields little new information, the article would be uninteresting. Relevancy scores do not take into account what makes an article interesting, which varies from user to user. Although methods such as collaborative filtering have been shown to be effective in recommendation systems, in a limited user environment, there are not enough users that would make collaborative filtering effective. A general framework, called iScore, is presented for defining and measuring the 'interestingness' of articles, incorporating user-feedback. iScore addresses various aspects of what makes an article interesting, such as topic relevancy, uniqueness, freshness, source reputation, and writing style. It employs various methods to measure these features and uses a classifier operating on these features to recommend articles. The basic iScore configuration is shown to improve recommendation results by as much as 20%. In addition to the basic iScore features, additional features are presented to address the deficiencies of existing feature extractors, such as one that tracks multiple topics, called MTT, and a version of the Rocchio algorithm that learns its parameters online as it processes documents, called eRocchio. The inclusion of both MTT and eRocchio into iScore is shown to improve iScore recommendation results by as much as 3.1% and 5.6%, respectively. Additionally, in TREC11 Adaptive Filter Task, eRocchio is shown to be 10% better than the best filter in the last run of the task. In addition to these two major topic relevancy measures, other features are also introduced that employ language models, phrases, clustering, and changes in topics to improve recommendation results. These additional features are shown to improve recommendation results by iScore by up to 14%. Due to varying reasons that users hold regarding why an article is interesting, an online feature selection method in naive Bayes is also introduced. Online feature selection can improve recommendation results in iScore by up to 18.9%. In summary, iScore in its best configuration can outperform traditional IR techniques by as much as 50.7%. iScore and its components are evaluated in the news recommendation task using three datasets from Yahoo! News, actual users, and Digg. iScore and its components are also evaluated in the TREC Adaptive Filter task using the Reuters RCV1 corpus.
  • Search engines, such as Google, assign scores to news articles based on their relevance to a query. However, not all relevant articles for the query may be interesting to a user. For example, if the article is old or yields little new information, the article would be uninteresting. Relevance scores do not take into account what makes an article interesting, which would vary from user to user. Although methods such as collaborative filtering have been shown to be effective in recommendation systems, in a limited user environment, there are not enough users that would make collaborative filtering effective. A general framework, called iScore, is presented for defining and measuring the "interestingness" of articles, incorporating user-feedback. iScore addresses the various aspects of what makes an article interesting, such as topic relevance, uniqueness, freshness, source reputation, and writing style. It employs various methods, such as multiple topic tracking, online parameter selection, language models, clustering, sentiment analysis, and phrase extraction to measure these features. Due to varying reasons that users hold about why an article is interesting, an online feature selection method in naïve Bayes is also used to improve recommendation results. iScore can outperform traditional IR techniques by as much as 50.7%. iScore and its components are evaluated in the news recommendation task using three datasets from Yahoo! News, actual users, and Digg.
  • In order to extract and use a natural resource (e.g., coal) the environment (air, water, etc.) must also be used as a repository of the discharged wastes (e.g., sulfur oxides, nitrous oxides, particulates, etc.). Moreover, if there is a mandated level of the environmental resource (e.g., clean air) that has to be maintained, then certain additional costs must be borne by society (firms utilizing the resource). Thus, in evaluating the scarcity of an extractible resource, the relative position of the environmental resource also must be evaluated. This study incorporated such jointness in the evaluation of the measure of resource scarcity, something earlier studies did not address. The theoretical model was developed in an optimal-control framework. It was analytically shown that this new measure of resource scarcity would indicate a different trend compared to earlier ones. The measure of resource scarcity developed in this study captures previous measures as special cases. In an uncertain world, when the impact of the use of an extractible resource on the environment is not known, the stock size of the environmental resource becomes uncertain.
  • This thesis addresses itself to the task of designing and analyzing parallel algorithms when the resources of processors, communication, and time are limited. The two parts of this thesis deal with multiprocessor systems and VLSI - the two important parallel processing environments that are prevalent today. In the first part a time-processor-communication tradeoff analysis is conducted for two kinds of problems - N input, 1 output, and N input, N output computations. In the class of problems of the second kind, the problem of prefix computation, an important problem due to the number of naturally occurring computations it can model, is studied. Finally, a general methodology is given for design of parallel algorithms that can be used to optimize a given design to a wide set of architectural variations. The second part of the thesis considers the design of parallel algorithms for the VLSI model of computation when the resource of time is severely restricted.
  • This research analyzed the impact of self-help changes made by limited resource households (low-income, aged, isolated, and handicapped) in their efforts to reduce their household energy use and increase the comfort of their households. The research used a benefit-cost model to assess the impact that self-help efforts had on the energy use of private households and further the potential impact on society in general. It was assumed that an aggregate of conditions including an educational project contributed to the households' decisions to adopt energy conservation practices. The analysis of variance indicated significant difference at the .0001 level. This finding indicated that significant difference existed between the number of energy conservation practices present at each of the data collection points. Energy conservation practices adopted by limited resource households were analyzed in a benefit-cost formula to assess the social return from self-help approaches for coping with energy problems. Net present value calculations yielded positive net present values. In addition, benefit-cost ratios were greater than one. These findings led to the conclusion that society, participant households, and the agency funding the energy education project accrued significant economic benefits. These findings demonstrate the potential economic benefit of energy education for limited resource households.