Improving Naive Bayes with Online Feature Selection for Quick Adaptation to Evolving Feature Usefulness
The definition of what makes an article interesting varies from user to user and continually evolves even for a single user. As a result, for news recommendation systems, useless document features cannot be determined a priori, and all features are usually considered for interestingness classification. Consequently, the presence of currently useless features degrades classification performance [1], particularly over the initial set of news articles being classified. This initial set of documents is critical for a user deciding which news recommendation system to adopt. To address these problems, we introduce an improved version of the naive Bayes classifier with online feature selection. We use correlation to determine the utility of each feature and take advantage of the conditional independence assumption used by naive Bayes for online feature selection and classification. The augmented naive Bayes classifier performs 28% better than the traditional naive Bayes classifier in recommending news articles from the Yahoo! RSS feeds.
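The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the correlation measure, the feature representation, or any cutoff, so the sketch assumes binary (present/absent) features, the phi coefficient (Pearson correlation between two binary variables), and a hypothetical `threshold` parameter. The class name `OnlineSelectiveNB` and all method names are likewise illustrative. Because naive Bayes treats features as conditionally independent given the class, a feature can be switched on or off simply by including or omitting its term in the log-odds sum, with no retraining.

```python
import math
from collections import defaultdict


class OnlineSelectiveNB:
    """Bernoulli naive Bayes whose per-feature terms are gated by correlation.

    Illustrative sketch only; names and the threshold rule are assumptions.
    """

    def __init__(self, threshold=0.1, alpha=1.0):
        self.threshold = threshold          # hypothetical cutoff on |correlation|
        self.alpha = alpha                  # Laplace smoothing constant
        self.n = 0                          # documents seen so far
        self.n_pos = 0                      # documents labeled interesting
        self.count = defaultdict(int)       # feature -> documents containing it
        self.count_pos = defaultdict(int)   # feature -> interesting documents containing it

    def update(self, features, label):
        """Fold one labeled document into the running counts."""
        self.n += 1
        self.n_pos += int(bool(label))
        for f in set(features):
            self.count[f] += 1
            if label:
                self.count_pos[f] += 1

    def correlation(self, f):
        """Phi coefficient between presence of feature f and the label."""
        n, n1 = self.n, self.n_pos
        nf, nf1 = self.count.get(f, 0), self.count_pos.get(f, 0)
        denom = nf * (n - nf) * n1 * (n - n1)
        if denom == 0:
            return 0.0  # feature or label is constant so far: no evidence of utility
        return (n * nf1 - nf * n1) / math.sqrt(denom)

    def selected(self):
        """Features whose current correlation magnitude meets the threshold."""
        return {f for f in self.count if abs(self.correlation(f)) >= self.threshold}

    def log_odds(self, features):
        """Log odds of 'interesting', summing terms of selected features only."""
        a = self.alpha
        n_neg = self.n - self.n_pos
        score = math.log((self.n_pos + a) / (n_neg + a))
        present = set(features)
        for f in self.selected():
            p1 = (self.count_pos[f] + a) / (self.n_pos + 2 * a)
            p0 = (self.count[f] - self.count_pos[f] + a) / (n_neg + 2 * a)
            if f in present:
                score += math.log(p1 / p0)
            else:
                score += math.log((1 - p1) / (1 - p0))
        return score

    def predict(self, features):
        return self.log_odds(features) > 0
```

In this sketch, selection is re-evaluated from the running counts at every prediction, so a feature that becomes correlated with the user's labels later starts contributing immediately, and one that stops being correlated drops out. A word that appears in every document (for example a stop word) has zero correlation and is ignored.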
- Research Organization:
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- W-7405-ENG-48
- OSTI ID:
- 929189
- Report Number(s):
- UCRL-CONF-235295; TRN: US200815%%229
- Resource Relation:
- Conference: Presented at: SIAM Conference on Data Mining 2008, Atlanta, GA, United States, Apr 24 - Apr 26, 2008
- Country of Publication:
- United States
- Language:
- English
Similar Records
Tracking Multiple Topics for Finding Interesting Articles
Measuring the Interestingness of Articles in a Limited User Environment