Automatically Categorizing Written Texts by Author Gender*
 

Summary: Automatically Categorizing Written Texts by Author Gender*
Moshe Koppel (1), Shlomo Argamon (2,1), Anat Rachel Shimoni (1)
(1) Dept. of Computer Science, Bar-Ilan University, Ramat Gan 52900, Israel
(2) Dept. of Computer Science, Jerusalem College of Technology, 21 Havaad Haleumi St., Jerusalem 91102, Israel
Abstract
The problem of automatically determining the gender of a document's author would appear to be a more subtle
problem than those of categorization by topic or authorship attribution. Nevertheless, it is shown that automated
text categorization techniques can exploit combinations of simple lexical and syntactic features to infer the gender
of the author of an unseen formal written document with approximately 80% accuracy. The same techniques can
be used to determine if a document is fiction or non-fiction with approximately 98% accuracy.
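As a rough illustration of what categorizing by simple lexical features can look like, the sketch below represents each text by the relative frequencies of a few function words and trains a basic perceptron on two labeled categories. The word list, the perceptron learner, and all function names are assumptions made for illustration only; the abstract does not specify the feature set or the learning algorithm, and nothing here reproduces the reported accuracies.

```python
import re
from collections import Counter

# Hypothetical feature set: a handful of English function words.
FUNCTION_WORDS = ["the", "a", "of", "and", "she", "for", "with", "not", "you", "in"]


def features(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def train_perceptron(examples, epochs=20):
    """Train on (text, label) pairs, where label is +1 or -1."""
    weights = [0.0] * len(FUNCTION_WORDS)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = features(text)
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if label * score <= 0:  # misclassified: move toward the label
                weights = [w + label * xi for w, xi in zip(weights, x)]
                bias += label
    return weights, bias


def predict(model, text):
    """Return +1 or -1 for an unseen text."""
    weights, bias = model
    score = sum(w * xi for w, xi in zip(weights, features(text))) + bias
    return 1 if score > 0 else -1
```

Calling `train_perceptron` on a list of labeled texts returns a model that `predict` can apply to an unseen document. The labels +1 and -1 stand for any two categories, such as the two author genders or fiction versus non-fiction.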
1. Introduction
1.1 Text Categorization
The last ten years have seen an explosion of research in automated text categorization (Sebastiani 2002).
In the text categorization problem, we are given a set of two or more categories and examples of texts

  

Source: Argamon, Shlomo - Department of Computer Science, Illinois Institute of Technology

 

Collections: Computer Technologies and Information Sciences