Chat Mining for Gender Prediction

Summary: Chat Mining for Gender Prediction
Tayfun Kucukyilmaz, B. Barla Cambazoglu,
Cevdet Aykanat, and Fazli Can
Bilkent University, Department of Computer Engineering,
06800 Bilkent, Ankara, Turkey
{ktayfun, berkant, aykanat, canf}@cs.bilkent.edu.tr
Abstract. The aim of this paper is to investigate the feasibility of
predicting the gender of a text document's author using linguistic evidence.
For this purpose, term- and style-based classification techniques are
evaluated over a large collection of chat messages. Prediction accuracies up
to 84.2% are achieved, illustrating the applicability of these techniques
to gender prediction. Moreover, the reverse problem is exploited, and the
effect of gender on the writing style is discussed.
1 Introduction
Authorship characterization is a problem long studied in the literature [1]. In
general terms, authorship characterization can be defined as the problem of
predicting the attributes (e.g., biological properties and socio-cultural status)
of the author of a textual document. The outcomes of such studies are primarily
used for financial forensics, law enforcement, threat analysis, and prevention of
terrorist activities. Consequently, in several studies [2,3], efforts have been spent to


Source: Aykanat, Cevdet - Department of Computer Engineering, Bilkent University


Collections: Computer Technologies and Information Sciences