On the Development of Name Search Techniques for Arabic
Syed Uzair Aqeel, Steve Beitzel, Eric Jensen, Ophir Frieder, David Grossman
 

Illinois Institute of Technology, 10 W. 31st
Street, Chicago, IL 60616
{aqeel, beitzel, jensen, frieder, grossman}@ir.iit.edu
The need for effective identity matching systems has led to extensive research in the area of name
search. For the most part, such work has been limited to English and other Latin-based languages.
Consequently, algorithms such as Soundex and n-gram matching are of limited utility for languages such
as Arabic, which has a vastly different morphology that relies heavily on phonetic information. The dearth
of work in this field is partly due to the lack of standardized test data. To address this, we built a collection
of 7,939 Arabic names, along with 50 training queries and 111 test queries. We use this collection to
evaluate a variety of algorithms, including a derivative of Soundex tailored to Arabic (ASOUNDEX),
measuring effectiveness using standard information retrieval measures. Our results show an
improvement of 70% over existing approaches.
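
As a rough illustration of the two baseline techniques named above, the following sketch shows classic English Soundex and a bigram (n-gram) similarity applied to Latin-script spellings of a name. It is not the ASOUNDEX algorithm, whose details are not given in this summary. The function names (soundex, bigrams, dice_similarity) and the choice of the Dice coefficient as the n-gram measure are assumptions made for this example only.

```python
# Illustrative sketch only: classic English Soundex and bigram Dice similarity.
# This is NOT ASOUNDEX; all names here are hypothetical helpers for the example.

def soundex(name: str) -> str:
    """Return the 4-character classic Soundex code for a Latin-script name."""
    codes = {}
    groups = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
              ("L", "4"), ("MN", "5"), ("R", "6"))
    for group, digit in groups:
        for ch in group:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept as-is
    prev = codes.get(letters[0], "")    # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = codes.get(ch, "")        # vowels (and Y) map to the empty code
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets prev, allowing repeats
    return (result + "000")[:4]         # pad with zeros, truncate to 4 chars


def bigrams(name: str) -> set:
    """Return the set of adjacent character pairs in the lowercased name."""
    s = name.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_similarity(a: str, b: str) -> float:
    """Dice coefficient over bigram sets: 2|X & Y| / (|X| + |Y|)."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


if __name__ == "__main__":
    for variant in ("Mohammed", "Muhammad", "Mohamed"):
        print(variant, soundex(variant), dice_similarity("Mohammed", variant))
```

In this sketch, exact string comparison treats "Mohammed" and "Muhammad" as different names, while both receive the same Soundex code and a nonzero bigram similarity. Because the coding table is built around English consonant classes and Latin letters, it says nothing about how well either technique handles names written in Arabic script, which is the gap the paper addresses.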
Introduction
Identity matching systems frequently employ name search algorithms to effectively locate relevant
information about a given person. Such systems are used for applications as diverse as tax fraud
detection and immigration control. Using names to retrieve information makes such systems susceptible
to problems arising from typographical errors. That is, exact match search approaches will not find
instances of misspelled names or those names that have more than one accepted spelling. An example of

  

Source: Argamon, Shlomo - Department of Computer Science, Illinois Institute of Technology

 

Collections: Computer Technologies and Information Sciences