The Normalized Compression Distance Is Resistant to Noise
 

Summary:
Manuel Cebrián, Manuel Alfonseca, Associate Member, IEEE, and Alfonso Ortega
Abstract--This correspondence studies the influence of noise on the normalized compression distance (NCD), a measure that uses compressors to compute the degree of similarity between two files. This influence is approximated by a first-order differential equation that gives rise to a complex effect, which explains why the NCD may yield values greater than 1, as observed by other authors. The model is tested experimentally and fits the data well. Finally, the influence of noise on the clustering of files of different types is explored, showing that the NCD performs well even in the presence of fairly high noise levels.
Index Terms--Clustering and noise resistance, datafile corruption, heterogeneous data analysis, Kolmogorov complexity, noisy channel, normalized compression distance, universal similarity distance.
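The abstract describes the NCD only in words. For reference, the standard definition (from Cilibrasi and Vitányi; it is not restated in this excerpt) is NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}, where C(·) is the length of the compressed input. The Python sketch below computes it and measures how the distance between a file and a corrupted copy of itself grows with the noise level. The choice of zlib as the compressor, the noise model (each byte independently replaced by a uniformly random byte with probability p), and the helper names `compressed_size`, `ncd`, and `add_noise` are assumptions made for illustration, not necessarily the compressors or noise model used in the paper.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of zlib's output at its maximum compression level.

    zlib is an assumed stand-in for the generic compressor C.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


def add_noise(data: bytes, p: float, rng: random.Random) -> bytes:
    """Hypothetical noise model: each byte is independently replaced,
    with probability p, by a uniformly random byte (0..255)."""
    return bytes(rng.randrange(256) if rng.random() < p else b for b in data)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    # A small, highly redundant sample "file" (about 9 KB).
    text = b" ".join(b"the quick brown fox jumps over the lazy dog %d" % i
                     for i in range(200))
    for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
        noisy = add_noise(text, p, rng)
        print(f"p={p:.2f}  NCD(original, noisy)={ncd(text, noisy):.3f}")
```

As p approaches 1 the noisy copy becomes essentially incompressible random data, so its distance to the original approaches 1. Keep the inputs well under zlib's 32 KB sliding window: if x and y together are much longer than that, the compressor cannot exploit the similarity between them and the NCD is inflated.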
I. INTRODUCTION
Universal metrics are applicable to any kind of data and are one of
the main objectives of clustering theory. The normalized information

  

Source: Alfonseca, Manuel - Escuela Politécnica Superior, Universidad Autónoma de Madrid

 
