An Efficient Implementation of Distance-Based Diversity Measures Based on k-d Trees
Dimitris K. Agrafiotis* and Victor S. Lobanov
3-Dimensional Pharmaceuticals, Inc., 665 Stockton Drive, Suite 104, Exton, Pennsylvania 19341
Received June 9, 1998
The problem of quantifying molecular diversity continues to attract significant interest among computational
chemists. Most algorithms reported to date are distance-based and scale quadratically with the size of the data
set. This paper reports an alternative algorithm based on k-dimensional (or k-d) trees. k-d trees are
combinatorial data structures that allow efficient location of nearest neighbors in multivariate spaces. Nearest
neighbor detection forms the basis of many popular diversity measures, such as maximin and minimum spanning
trees. In this report, we demonstrate that k-d trees exhibit excellent scaling characteristics
and can be used to accelerate diversity estimation without compromising the quality of the design. The
advantages of this approach are contrasted with those of an alternative algorithm, based on the cosine
similarity coefficient, that was recently proposed by Turner et al.
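The abstract does not describe the implementation itself. The following is only a minimal pure-Python sketch of the general idea, not the authors' code: a k-d tree is used to find each compound's nearest neighbor, and those distances are turned into a maximin-style diversity score. It assumes Euclidean distance on real-valued descriptor vectors. It also assumes the score is the mean distance from each compound to its nearest neighbor, and the paper's exact definition may differ. The names `build_kdtree`, `nearest_other`, `maximin_diversity` and `maximin_diversity_brute` are hypothetical. The brute-force function shows the quadratic pairwise approach the abstract contrasts with.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    index: int                 # position of the point in the original data set
    axis: int                  # coordinate used to split at this node
    left: Optional["Node"]     # points with coordinate <= split value
    right: Optional["Node"]    # points with coordinate >= split value


def build_kdtree(points: Sequence[Point], indices: List[int], depth: int = 0) -> Optional[Node]:
    """Recursively build a k-d tree, splitting at the median along cycling axes."""
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return Node(
        index=indices[mid],
        axis=axis,
        left=build_kdtree(points, indices[:mid], depth + 1),
        right=build_kdtree(points, indices[mid + 1:], depth + 1),
    )


def nearest_other(points: Sequence[Point], root: Optional[Node], q_index: int) -> Tuple[float, int]:
    """Return (distance, index) of the point nearest to points[q_index], excluding itself."""
    q = points[q_index]
    best = [math.inf, -1]  # best squared distance found so far, and its index

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        p = points[node.index]
        if node.index != q_index:
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            if d2 < best[0]:
                best[0], best[1] = d2, node.index
        diff = q[node.axis] - p[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Descend into the far side only if the splitting plane is closer than the current best.
        if diff * diff < best[0]:
            visit(far)

    visit(root)
    return math.sqrt(best[0]), best[1]


def maximin_diversity(points: Sequence[Point]) -> float:
    """Mean nearest-neighbor distance (assumed maximin-style score), using the k-d tree."""
    root = build_kdtree(points, list(range(len(points))))
    return sum(nearest_other(points, root, i)[0] for i in range(len(points))) / len(points)


def maximin_diversity_brute(points: Sequence[Point]) -> float:
    """Same score by exhaustive O(N^2) pairwise comparison, for checking."""
    n = len(points)
    total = 0.0
    for i in range(n):
        total += min(math.dist(points[i], points[j]) for j in range(n) if j != i)
    return total / n


if __name__ == "__main__":
    random.seed(0)
    data = [tuple(random.random() for _ in range(3)) for _ in range(500)]
    print(f"k-d tree:    {maximin_diversity(data):.6f}")
    print(f"brute force: {maximin_diversity_brute(data):.6f}")
```

Comparing the two functions on a few hundred random points is a quick check that the pruning rule never discards the true neighbor. The tree is built by re-sorting at each level, which is simple rather than optimal. The savings from pruning are typically largest in low-dimensional descriptor spaces and shrink as dimensionality grows. In practice a library tree such as `scipy.spatial.KDTree` would be the natural choice; the hand-rolled version here only makes the search logic explicit.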
INTRODUCTION
In recent years, advances in synthetic and screening
technology have enabled the simultaneous synthesis and
biological evaluation of large chemical libraries containing
hundreds to tens of thousands of compounds. Molecular
diversity continues to be the main guiding principle in the
design of combinatorial and high-throughput screening
Source: Agrafiotis, Dimitris K. - Molecular Design and Informatics Group, Johnson & Johnson Pharmaceutical Research and Development
