Summary: A Fractal Approach for Selecting an Appropriate Bin Size for Cell-Based Diversity Estimation
Dimitris K. Agrafiotis* and Dmitrii N. Rassokhin
3-Dimensional Pharmaceuticals, Inc., 665 Stockton Drive, Exton, Pennsylvania 19341
Received September 3, 2001
A novel approach for selecting an appropriate bin size for cell-based diversity assessment is presented. The
method measures the sensitivity of the diversity index as a function of grid resolution, using a box-counting
algorithm that is reminiscent of those used in fractal analysis. It is shown that the relative variance of the
diversity score (sum of squared cell occupancies) of several commonly used molecular descriptor sets exhibits
a bell-shaped distribution, whose exact characteristics depend on the distribution of the data set, the number
of points considered, and the dimensionality of the feature space. The peak of this distribution represents
the optimal bin size for a given data set and sample size. Although box counting can be performed in an
algorithmically efficient manner, the ability of cell-based methods to distinguish between subsets of different
spread falls sharply with dimensionality, and the method becomes useless beyond a few dimensions.
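
The procedure summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that descriptors are already scaled to [0, 1], that "relative variance" means the variance of the score divided by its squared mean over random subsets of a fixed size, and that candidate resolutions are supplied by the caller. The function names (diversity_score, relative_variance, best_bin_count) are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def diversity_score(points, bins):
    """Sum of squared cell occupancies on a grid with `bins` cells per axis.

    Assumes every coordinate has already been scaled to [0, 1].
    """
    cells = Counter(
        # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
        tuple(min(int(x * bins), bins - 1) for x in p) for p in points
    )
    return sum(n * n for n in cells.values())


def relative_variance(data, bins, sample_size, trials, rng):
    """Variance / mean^2 of the score over random subsets (assumed definition)."""
    scores = [
        diversity_score(rng.sample(data, sample_size), bins)
        for _ in range(trials)
    ]
    m = mean(scores)
    return pvariance(scores) / (m * m)


def best_bin_count(data, candidate_bins, sample_size, trials=200, seed=0):
    """Return the candidate resolution where the relative variance peaks."""
    rng = random.Random(seed)
    profile = {
        b: relative_variance(data, b, sample_size, trials, rng)
        for b in candidate_bins
    }
    return max(profile, key=profile.get), profile
```

Under these assumptions the profile vanishes at both extremes: with a single cell every subset of k points scores k^2, and with a grid fine enough to isolate every point every subset scores k. Any peak therefore lies at an intermediate resolution, which is consistent with the bell-shaped behavior described in the summary. Note that `data` must be a list (or another sequence) for `random.sample` to accept it.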
I. INTRODUCTION
Molecular diversity continues to attract significant interest
in the combinatorial chemistry and high-throughput communities [1-4].
Despite an increase in the sophistication and involvement
of diversity profiling techniques in library design and
compound acquisition, the concept has been difficult to

  

Source: Agrafiotis, Dimitris K. - Molecular Design and Informatics Group, Johnson & Johnson Pharmaceutical Research and Development

 

Collections: Chemistry; Computer Technologies and Information Sciences