 
Summary: A Geodesic Framework for Analyzing Molecular Similarities
Dimitris K. Agrafiotis* and Huafeng Xu
3-Dimensional Pharmaceuticals, Inc., 665 Stockton Drive, Exton, Pennsylvania 19341
Received October 31, 2002
A fast self-organizing algorithm for extracting the minimum number of independent variables that can fully
describe a set of observations was recently described (Agrafiotis, D. K.; Xu, H. Proc. Natl. Acad. Sci.
U.S.A. 2002, 99, 15869-15872). The method, called stochastic proximity embedding (SPE), attempts to
generate low-dimensional Euclidean maps that best preserve the similarities between a set of related objects.
Unlike conventional multidimensional scaling (MDS) and nonlinear mapping (NLM), SPE preserves only
local relationships and, by doing so, reveals the intrinsic dimensionality and metric structure of the data. Its
success depends critically on the choice of the neighborhood radius, which should be consistent with the
local curvature of the underlying manifold. Here, we describe a procedure for determining that radius by
examining the trade-off between the stress function and the number of connected components in the
neighborhood graph and show that it can be used to produce meaningful maps in any embedding dimension.
The power of the algorithm is illustrated in two major areas of computational drug design: conformational
analysis and diversity profiling of large chemical libraries.
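
The ideas summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the pairwise refinement rule commonly associated with SPE: a random pair of points is selected, and their map distance is adjusted toward the input proximity whenever the pair lies within the neighborhood radius or is a distant pair that the map places too close together. The function names (spe_embed, count_components, local_stress), the default parameters, the use of Euclidean input distances, and the particular normalized form of the stress are all assumptions made for illustration only.

```python
import math
import random


def spe_embed(points, radius, dim=2, n_cycles=100, n_steps=2000,
              lam_start=1.0, lam_end=0.01, eps=1e-10, seed=0):
    """Hypothetical sketch of SPE-style pairwise refinement."""
    rng = random.Random(seed)
    n = len(points)
    # Random initial coordinates in the embedding space.
    coords = [[rng.random() for _ in range(dim)] for _ in range(n)]
    lam = lam_start
    dlam = (lam_start - lam_end) / max(n_cycles - 1, 1)
    for _ in range(n_cycles):
        for _ in range(n_steps):
            i, j = rng.sample(range(n), 2)
            r = math.dist(points[i], points[j])   # input proximity
            d = math.dist(coords[i], coords[j])   # current map distance
            # Distant pairs are only pushed apart, never pulled together.
            if r > radius and d >= r:
                continue
            scale = lam * 0.5 * (r - d) / (d + eps)
            for k in range(dim):
                delta = scale * (coords[i][k] - coords[j][k])
                coords[i][k] += delta
                coords[j][k] -= delta
        lam -= dlam  # learning rate decreases linearly over the cycles
    return coords


def count_components(points, radius):
    """Connected components of the graph linking points within `radius`."""
    n = len(points)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})


def local_stress(points, coords, radius):
    """Assumed normalized stress, restricted to pairs within `radius`."""
    num = den = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            r = math.dist(points[i], points[j])
            if 0.0 < r <= radius:
                d = math.dist(coords[i], coords[j])
                num += (d - r) ** 2 / r
                den += r
    return num / den if den else 0.0
```

Under these assumptions, one way to explore the trade-off is to scan a range of candidate radii, record count_components(points, rc) and local_stress(points, spe_embed(points, rc), rc) for each value rc, and look for small radii at which the neighborhood graph is still connected while the stress remains low.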
I. INTRODUCTION
Virtually all marketed drugs result from the optimization
of a lead compound identified through random screening or
serendipitous observation of a pharmaceutically relevant side
