A Survey of Dimension Reduction Techniques
Advances in data collection and storage capabilities during the past decades have led to an information overload in most sciences. Researchers working in domains as diverse as engineering, astronomy, biology, remote sensing, economics, and consumer transactions, face larger and larger observations and simulations on a daily basis. Such datasets, in contrast with smaller, more traditional datasets that have been studied extensively in the past, present new challenges in data analysis. Traditional statistical methods break down partly because of the increase in the number of observations, but mostly because of the increase in the number of variables associated with each observation. The dimension of the data, is the number of variables that are measured on each observation. High-dimensional datasets present many mathematical challenges as well as some opportunities, and are bound to give rise to new theoretical developments. One of the problems with high-dimensional datasets is that, in many cases, not all the measured variables are ''important'' for understanding the underlying phenomena of interest. While certain computationally expensive novel methods can construct predictive models with high accuracy from high-dimensional data, it is still of interest in many applications to reduce the dimension of the original data prior to any modeling of the data. In this paper, we described several dimension reduction methods.
- Research Organization:
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Sponsoring Organization:
- US Department of Energy (US)
- DOE Contract Number:
- W-7405-ENG-48
- OSTI ID:
- 15002155
- Report Number(s):
- UCRL-ID-148494; TRN: US200408%%150
- Resource Relation:
- Other Information: PBD: 9 May 2002
- Country of Publication:
- United States
- Language:
- English
Similar Records
Breaking the High-Throughput Bottleneck: New tools help biologists integrate complex datasets
Effective Dimension Reduction Using Sequential Projection Pursuit On Gene Expression Data for Cancer Classification