DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Comparison of linear regression, k-nearest neighbour and random forest methods in airborne laser-scanning-based prediction of growing stock

Abstract

In this study, we examine the effects of different model types and variable selection approaches on forest yield modelling performance in an area-based approach (ABA) at five sites around the world. We compared ordinary least squares regression (OLS), k-nearest neighbours (kNN) and random forest (RF). Our objective was to test whether there are systematic differences in accuracy between OLS, kNN and RF in ABA predictions of growing stock volume. The analyses are based on a 5-fold cross-validation at five study sites: a eucalyptus plantation, a temperate forest and three different boreal forests. Completely independent validation datasets were also available for two of the boreal sites. For kNN, we evaluated multiple measures of distance, including Euclidean, Mahalanobis, most similar neighbour (MSN) and an RF-based distance metric. The variable selection approaches we examined included a heuristic approach (for OLS, kNN and RF), exhaustive search among all combinations (OLS only) and all variables together (RF only). Performance varied by model type and variable selection approach among sites. OLS and RF had similar accuracies and were more efficient than any of the kNN variants. Variable selection did not affect RF performance. Heuristic and exhaustive variable selection performed similarly for OLS. kNN performed worst among the model types, and kNN with RF distance was prone to overfitting when compared with a validation dataset. Additional caution is therefore required when building kNN models for volume prediction through ABA; it is preferable to opt instead for models based on OLS with some variable selection, or RF with all variables together.
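
As a rough illustration of the 5-fold cross-validation comparison described in the abstract, the sketch below contrasts a one-predictor OLS model with an unweighted Euclidean kNN regressor using pooled RMSE. It is not the authors' code: the data are synthetic, the single predictor `height` stands in for real laser-scanning metrics, RF and the other distance metrics are omitted, and all function names (`kfold_indices`, `fit_ols`, `fit_knn`, `cv_rmse`) are hypothetical.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ols(xs, ys):
    """One-predictor least squares; returns a prediction function."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=3):
    """kNN regression: unweighted mean of the k nearest training responses.

    With a single predictor, Euclidean distance reduces to |x_i - x|.
    """
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def cv_rmse(fit, xs, ys, k=5, seed=0):
    """Pooled RMSE over k cross-validation folds."""
    sq_err = 0.0
    for held_out in kfold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err += sum((model(xs[i]) - ys[i]) ** 2 for i in held_out)
    return math.sqrt(sq_err / len(xs))


# Synthetic stand-in data (assumption): "volume" grows linearly with a
# single height-like metric, plus Gaussian noise.
rng = random.Random(42)
height = [rng.uniform(5, 30) for _ in range(100)]
volume = [12.0 * h + rng.gauss(0, 20) for h in height]

rmse_ols = cv_rmse(fit_ols, height, volume)
rmse_knn = cv_rmse(fit_knn, height, volume)
print(f"OLS RMSE: {rmse_ols:.1f}  kNN RMSE: {rmse_knn:.1f}")
```

Both models are scored on the same folds (same `seed`), so any difference in RMSE reflects the model type rather than the data split. On real plot data the ranking may differ from this toy example.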

Authors:
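 Cosenza, Diogo N. [1]; Korhonen, Lauri [2]; Maltamo, Matti [2]; Packalen, Petteri [2]; Strunk, Jacob L. [3]; Næsset, Erik [4]; Gobakken, Terje [4]; Soares, Paula [1]; Tomé, Margarida [1]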
  1. Forest Research Centre, School of Agriculture, University of Lisbon, Tapada da Ajuda, 1349-017 Lisbon, Portugal
  2. School of Forest Sciences, University of Eastern Finland P.O. Box 111, 80101 Joensuu, Finland
  3. USDA Forest Service, Pacific Northwest Research Station, 3625 93rd Ave SW, Olympia, WA 98512, USA
  4. Faculty of Environmental Sciences and Natural Resource Management, Norwegian University of Life Sciences, P.O. Box 5003, NO-1432 Ås, Norway
Publication Date:
Sponsoring Org.:
USDOE Office of Nuclear Energy (NE), Nuclear Fuel Cycle and Supply Chain
OSTI Identifier:
1670223
Grant/Contract Number:  
PD/BD/128489/2017; UIDB/00239/2020
Resource Type:
Published Article
Journal Name:
Forestry
Additional Journal Information:
Journal Name: Forestry; Journal Volume: 94; Journal Issue: 2; Journal ID: ISSN 0015-752X
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English

Citation Formats

Cosenza, Diogo N., Korhonen, Lauri, Maltamo, Matti, Packalen, Petteri, Strunk, Jacob L., Næsset, Erik, Gobakken, Terje, Soares, Paula, and Tomé, Margarida. Comparison of linear regression, k-nearest neighbour and random forest methods in airborne laser-scanning-based prediction of growing stock. United Kingdom: N. p., 2020. Web. https://doi.org/10.1093/forestry/cpaa034.
Cosenza, Diogo N., Korhonen, Lauri, Maltamo, Matti, Packalen, Petteri, Strunk, Jacob L., Næsset, Erik, Gobakken, Terje, Soares, Paula, & Tomé, Margarida. Comparison of linear regression, k-nearest neighbour and random forest methods in airborne laser-scanning-based prediction of growing stock. United Kingdom. https://doi.org/10.1093/forestry/cpaa034
Cosenza, Diogo N., Korhonen, Lauri, Maltamo, Matti, Packalen, Petteri, Strunk, Jacob L., Næsset, Erik, Gobakken, Terje, Soares, Paula, and Tomé, Margarida. 2020. "Comparison of linear regression, k-nearest neighbour and random forest methods in airborne laser-scanning-based prediction of growing stock". United Kingdom. https://doi.org/10.1093/forestry/cpaa034.
@article{osti_1670223,
title = {Comparison of linear regression, k-nearest neighbour and random forest methods in airborne laser-scanning-based prediction of growing stock},
author = {Cosenza, Diogo N. and Korhonen, Lauri and Maltamo, Matti and Packalen, Petteri and Strunk, Jacob L. and Næsset, Erik and Gobakken, Terje and Soares, Paula and Tomé, Margarida},
abstractNote = {In this study, we examine the effects of different model types and variable selection approaches on forest yield modelling performance in an area-based approach (ABA) at five sites around the world. We compared ordinary least squares regression (OLS), k-nearest neighbours (kNN) and random forest (RF). Our objective was to test whether there are systematic differences in accuracy between OLS, kNN and RF in ABA predictions of growing stock volume. The analyses are based on a 5-fold cross-validation at five study sites: a eucalyptus plantation, a temperate forest and three different boreal forests. Completely independent validation datasets were also available for two of the boreal sites. For kNN, we evaluated multiple measures of distance, including Euclidean, Mahalanobis, most similar neighbour (MSN) and an RF-based distance metric. The variable selection approaches we examined included a heuristic approach (for OLS, kNN and RF), exhaustive search among all combinations (OLS only) and all variables together (RF only). Performance varied by model type and variable selection approach among sites. OLS and RF had similar accuracies and were more efficient than any of the kNN variants. Variable selection did not affect RF performance. Heuristic and exhaustive variable selection performed similarly for OLS. kNN performed worst among the model types, and kNN with RF distance was prone to overfitting when compared with a validation dataset. Additional caution is therefore required when building kNN models for volume prediction through ABA; it is preferable to opt instead for models based on OLS with some variable selection, or RF with all variables together.},
doi = {10.1093/forestry/cpaa034},
journal = {Forestry},
number = 2,
volume = 94,
place = {United Kingdom},
year = {2020},
month = {10}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1093/forestry/cpaa034
