OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Pitfalls in Prediction Modeling for Normal Tissue Toxicity in Radiation Therapy: An Illustration With the Individual Radiation Sensitivity and Mammary Carcinoma Risk Factor Investigation Cohorts

Abstract

Purpose: To identify the main causes underlying the failure of prediction models for radiation therapy toxicity to replicate.

Methods and Materials: Data were used from two German cohorts, Individual Radiation Sensitivity (ISE) (n=418) and Mammary Carcinoma Risk Factor Investigation (MARIE) (n=409), of breast cancer patients with similar characteristics and radiation therapy treatments. The toxicity endpoint chosen was telangiectasia. The LASSO (least absolute shrinkage and selection operator) logistic regression method was used to build a predictive model for a dichotomized endpoint (Radiation Therapy Oncology Group/European Organization for the Research and Treatment of Cancer score 0, 1, or ≥2). Internal areas under the receiver operating characteristic curve (inAUCs) were calculated by a naïve approach whereby the training data (ISE) were also used for calculating the AUC. Cross-validation was also applied to calculate the AUC within the same cohort, a second type of inAUC. Internal AUCs from cross-validation were calculated within ISE and MARIE separately. Models trained on one dataset (ISE) were applied to a test dataset (MARIE) and AUCs calculated (exAUCs).

Results: Internal AUCs from the naïve approach were generally larger than inAUCs from cross-validation owing to overfitting the training data. Internal AUCs from cross-validation were also generally larger than the exAUCs, reflecting heterogeneity in the predictors between cohorts. The best models with largest inAUCs from cross-validation within both cohorts had a number of common predictors: hypertension, normalized total boost, and presence of estrogen receptors. Surprisingly, the effect (coefficient in the prediction model) of hypertension on telangiectasia incidence was positive in ISE and negative in MARIE. Other predictors were also not common between the 2 cohorts, illustrating that overcoming overfitting does not solve the problem of replication failure of prediction models completely.

Conclusions: Overfitting and cohort heterogeneity are the 2 main causes of replication failure of prediction models across cohorts. Cross-validation and similar techniques (eg, bootstrapping) cope with overfitting, but the development of validated predictive models for radiation therapy toxicity requires strategies that deal with cohort heterogeneity.
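
The three AUC types contrasted in the abstract (naïve inAUC, cross-validated inAUC, and exAUC) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two cohorts are synthetic (they are not the ISE or MARIE data), the penalty `lam`, step size, and iteration count are arbitrary, and the L1-penalized logistic model is fitted with a simple proximal gradient loop rather than whatever software the authors used. One predictor is given opposite signs in the two simulated cohorts, mimicking the reported sign change for hypertension.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_lasso_logistic(X, y, lam=0.02, step=0.5, iters=150):
    """L1-penalized logistic regression via proximal gradient descent.
    lam, step, and iters are illustrative values, not taken from the paper."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r
            for j in range(p):
                grad_w[j] += r * xi[j]
        b -= step * grad_b / n  # the intercept is not penalized
        w = [soft_threshold(wj - step * gj / n, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def predict(model, X):
    """Linear predictor; the AUC depends only on how it ranks patients."""
    w, b = model
    return [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]


def auc(scores, labels):
    """AUC as the probability that a case outranks a non-case (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def cv_auc(X, y, k=5, seed=0):
    """AUC of pooled out-of-fold predictions from k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = [0.0] * len(X)
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_lasso_logistic([X[i] for i in train], [y[i] for i in train])
        for i, s in zip(fold, predict(model, [X[i] for i in fold])):
            scores[i] = s
    return auc(scores, y)


def simulate_cohort(n, coefs, seed, p=8):
    """Synthetic cohort: p standard-normal predictors and a logistic outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = []
    for xi in X:
        z = -1.0 + sum(c * xi[j] for j, c in coefs.items())
        y.append(1 if rng.random() < sigmoid(z) else 0)
    return X, y


# Predictor 0 changes sign between cohorts; predictors 1 and 2 act alike.
X_a, y_a = simulate_cohort(200, {0: 1.0, 1: 0.8, 2: 0.6}, seed=1)
X_b, y_b = simulate_cohort(200, {0: -1.0, 1: 0.8, 2: 0.6}, seed=2)

model_a = fit_lasso_logistic(X_a, y_a)
naive_auc = auc(predict(model_a, X_a), y_a)      # scored on the training data
internal_cv_auc = cv_auc(X_a, y_a)               # scored on held-out folds
external_auc = auc(predict(model_a, X_b), y_b)   # scored on the other cohort

print(f"naive inAUC:           {naive_auc:.3f}")
print(f"cross-validated inAUC: {internal_cv_auc:.3f}")
print(f"exAUC (A applied to B): {external_auc:.3f}")
```

In this toy setup, cross-validation can only correct for optimism within the training cohort. Because the cross-validated folds all come from cohort A, the sign change in cohort B is invisible to them, which is the kind of cohort heterogeneity the conclusions point to.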

Authors:
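 Mbah, Chamberlain [1] [2]; Thierens, Hubert [1]; Thas, Olivier [3] [4]; De Neve, Jan [5]; Chang-Claude, Jenny; Seibold, Petra; Botma, Akke [6]; West, Catharine [7]; De Ruyck, Kim [1]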
  1. Department of Basic Medical Sciences, Faculty of Health Sciences, Ghent University, Ghent (Belgium)
  2. (Belgium)
  3. Department of Mathematical Modeling, Statistics, and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent (Belgium)
  4. (Australia)
  5. Department of Data Analysis, Faculty of Psychology and Educational Sciences, Ghent University, Ghent (Belgium)
  6. Division of Cancer Epidemiology, German Cancer Research Center, Heidelberg (Germany)
  7. Translational Radiobiology Group, Institute of Cancer Sciences, Radiotherapy Related Research, Christie Hospital NHS Trust, University of Manchester, Manchester (United Kingdom)
Publication Date:
2016-08-01
OSTI Identifier:
22648766
Resource Type:
Journal Article
Resource Relation:
Journal Name: International Journal of Radiation Oncology, Biology and Physics; Journal Volume: 95; Journal Issue: 5; Other Information: Copyright (c) 2016 Elsevier Science B.V., Amsterdam, The Netherlands, All rights reserved.; Country of input: International Atomic Energy Agency (IAEA)
Country of Publication:
United States
Language:
English
Subject:
62 RADIOLOGY AND NUCLEAR MEDICINE; FORECASTING; MAMMARY GLANDS; NEOPLASMS; RADIATION HAZARDS; RADIOSENSITIVITY; RADIOTHERAPY; SIMULATION; TOXICITY; TRAINING

Citation Formats

Mbah, Chamberlain, Thierens, Hubert, Thas, Olivier, De Neve, Jan, Chang-Claude, Jenny, Seibold, Petra, Botma, Akke, West, Catharine, and De Ruyck, Kim. Pitfalls in Prediction Modeling for Normal Tissue Toxicity in Radiation Therapy: An Illustration With the Individual Radiation Sensitivity and Mammary Carcinoma Risk Factor Investigation Cohorts. United States: N. p., 2016. Web. doi:10.1016/J.IJROBP.2016.03.034.
Mbah, Chamberlain, Thierens, Hubert, Thas, Olivier, De Neve, Jan, Chang-Claude, Jenny, Seibold, Petra, Botma, Akke, West, Catharine, & De Ruyck, Kim. Pitfalls in Prediction Modeling for Normal Tissue Toxicity in Radiation Therapy: An Illustration With the Individual Radiation Sensitivity and Mammary Carcinoma Risk Factor Investigation Cohorts. United States. doi:10.1016/J.IJROBP.2016.03.034.
Mbah, Chamberlain, Thierens, Hubert, Thas, Olivier, De Neve, Jan, Chang-Claude, Jenny, Seibold, Petra, Botma, Akke, West, Catharine, and De Ruyck, Kim. 2016. "Pitfalls in Prediction Modeling for Normal Tissue Toxicity in Radiation Therapy: An Illustration With the Individual Radiation Sensitivity and Mammary Carcinoma Risk Factor Investigation Cohorts". United States. doi:10.1016/J.IJROBP.2016.03.034.
@article{osti_22648766,
title = {Pitfalls in Prediction Modeling for Normal Tissue Toxicity in Radiation Therapy: An Illustration With the Individual Radiation Sensitivity and Mammary Carcinoma Risk Factor Investigation Cohorts},
author = {Mbah, Chamberlain and Thierens, Hubert and Thas, Olivier and De Neve, Jan and Chang-Claude, Jenny and Seibold, Petra and Botma, Akke and West, Catharine and De Ruyck, Kim},
note = {E-mail (Mbah): chamberlain.mbah@ugent.be. Affiliations listed alongside the author names in the source record: Department of Mathematical Modeling, Statistics, and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent; National Institute for Applied Statistics Research Australia, University of Wollongong, Wollongong, New South Wales},
abstractNote = {Purpose: To identify the main causes underlying the failure of prediction models for radiation therapy toxicity to replicate. Methods and Materials: Data were used from two German cohorts, Individual Radiation Sensitivity (ISE) (n=418) and Mammary Carcinoma Risk Factor Investigation (MARIE) (n=409), of breast cancer patients with similar characteristics and radiation therapy treatments. The toxicity endpoint chosen was telangiectasia. The LASSO (least absolute shrinkage and selection operator) logistic regression method was used to build a predictive model for a dichotomized endpoint (Radiation Therapy Oncology Group/European Organization for the Research and Treatment of Cancer score 0, 1, or ≥2). Internal areas under the receiver operating characteristic curve (inAUCs) were calculated by a naïve approach whereby the training data (ISE) were also used for calculating the AUC. Cross-validation was also applied to calculate the AUC within the same cohort, a second type of inAUC. Internal AUCs from cross-validation were calculated within ISE and MARIE separately. Models trained on one dataset (ISE) were applied to a test dataset (MARIE) and AUCs calculated (exAUCs). Results: Internal AUCs from the naïve approach were generally larger than inAUCs from cross-validation owing to overfitting the training data. Internal AUCs from cross-validation were also generally larger than the exAUCs, reflecting heterogeneity in the predictors between cohorts. The best models with largest inAUCs from cross-validation within both cohorts had a number of common predictors: hypertension, normalized total boost, and presence of estrogen receptors. Surprisingly, the effect (coefficient in the prediction model) of hypertension on telangiectasia incidence was positive in ISE and negative in MARIE. 
Other predictors were also not common between the 2 cohorts, illustrating that overcoming overfitting does not solve the problem of replication failure of prediction models completely. Conclusions: Overfitting and cohort heterogeneity are the 2 main causes of replication failure of prediction models across cohorts. Cross-validation and similar techniques (eg, bootstrapping) cope with overfitting, but the development of validated predictive models for radiation therapy toxicity requires strategies that deal with cohort heterogeneity.},
doi = {10.1016/J.IJROBP.2016.03.034},
journal = {International Journal of Radiation Oncology, Biology and Physics},
number = 5,
volume = 95,
place = {United States},
year = {2016},
month = {Aug}
}