OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Validation of spatiodemographic estimates produced through data fusion of small area census records and household microdata

Abstract

Techniques such as Iterative Proportional Fitting have been previously suggested as a means to generate new data with the demographic granularity of individual surveys and the spatial granularity of small area tabulations of censuses and surveys. This article explores internal and external validation approaches for synthetic, small area, household- and individual-level microdata using a case study for Bangladesh. Using data from the Bangladesh Census 2011 and the Demographic and Health Survey, we produce estimates of infant mortality rate and other household attributes for small areas using a variation of an iterative proportional fitting method called P-MEDM. We conduct an internal validation to determine: whether the model accurately recreates the spatial variation of the input data, how each of the variables performed overall, and how the estimates compare to the published population totals. We conduct an external validation by comparing the estimates with indicators from the 2009 Multiple Indicator Cluster Survey (MICS) for Bangladesh to benchmark how well the estimates compared to a known dataset which was not used in the original model. The results indicate that the estimation process is viable for regions that are better represented in the microdata sample, but also revealed the possibility of strong overfitting in sparsely sampled sub-populations.
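
For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of classic two-way Iterative Proportional Fitting. It is not the P-MEDM variant used in the article, whose details are not given in this record. The function name `ipf`, the seed table, and the target margins are all hypothetical values chosen for illustration; the sketch assumes a strictly positive seed table and row and column targets that share the same grand total.

```python
import numpy as np


def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Classic two-way iterative proportional fitting (illustrative only).

    Assumes `seed` is strictly positive and that the row and column
    targets sum to the same grand total.
    """
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)

    for _ in range(max_iter):
        # Rescale each row so that row sums match the row targets.
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Rescale each column so that column sums match the column targets.
        table *= (col_targets / table.sum(axis=0))[None, :]
        # Columns now match exactly; stop once rows match as well.
        if np.abs(table.sum(axis=1) - row_targets).max() < tol:
            break
    return table


# Hypothetical example: a 2x2 survey cross-tabulation (seed) adjusted to
# made-up published margins for one small area.
seed = np.array([[10.0, 20.0], [30.0, 40.0]])
fitted = ipf(seed, row_targets=[40.0, 60.0], col_targets=[35.0, 65.0])
```

The fitted table reproduces both sets of target margins while keeping the association structure (the cross-product ratio) of the seed table, which is the sense in which such methods combine the detail of a survey with the totals of a small area tabulation.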

Authors:
 Rose, Amy N. [1]; Nagle, Nicholas N. [2]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Computational Sciences and Engineering Division
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Computational Sciences and Engineering Division; Univ. of Tennessee, Knoxville, TN (United States). Dept. of Geography
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE; Work for Others (WFO)
OSTI Identifier:
1349598
Grant/Contract Number:
AC05-00OR22725
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Computers, Environment and Urban Systems
Additional Journal Information:
Journal Volume: 63; Journal Issue: C; Journal ID: ISSN 0198-9715
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 42 ENGINEERING; population; small area estimation; P-MEDM; IPF; microdata; validation; Reweighting; DHS

Citation Formats

Rose, Amy N., and Nagle, Nicholas N. Validation of spatiodemographic estimates produced through data fusion of small area census records and household microdata. United States: N. p., 2016. Web. doi:10.1016/j.compenvurbsys.2016.07.006.
Rose, Amy N., & Nagle, Nicholas N. Validation of spatiodemographic estimates produced through data fusion of small area census records and household microdata. United States. doi:10.1016/j.compenvurbsys.2016.07.006.
Rose, Amy N., and Nagle, Nicholas N. 2016. "Validation of spatiodemographic estimates produced through data fusion of small area census records and household microdata". United States. doi:10.1016/j.compenvurbsys.2016.07.006. https://www.osti.gov/servlets/purl/1349598.
@article{osti_1349598,
title = {Validation of spatiodemographic estimates produced through data fusion of small area census records and household microdata},
author = {Rose, Amy N. and Nagle, Nicholas N.},
abstractNote = {Techniques such as Iterative Proportional Fitting have been previously suggested as a means to generate new data with the demographic granularity of individual surveys and the spatial granularity of small area tabulations of censuses and surveys. This article explores internal and external validation approaches for synthetic, small area, household- and individual-level microdata using a case study for Bangladesh. Using data from the Bangladesh Census 2011 and the Demographic and Health Survey, we produce estimates of infant mortality rate and other household attributes for small areas using a variation of an iterative proportional fitting method called P-MEDM. We conduct an internal validation to determine: whether the model accurately recreates the spatial variation of the input data, how each of the variables performed overall, and how the estimates compare to the published population totals. We conduct an external validation by comparing the estimates with indicators from the 2009 Multiple Indicator Cluster Survey (MICS) for Bangladesh to benchmark how well the estimates compared to a known dataset which was not used in the original model. The results indicate that the estimation process is viable for regions that are better represented in the microdata sample, but also revealed the possibility of strong overfitting in sparsely sampled sub-populations.},
doi = {10.1016/j.compenvurbsys.2016.07.006},
journal = {Computers, Environment and Urban Systems},
number = C,
volume = 63,
place = {United States},
year = 2016,
month = 8
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

  • Two unresolved issues about airport noise-property value studies are addressed. The first issue concerns the comparability of empirical results from aggregate census data vs individual sales values, and the second issue concerns the homogeneity and stability of results from housing price studies over time and across markets. Hedonic price models from two sets of data for a residential area near the Atlanta International Airport are estimated at two points in time, 1979-1980 and 1970-1972. The available data yield similar estimates of the noise discount over time, and from the prices of individual house sales vs owner-appraised census block aggregates. 26 references, 3 tables
  • There is a significant increase in terrestrial heat flow with depth in the Hinton-Edson area of the deep part of the western Canadian sedimentary basin in Alberta. This is especially true near the Rocky Mountain foothills, which is an area of high relief, high hydraulic head and regional water recharge. Gravity-imposed downward movement of meteoric water through the thick sedimentary strata with velocities as low as $$10^{-10}$$ m/s to $$0.5 \times 10^{-9}$$ m/s may cause an increase of heat flow with depth. Such disturbance of heat flow with depth on a regional scale in the sedimentary strata means that it is not possible to determine the background conductive steady-state heat flow associated with crustal or upper mantle heat sources in such an area from measurement of conductive heat flow in the part of the sedimentary column where water movement occurs. This is because the convective portion cannot be determined, particularly when measurements are made in only part of the regional hydrodynamic system of the basin.
  • A first search is reported for a standard model Higgs boson (H) that is produced through vector boson fusion and decays to a bottom-quark pair. Two data samples, corresponding to integrated luminosities of 19.8 fb$$^{-1}$$ and 18.3 fb$$^{-1}$$ of proton-proton collisions at $$\sqrt{s} = 8$$ TeV were selected for this channel at the CERN LHC. The observed significance in these data samples for a H → $$\mathrm{b\bar{b}}$$ signal at a mass of 125 GeV is 2.2 standard deviations, while the expected significance is 0.8 standard deviations. The fitted signal strength $$\mu = \sigma/\sigma_{\mathrm{SM}} = 2.8^{+1.6}_{-1.4}$$. The combination of this result with other CMS searches for the Higgs boson decaying to a b-quark pair yields a signal strength of 1.0±0.4, corresponding to a signal significance of 2.6 standard deviations for a Higgs boson mass of 125 GeV.
    Cited by 9
  • Young first-year sea ice is nearly as important as open water in modulating heat flux between the ocean and atmosphere in the Arctic. Just after the onset of freeze-up, first-year ice is in the early stages of growth and will consist of young first-year and thin ice. The distribution of sea ice in this thickness range impacts heat transfer in the Arctic. Therefore, improving the estimates of ice concentrations in this thickness range is significant. NASA Team Algorithm (NTA) for passive microwave data inaccurately classifies sea ice during the melt and freeze-up seasons because it misclassifies multiyear ice as first-year ice. The authors developed a hybrid fusion technique for incorporating multiyear ice information derived from synthetic aperture radar (SAR) images into a passive microwave algorithm to improve ice type concentration estimates. First, they classified SAR images using a dynamic thresholding technique and estimated the multiyear ice concentration. Then they used the SAR-derived multiyear ice concentration to constrain the NTA and obtained an improved first-year ice concentration estimate. They computed multiyear and first-year ice concentration estimates over a region in the eastern-central Arctic in which field observations of ice and in situ radar backscatter measurements were performed. With the NTA alone, the first-year ice concentration in the study area varied between 0.11 and 0.40, while the multiyear ice concentration varied from 0.63 to 0.39. With the hybrid fusion technique, the first-year ice concentration varied between 0.08 and 0.23 and the multiyear ice concentration was between 0.62 and 0.66. The fused estimates of first-year and multiyear ice concentration appear to be more accurate than NTA, based on ice observations that were logged aboard the US Coast Guard icebreaker Polar Star in the study area during 1991.