DOE PAGES: U.S. Department of Energy
Office of Scientific and Technical Information

Title: Synthetic data for design and evaluation of binary classifiers in the context of Bayesian transfer learning

Abstract

Transfer learning (TL) techniques can enable effective learning in data scarce domains by allowing one to re-purpose data or scientific knowledge available in relevant source domains for predictive tasks in a target domain of interest. In this Data in Brief article, we present a synthetic dataset for binary classification in the context of Bayesian transfer learning, which can be used for the design and evaluation of TL-based classifiers. For this purpose, we consider numerous combinations of classification settings, based on which we simulate a diverse set of feature-label distributions with varying learning complexity. For each set of model parameters, we provide a pair of target and source datasets that have been jointly sampled from the underlying feature-label distributions in the target and source domains, respectively. For both target and source domains, the data in a given class and domain are normally distributed, where the distributions across domains are related to each other through a joint prior. To ensure the consistency of the classification complexity across the provided datasets, we have controlled the Bayes error such that it is maintained within a range of predefined values that mimic realistic classification scenarios across different relatedness levels. The provided datasets may serve as useful resources for designing and benchmarking transfer learning schemes for binary classification as well as the estimation of classification error.
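The generation scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' generator: it assumes identity covariances, replaces the joint prior with a simple correlated Gaussian draw of the class means (the `relatedness` parameter is a hypothetical stand-in), and controls the Bayes error by rejecting parameter draws outside a chosen range. All function names and default values are illustrative assumptions. The only closed-form fact used is that two equiprobable Gaussian classes sharing a covariance have Bayes error Φ(−Δ/2), where Δ is the Mahalanobis distance between the class means.

```python
import math
import numpy as np


def bayes_error(mu0, mu1, cov):
    """Bayes error of two equiprobable Gaussian classes sharing one covariance."""
    diff = np.asarray(mu1, dtype=float) - np.asarray(mu0, dtype=float)
    # Mahalanobis distance between the two class means.
    delta = math.sqrt(float(diff @ np.linalg.solve(cov, diff)))
    # Phi(-delta / 2), written with the complementary error function.
    return 0.5 * math.erfc(delta / (2.0 * math.sqrt(2.0)))


def draw_means(rng, dim, relatedness, spread=1.0):
    """Draw class means for both domains (assumed stand-in for a joint prior)."""
    means = {}
    for label, sign in ((0, -1.0), (1, 1.0)):
        z_target = rng.standard_normal(dim)
        z_indep = rng.standard_normal(dim)
        # Source perturbation is correlated with the target one.
        z_source = relatedness * z_target + math.sqrt(1.0 - relatedness**2) * z_indep
        means[("target", label)] = sign * 0.5 + spread * z_target
        means[("source", label)] = sign * 0.5 + spread * z_source
    return means


def make_pair(seed=0, dim=2, n_target=10, n_source=100, relatedness=0.8,
              error_range=(0.1, 0.3), max_tries=10_000):
    """Return {'target': (X, y), 'source': (X, y)} and the target Bayes error."""
    rng = np.random.default_rng(seed)
    cov = np.eye(dim)  # assumption: identity covariance in every class and domain
    for _ in range(max_tries):
        means = draw_means(rng, dim, relatedness)
        err = bayes_error(means[("target", 0)], means[("target", 1)], cov)
        if error_range[0] <= err <= error_range[1]:
            break  # keep only parameter draws with the desired difficulty
    else:
        raise RuntimeError("no parameter draw met the Bayes error range")
    data = {}
    for domain, n in (("target", n_target), ("source", n_source)):
        # n points per class, class 0 first, then class 1.
        X = np.vstack([rng.multivariate_normal(means[(domain, c)], cov, size=n)
                       for c in (0, 1)])
        y = np.repeat([0, 1], n)
        data[domain] = (X, y)
    return data, err
```

Calling `make_pair(seed=0)` returns a small target set and a larger source set whose class means are correlated across domains; raising `relatedness` toward 1 makes the source data more informative about the target under this simplified model.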

Authors:
Maddouri, Omar; Qian, Xiaoning; Alexander, Francis J.; Dougherty, Edward R.; Yoon, Byung-Jun
Publication Date:
2022-06-01
Research Org.:
Brookhaven National Lab. (BNL), Upton, NY (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF)
OSTI Identifier:
1861465
Alternate Identifier(s):
OSTI ID: 1872376
Report Number(s):
BNL-223068-2022-JAAM
Journal ID: ISSN 2352-3409; S2352340922003237; 108113; PII: S2352340922003237
Grant/Contract Number:  
SC0019303; SC0012704; 1835690
Resource Type:
Published Article
Journal Name:
Data in Brief
Additional Journal Information:
Journal Name: Data in Brief; Journal Volume: 42; Journal Issue: C; Journal ID: ISSN 2352-3409
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Bayesian transfer learning; Binary classification; Classifier design; Error estimation

Citation Formats

Maddouri, Omar, Qian, Xiaoning, Alexander, Francis J., Dougherty, Edward R., and Yoon, Byung-Jun. Synthetic data for design and evaluation of binary classifiers in the context of Bayesian transfer learning. United States: N. p., 2022. Web. doi:10.1016/j.dib.2022.108113.
Maddouri, Omar, Qian, Xiaoning, Alexander, Francis J., Dougherty, Edward R., & Yoon, Byung-Jun. Synthetic data for design and evaluation of binary classifiers in the context of Bayesian transfer learning. United States. https://doi.org/10.1016/j.dib.2022.108113
Maddouri, Omar, Qian, Xiaoning, Alexander, Francis J., Dougherty, Edward R., and Yoon, Byung-Jun. 2022. "Synthetic data for design and evaluation of binary classifiers in the context of Bayesian transfer learning". United States. https://doi.org/10.1016/j.dib.2022.108113.
@article{osti_1861465,
title = {Synthetic data for design and evaluation of binary classifiers in the context of Bayesian transfer learning},
author = {Maddouri, Omar and Qian, Xiaoning and Alexander, Francis J. and Dougherty, Edward R. and Yoon, Byung-Jun},
abstractNote = {Transfer learning (TL) techniques can enable effective learning in data scarce domains by allowing one to re-purpose data or scientific knowledge available in relevant source domains for predictive tasks in a target domain of interest. In this Data in Brief article, we present a synthetic dataset for binary classification in the context of Bayesian transfer learning, which can be used for the design and evaluation of TL-based classifiers. For this purpose, we consider numerous combinations of classification settings, based on which we simulate a diverse set of feature-label distributions with varying learning complexity. For each set of model parameters, we provide a pair of target and source datasets that have been jointly sampled from the underlying feature-label distributions in the target and source domains, respectively. For both target and source domains, the data in a given class and domain are normally distributed, where the distributions across domains are related to each other through a joint prior. To ensure the consistency of the classification complexity across the provided datasets, we have controlled the Bayes error such that it is maintained within a range of predefined values that mimic realistic classification scenarios across different relatedness levels. The provided datasets may serve as useful resources for designing and benchmarking transfer learning schemes for binary classification as well as the estimation of classification error.},
doi = {10.1016/j.dib.2022.108113},
journal = {Data in Brief},
number = C,
volume = 42,
place = {United States},
year = {2022},
month = {jun}
}

Works referenced in this record:

Robust importance sampling for error estimation in the context of optimal Bayesian transfer learning
journal, March 2022


Optimal Bayesian Transfer Learning
journal, July 2018

  • Karbalayghareh, Alireza; Qian, Xiaoning; Dougherty, Edward R.
  • IEEE Transactions on Signal Processing, Vol. 66, Issue 14
  • DOI: 10.1109/TSP.2018.2839583