Models for synthetic data generation

RESOURCE

Abstract

The software includes a suite of probabilistic statistical and machine learning models that generate discrete synthetic data. Each model is trained on a set of real (private) data and can then be used to generate synthetic data that is statistically similar to the real data. Once trained, a model can generate an arbitrary number of samples. In addition to the models themselves, the software includes code to process data, evaluate results (based on cross-validation), and produce reports.
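
The train-then-sample workflow described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible discrete model, one independent categorical distribution per column, and is not taken from the SYNDATA code base; the function names `fit_independent_marginals` and `sample_synthetic` and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_independent_marginals(rows):
    """Estimate one categorical distribution per column from the real rows.

    Illustrative assumption: columns are treated as independent, so any
    correlation between columns in the real data is ignored.
    """
    n_rows = len(rows)
    n_cols = len(rows[0])
    model = []
    for j in range(n_cols):
        counts = Counter(row[j] for row in rows)
        values = list(counts)  # observed categories, in first-seen order
        weights = [counts[v] / n_rows for v in values]  # empirical frequencies
        model.append((values, weights))
    return model


def sample_synthetic(model, n_samples, seed=None):
    """Draw any number of synthetic rows from the fitted model."""
    rng = random.Random(seed)
    return [
        tuple(rng.choices(values, weights=weights, k=1)[0]
              for values, weights in model)
        for _ in range(n_samples)
    ]


# Toy "real" data set with two discrete columns (hypothetical values).
real_rows = [("A", 0), ("A", 1), ("B", 1), ("B", 1)]

model = fit_independent_marginals(real_rows)
synthetic_rows = sample_synthetic(model, n_samples=1000, seed=0)
```

Once fitted, the model no longer needs the real rows, so `sample_synthetic` can be called repeatedly with any `n_samples`. A model of this kind reproduces per-column frequencies only; capturing dependencies between columns requires a richer probabilistic model.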
Developers:
De Oliveira Sales, Ana Paula [1]; Meng, Rui [1]; Soper, Braden [1]; Priyadip, Ray [1]; Goncalves, Andre [1]
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Release Date:
2021-09-30
Project Type:
Open Source, Publicly Available Repository
Software Type:
Scientific
Version:
1.0
Licenses:
BSD 3-clause "New" or "Revised" License
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
Code ID:
70153
Site Accession Number:
IM#1049062
Research Org.:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Country of Origin:
United States

Citation Formats

De Oliveira Sales, Ana Paula, Meng, Rui, Soper, Braden C., Priyadip, Ray, and Goncalves, Andre R. Models for synthetic data generation. Computer Software. https://github.com/LLNL/SYNDATA. USDOE National Nuclear Security Administration (NNSA). 30 Sep. 2021. Web. doi:10.11578/dc.20220215.1.
De Oliveira Sales, Ana Paula, Meng, Rui, Soper, Braden C., Priyadip, Ray, & Goncalves, Andre R. (2021, September 30). Models for synthetic data generation. [Computer software]. https://github.com/LLNL/SYNDATA. https://doi.org/10.11578/dc.20220215.1.
De Oliveira Sales, Ana Paula, Meng, Rui, Soper, Braden C., Priyadip, Ray, and Goncalves, Andre R. "Models for synthetic data generation." Computer software. September 30, 2021. https://github.com/LLNL/SYNDATA. https://doi.org/10.11578/dc.20220215.1.
@misc{ doecode_70153,
title = {Models for synthetic data generation},
author = {De Oliveira Sales, Ana Paula and Meng, Rui and Soper, Braden C. and Priyadip, Ray and Goncalves, Andre R.},
abstractNote = {The software includes a suite of probabilistic statistical/machine learning models that can generate discrete synthetic data. Each model is trained on a set of real (private) data and then it can be used to generate synthetic but statistically similar data. Once ready, the model can generate as many samples as we want. Finally, in addition to the actual models, the software includes code to process data, evaluate results (based on cross validation), and produce reports.},
doi = {10.11578/dc.20220215.1},
url = {https://doi.org/10.11578/dc.20220215.1},
howpublished = {[Computer Software] \url{https://doi.org/10.11578/dc.20220215.1}},
year = {2021},
month = {sep}
}