OSTI.GOV: U.S. Department of Energy
Office of Scientific and Technical Information

Title: SynthNotes: A Generator Framework for High-volume, High-fidelity Synthetic Mental Health Notes

Abstract

One of the key, emerging challenges that connects the "Big Data" and the AI domain is the availability of sufficient volumes of training data for AI/Machine Learning tasks. SynthNotes is a framework for generating standards-compliant, realistic mental health progress report notes at the very large, population-level scale, and in a strict privacy-preserving manner. Our framework, inspired by the needs to explore, evaluate, and train computational methods for the emerging mental health crisis in the US, is useful for benchmarking, optimization, and training of biomedical natural language processing, information extraction, and machine learning systems intended to operate at "Big Data" scale (billions of notes). The free text notes generated by SynthNotes are based on the literature and public statistical models allowing for realistic, natural language representation of a patient, and his or her mental health characteristics. Additionally, SynthNotes can partially simulate stylistic, grammatical, and expressive characteristics of a licensed mental health professional. SynthNotes is modular and flexible, allowing for representation of a variety of conditions, incorporation of alternative foundational models, and parametrization of the variability of the structure, content, and size of the synthetically generated corpus. In this paper, we report on the initial use and performance characteristics of our SynthNotes framework and on the ongoing work for inclusion of content planning and deep learning-based generative methods trained on real data.

Authors:
Begoli, Edmon [1]; Brown, Kris [1]; Srinivasan, Sudarshan [1]; Tamang, Suzanne [2]
  1. ORNL
  2. Stanford University
Publication Date:
2018-12-01
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1507868
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: 2018 IEEE International Conference on Big Data - Seattle, Washington, United States of America - 8/10/2018 8:00:00 AM-8/13/2018 8:00:00 AM
Country of Publication:
United States
Language:
English

Citation Formats

Begoli, Edmon, Brown, Kris, Srinivasan, Sudarshan, and Tamang, Suzanne. SynthNotes: A Generator Framework for High-volume, High-fidelity Synthetic Mental Health Notes. United States: N. p., 2018. Web. doi:10.1109/BigData.2018.8621981.
Begoli, Edmon, Brown, Kris, Srinivasan, Sudarshan, & Tamang, Suzanne. SynthNotes: A Generator Framework for High-volume, High-fidelity Synthetic Mental Health Notes. United States. https://doi.org/10.1109/BigData.2018.8621981
Begoli, Edmon, Brown, Kris, Srinivasan, Sudarshan, and Tamang, Suzanne. 2018. "SynthNotes: A Generator Framework for High-volume, High-fidelity Synthetic Mental Health Notes". United States. https://doi.org/10.1109/BigData.2018.8621981. https://www.osti.gov/servlets/purl/1507868.
@article{osti_1507868,
title = {SynthNotes: A Generator Framework for High-volume, High-fidelity Synthetic Mental Health Notes},
author = {Begoli, Edmon and Brown, Kris and Srinivasan, Sudarshan and Tamang, Suzanne},
abstractNote = {One of the key, emerging challenges that connects the "Big Data" and the AI domain is the availability of sufficient volumes of training data for AI/Machine Learning tasks. SynthNotes is a framework for generating standards-compliant, realistic mental health progress report notes at the very large, population-level scale, and in a strict privacy-preserving manner. Our framework, inspired by the needs to explore, evaluate, and train computational methods for the emerging mental health crisis in the US, is useful for benchmarking, optimization, and training of biomedical natural language processing, information extraction, and machine learning systems intended to operate at "Big Data" scale (billions of notes). The free text notes generated by SynthNotes are based on the literature and public statistical models allowing for realistic, natural language representation of a patient, and his or her mental health characteristics. Additionally, SynthNotes can partially simulate stylistic, grammatical, and expressive characteristics of a licensed mental health professional. SynthNotes is modular and flexible, allowing for representation of a variety of conditions, incorporation of alternative foundational models, and parametrization of the variability of the structure, content, and size of the synthetically generated corpus. In this paper, we report on the initial use and performance characteristics of our SynthNotes framework and on the ongoing work for inclusion of content planning and deep learning-based generative methods trained on real data.},
doi = {10.1109/BigData.2018.8621981},
url = {https://www.osti.gov/biblio/1507868},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2018},
month = {12}
}

Availability:
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

