Tympana - Machine Learning Assisted Data Annotation

Yacci, Paul M.; Yacci, Allison; Soni, Siddhant

Technical Report · March 8, 2023

OSTI ID: 1960310

DataCicada, LLC

A key step in making data Artificial Intelligence (AI)-ready is providing machine-readable labels for Machine Learning (ML) algorithms. This mapping from input data to outcome is the foundation on which supervised ML algorithms rely. Without a proper mapping, or without a sufficient volume of quality data, supervised learning becomes difficult if not impossible. In scientific domains, the mapping must come from highly trained individuals who can interpret the data correctly and assign the right output. As deep learning algorithms spread, the demand for ever larger high-quality datasets keeps growing. It can be difficult to find a scientist who is deeply embedded in a domain and who also has the algorithmic skills to apply that domain knowledge at scale with ML techniques and tools. Building a team around an embedded ML expert is an alternative, but it adds team-management overhead and requires an investment in knowledge transfer between domain experts and data scientists that few teams can afford. DataCicada's solution puts powerful ML tools in the hands of domain experts through a platform named Tympana. While it is easy to teach users at diverse education levels to recognize a specific object in an image, it takes years of training to properly evaluate a reading from a DNA sequencer, an X-ray, a CT scan, or a particle accelerator. These domains require a body of knowledge and experience that does not lend itself to annotation by the general public. Tympana integrates active learning, explainable AI, and synthetic data generation to create robust models and datasets that can be exported for scientific workflows.
Tympana helps scientists build high-quality ML models for their complex data, allowing them to scale their expertise and multiply the impact of their efforts. Tympana currently has proof-of-concept use cases in protein sequences, image object detection, and sensor signal detection, which demonstrate the capabilities of the platform. In initial experiments, the active learning strategies used by Tympana reduced the volume of data required to achieve comparable results, which should save scientists time. DataCicada expects to launch a beta of the tool shortly after the completion of Phase I SBIR funding. Because the platform will have been proven with biological data, the first likely customers are federal agencies (DOE, NIH, NSF, CDC, FDA), as well as pharmaceutical companies developing therapeutics, vaccine manufacturers, and universities performing biological research. SARS-CoV-2 research alone could benefit greatly from faster and better results for researchers.

This content will become available on August 1, 2042.

Research Organization: DATACICADA, LLC

Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE Office of Science (SC), Biological and Environmental Research (BER)

Contributing Organization: Envy Labs

DOE Contract Number: SC0022459

OSTI ID: 1960310

Type / Phase: SBIR (Phase I)

Report Number(s): DOE-DC-22459

Country of Publication: United States

Language: English

Similar Records

Advancing Fusion with Machine Learning Research Needs Workshop Report

Journal Article · September 26, 2020 · Journal of Fusion Energy · OSTI ID: 1960310

Humphreys, David; Kupresanin, A.; Boyer, M. D.; +10 more

Artificial Intelligence and Machine Learning for Bioenergy Research: Opportunities and Challenges

Technical Report · August 23, 2022 · OSTI ID: 1960310

Zhao, Huimin; Hillson, Nathan; Kleese van Dam, Kerstin; +1 more

Final Phase I Technical Report - Deep Learning Enabled FAIR Data Management for Center for Functional Nanomaterials

Technical Report · August 6, 2023 · OSTI ID: 1960310

Sun, Yu

Related Subjects

59 BASIC BIOLOGICAL SCIENCES
60 APPLIED LIFE SCIENCES
97 MATHEMATICS AND COMPUTING
99 GENERAL AND MISCELLANEOUS
96 KNOWLEDGE MANAGEMENT AND PRESERVATION
