U.S. Department of Energy
Office of Scientific and Technical Information

Data and Code for Understanding Generative AI Content with Embedding Models

Dataset · DOI: https://doi.org/10.25584/2587970 · OSTI ID: 2587970
 [1];  [1];  [1];  [1];  [2]
  1. Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
  2. Rutgers Univ., Piscataway, NJ (United States)
This repository contains code for the experiments in the paper "Understanding Generative AI Content with Embedding Models". Constructing high-quality features is critical to any quantitative data analysis. While feature engineering was historically addressed by carefully hand-crafting data representations based on domain expertise, deep neural networks (DNNs) now offer a radically different approach. DNNs implicitly engineer features by transforming their input data into hidden feature vectors called embeddings. For embedding vectors produced by foundation models -- which are trained to be useful across many contexts -- we demonstrate that simple and well-studied dimensionality-reduction techniques such as Principal Component Analysis uncover inherent heterogeneity in input data concordant with human-understandable explanations. Of the many applications for this framework, we find empirical evidence that there is intrinsic separability between real samples and those generated by artificial intelligence (AI).
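As a rough illustration of the idea in the abstract (not the authors' code), the sketch below projects toy "embeddings" onto their first principal component and measures how far apart the two groups land. Everything in it is an assumption made for illustration: the embeddings are synthetic Gaussian vectors rather than foundation-model outputs, the helper names (`make_toy_embeddings`, `first_principal_component`, `project`) are hypothetical, and the top component is found by power iteration on the sample covariance matrix using only the Python standard library.

```python
import random


def make_toy_embeddings(seed=0, n_per_class=100, dim=8, shift=3.0):
    """Synthetic stand-ins for embeddings: two Gaussian clouds with different means.

    Assumption for illustration only; real embeddings would come from a model.
    """
    rng = random.Random(seed)
    real = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_per_class)]
    generated = [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n_per_class)]
    return real, generated


def first_principal_component(rows, iters=200):
    """Return (mean, unit vector) for the top principal component of `rows`."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v


def project(row, mean, v):
    """Score of one sample along direction `v` after centering."""
    return sum((x - m) * c for x, m, c in zip(row, mean, v))


real, generated = make_toy_embeddings()
# PCA is fit on the pooled data without using the real/generated labels.
mean, pc1 = first_principal_component(real + generated)
real_scores = [project(r, mean, pc1) for r in real]
gen_scores = [project(g, mean, pc1) for g in generated]
gap = abs(sum(real_scores) / len(real_scores) - sum(gen_scores) / len(gen_scores))
print(f"gap between group means along PC1: {gap:.2f}")
```

The labels are used only after the projection, to compare the two groups' scores. In this toy setup the separation is built into the data, so the sketch shows the mechanics of the analysis and is not evidence for the paper's empirical claim.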
Research Organization:
PNNL (PNNL2)
Sponsoring Organization:
PNNL
DOE Contract Number:
AC05-76RL01830
OSTI ID:
2587970
Country of Publication:
United States
Language:
English
