Data and Code for Understanding Generative AI Content with Embedding Models
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Rutgers Univ., Piscataway, NJ (United States)
This repository contains code for the experiments in the paper "Understanding Generative AI Content with Embedding Models". Constructing high-quality features is critical to any quantitative data analysis. While feature engineering was historically addressed by carefully hand-crafting data representations based on domain expertise, deep neural networks (DNNs) now offer a radically different approach. DNNs implicitly engineer features by transforming their input data into hidden feature vectors called embeddings. For embedding vectors produced by foundation models -- which are trained to be useful across many contexts -- we demonstrate that simple and well-studied dimensionality-reduction techniques such as Principal Component Analysis uncover inherent heterogeneity in input data concordant with human-understandable explanations. Of the many applications for this framework, we find empirical evidence that there is intrinsic separability between real samples and those generated by artificial intelligence (AI).
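The idea in the abstract, applying Principal Component Analysis to embedding vectors and checking whether real and AI-generated samples separate, can be illustrated with a minimal sketch. This is not the repository's code. The `pca_project` helper is hypothetical, and the "embeddings" are synthetic Gaussian vectors with an artificial shift between the two groups, standing in for vectors that a foundation model would produce.

```python
import numpy as np


def pca_project(embeddings: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project the rows of `embeddings` onto their top principal components."""
    # Center each feature so the SVD captures variance, not the mean offset.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of `vt` are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# ASSUMPTION: synthetic stand-ins for embeddings; no real model or data is used.
rng = np.random.default_rng(0)
dim = 64
shift = np.zeros(dim)
shift[0] = 6.0  # artificial offset between the two groups, for illustration only

real = rng.normal(size=(200, dim))               # label 0: "real" samples
generated = rng.normal(size=(200, dim)) + shift  # label 1: "generated" samples
embeddings = np.vstack([real, generated])
labels = np.array([0] * 200 + [1] * 200)

projected = pca_project(embeddings, n_components=2)

# Distance between the group means along the first principal component.
pc1 = projected[:, 0]
gap = abs(pc1[labels == 0].mean() - pc1[labels == 1].mean())
print(projected.shape, round(float(gap), 2))
```

In this toy setting the first principal component aligns with the injected shift, so the two groups fall on opposite sides of it. With real embeddings, whether and along which components such separation appears is an empirical question that the paper's experiments address.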
- Research Organization: PNNL (PNNL2)
- Sponsoring Organization: PNNL
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 2587970
- Country of Publication: United States
- Language: English
Similar Records
Understanding Generative AI Content with Embedding Models
Dataset · Sun Nov 17 23:00:00 EST 2024 · OSTI ID: 2481996
Performance Comparison of Machine Learning Models for Ultrasonic Nondestructive Evaluation of Alkali-Silica Reaction in Concrete
Technical Report · Thu Aug 01 00:00:00 EDT 2024 · OSTI ID: 2438844