U.S. Department of Energy
Office of Scientific and Technical Information

Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions

Journal Article · PLoS Biology (Online)

Models that predict brain responses to stimuli provide one measure of understanding of a sensory system and have many potential applications in science and engineering. Deep artificial neural networks have emerged as the leading such predictive models of the visual system but are less explored in audition. Prior work provided examples of audio-trained neural networks that produced good predictions of auditory cortical fMRI responses and exhibited correspondence between model stages and brain regions, but left it unclear whether these results generalize to other neural network models and, thus, how to further improve models in this domain. We evaluated model-brain correspondence for publicly available audio neural network models along with in-house models trained on 4 different tasks. Most tested models outpredicted standard spectrotemporal filter-bank models of auditory cortex and exhibited systematic model-brain correspondence: Middle stages best predicted primary auditory cortex, while deep stages best predicted non-primary cortex. However, some state-of-the-art models produced substantially worse brain predictions. Models trained to recognize speech in background noise produced better brain predictions than models trained to recognize speech in quiet, potentially because hearing in noise imposes constraints on biological auditory representations. The training task influenced the prediction quality for specific cortical tuning properties, with best overall predictions resulting from models trained on multiple tasks. The results generally support the promise of deep neural networks as models of audition, though they also indicate that current models do not explain auditory cortical responses in their entirety.
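
The kind of stage-by-stage comparison described in the abstract can be illustrated with a small encoding-model sketch. This is not the authors' pipeline: the abstract does not state the regression method, so the ridge regression, the single train/test split, the median-across-voxels summary, and every name below (`ridge_fit`, `column_correlation`, `stage_scores`, the stage labels) are assumptions made for illustration, and the data are synthetic.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - y_mean
    n_features = X.shape[1]
    # Solve (Xc'Xc + alpha*I) W = Xc'Yc for all voxels at once.
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b


def column_correlation(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    denom = np.sqrt((A ** 2).sum(axis=0) * (B ** 2).sum(axis=0))
    return (A * B).sum(axis=0) / denom


def stage_scores(stage_features, voxels, n_train, alpha=1.0):
    """Median held-out prediction correlation for each model stage."""
    scores = {}
    for name, X in stage_features.items():
        # Fit on the first n_train sounds, evaluate on the remaining sounds.
        W, b = ridge_fit(X[:n_train], voxels[:n_train], alpha)
        pred = X[n_train:] @ W + b
        r = column_correlation(pred, voxels[n_train:])
        scores[name] = float(np.median(r))
    return scores


# Synthetic data (assumption): the voxel responses are generated from the
# "middle" stage plus noise, so that stage should predict them best.
rng = np.random.default_rng(0)
n_sounds, n_features, n_voxels = 160, 20, 30
stage_features = {
    "early": rng.normal(size=(n_sounds, n_features)),
    "middle": rng.normal(size=(n_sounds, n_features)),
    "deep": rng.normal(size=(n_sounds, n_features)),
}
true_weights = rng.normal(size=(n_features, n_voxels))
noise = 0.5 * rng.normal(size=(n_sounds, n_voxels))
voxels = stage_features["middle"] @ true_weights + noise

scores = stage_scores(stage_features, voxels, n_train=80)
best_stage = max(scores, key=scores.get)
print(scores)
print("best-predicting stage:", best_stage)
```

Running the same scoring separately on voxels from different regions of interest would give one best-predicting stage per region, which is the sort of model-stage to brain-region mapping the abstract summarizes.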

Sponsoring Organization:
USDOE
Grant/Contract Number:
FG02-97ER25308
OSTI ID:
2229937
Alternate ID(s):
OSTI ID: 2471915
Journal Information:
PLoS Biology (Online), Vol. 21, Issue 12; ISSN 1545-7885
Publisher:
Public Library of Science (PLoS)
Country of Publication:
United States
Language:
English

Similar Records

A Task-Optimized Neural Network Replicates Human Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing Hierarchy
Journal Article · April 19, 2018 · Neuron · OSTI ID: 1538638

Invariance to background noise as a signature of non-primary auditory cortex
Journal Article · September 2, 2019 · Nature Communications · OSTI ID: 1610345
