Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity

Moses, David A.; Mesgarani, Nima; Leonard, Matthew K.; Chang, Edward F.

doi:10.1088/1741-2560/13/5/056004

Journal Article · Wed Aug 03 00:00:00 EDT 2016 · Journal of Neural Engineering

DOI: https://doi.org/10.1088/1741-2560/13/5/056004 · OSTI ID: 1523618

Moses, David A. ^[1]; Mesgarani, Nima ^[2]; Leonard, Matthew K. ^[2]; Chang, Edward F. ^[3]

[1] Univ. of California San Francisco, CA (United States). Dept. of Neurological Surgery; Univ. of California San Francisco, CA (United States). Center for Integrative Neuroscience; Univ. of California Berkeley--Univ. of California San Francisco, CA (United States). Graduate Program in Bioengineering
[2] Univ. of California San Francisco, CA (United States). Dept. of Neurological Surgery; Univ. of California San Francisco, CA (United States). Center for Integrative Neuroscience
[3] Univ. of California San Francisco, CA (United States). Dept. of Neurological Surgery; Univ. of California San Francisco, CA (United States). Center for Integrative Neuroscience; Univ. of California Berkeley--Univ. of California San Francisco, CA (United States). Graduate Program in Bioengineering

Objective. The superior temporal gyrus (STG) and neighboring brain regions play a key role in human language processing. Previous studies have attempted to reconstruct speech information from brain activity in the STG, but few of them incorporate the probabilistic framework and engineering methodology used in modern speech recognition systems. In this work, we describe the initial efforts toward the design of a neural speech recognition (NSR) system that performs continuous phoneme recognition on English stimuli with arbitrary vocabulary sizes using the high gamma band power of local field potentials in the STG and neighboring cortical areas obtained via electrocorticography. Approach. The system implements a Viterbi decoder that incorporates phoneme likelihood estimates from a linear discriminant analysis model and transition probabilities from an n-gram phonemic language model. Grid searches were used in an attempt to determine optimal parameterizations of the feature vectors and Viterbi decoder. Main results. The performance of the system was significantly improved by using spatiotemporal representations of the neural activity (as opposed to purely spatial representations) and by including language modeling and Viterbi decoding in the NSR system. Significance. These results emphasize the importance of modeling the temporal dynamics of neural responses when analyzing their variations with respect to varying stimuli and demonstrate that speech recognition techniques can be successfully leveraged when decoding speech from neural signals. Guided by the results detailed in this work, further development of the NSR system could have applications in the fields of automatic speech recognition and neural prosthetics.
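To illustrate the decoding step the abstract describes, the following is a minimal sketch of Viterbi decoding that combines per-frame phoneme log-likelihoods with bigram transition log-probabilities. It is not the authors' implementation: the function name `viterbi_decode`, the `lm_weight` scaling parameter, the use of a bigram model (the abstract says only "n-gram"), and the two-state toy data are all assumptions made for illustration. In a real system the emission scores would come from a trained classifier such as the linear discriminant analysis model mentioned above.

```python
import numpy as np


def viterbi_decode(log_emissions, log_transitions, log_initial, lm_weight=1.0):
    """Return the most likely state (phoneme) sequence as a list of indices.

    log_emissions:   (T, K) per-frame log-likelihoods for K states
    log_transitions: (K, K) bigram log-probabilities, [i, j] = log P(j | i)
    log_initial:     (K,) log-probabilities of the first state
    lm_weight:       hypothetical scaling applied to the language-model terms
    """
    T, K = log_emissions.shape
    score = np.empty((T, K))
    backptr = np.zeros((T, K), dtype=int)

    score[0] = lm_weight * log_initial + log_emissions[0]
    for t in range(1, T):
        # candidates[i, j]: best score ending in state i at t-1, then moving to j
        candidates = score[t - 1][:, None] + lm_weight * log_transitions
        backptr[t] = np.argmax(candidates, axis=0)
        score[t] = candidates[backptr[t], np.arange(K)] + log_emissions[t]

    # Trace the best path backwards from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(score[-1])
    for t in range(T - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path.tolist()


# Toy data (assumed, not from the paper): two states, three frames.
# The middle frame weakly favors state 1, but transitions favor staying put.
log_emissions = np.log(np.array([[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]))
log_transitions = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_initial = np.log(np.array([0.5, 0.5]))

framewise = log_emissions.argmax(axis=1).tolist()  # [0, 1, 0]
decoded = viterbi_decode(log_emissions, log_transitions, log_initial)  # [0, 0, 0]
```

In this toy case, classifying each frame independently yields `[0, 1, 0]`, while the decoder returns `[0, 0, 0]` because the transition model makes a one-frame switch unlikely. Setting `lm_weight=0.0` removes the language-model contribution and recovers the frame-by-frame result.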

Research Organization: Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: AC02-05CH11231

OSTI ID: 1523618

Journal Information: Journal of Neural Engineering, Vol. 13, Issue 5; ISSN 1741-2560

Country of Publication: United States

Language: English

Similar Records

Deep learning as a tool for neural data analysis: Speech classification and cross-frequency coupling in human sensorimotor cortex

Journal Article · Mon Sep 16 00:00:00 EDT 2019 · PLoS Computational Biology (Online) · OSTI ID:1523618

Livezey, Jesse A.; Bouchard, Kristofer E.; Chang, Edward F.

Speech recognition systems on the Cell Broadband Engine

Journal Article · Fri Apr 20 00:00:00 EDT 2007 · IBM Journal of Research and Development · OSTI ID:1523618

Liu, Y; Jones, H; Vaidya, S; +3 more

Deep learning approaches for neural decoding across architectures and recording modalities

Journal Article · Tue Dec 29 00:00:00 EST 2020 · Briefings in Bioinformatics · OSTI ID:1523618

Livezey, Jesse A.; Glaser, Joshua I.
