In the OSTI Collections: Deep Learning


Article Acknowledgement:

Dr. William N. Watson, Physicist

DOE Office of Scientific and Technical Information



A typical neuron within a nerve or the brain of any organism will receive chemical signals from other neurons connected to it.  At any given moment, the same neuron will produce a signal of its own whose strength depends on the combination of signals it’s receiving, which will result in a chemical action on other neurons that it connects to “downstream”.  Put differently, the neuron’s output is a mathematically representable function of its input. 
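As a rough illustration of such a response function (a toy sketch, not taken from any of the reports discussed below), a single artificial neuron can be modeled as a weighted sum of its inputs passed through a saturating "activation" function:

```python
import math

def neuron_output(inputs, weights, bias):
    # Combine the incoming signals, weighted by connection strength,
    # then squash the total through a sigmoid that stands in for the
    # neuron's bounded response.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))

# With zero weights and zero bias, the response is exactly 0.5.
print(neuron_output([1.0, 2.0], [0.0, 0.0], 0.0))  # -> 0.5
```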

Likewise, any network of connected neurons receives input signals and sends signals to other cells, and those output signals are functions of the network’s inputs.  Each of the network cells that receives any of the initial inputs transforms them into an output signal to one or more other neurons in the network; those neurons’ responses are in turn functions of the signals they receive, and so on.  The output of an entire network is thus a function of the network’s inputs, which can be represented mathematically as a composite function of the individual neurons’ responses. 
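The composite-function idea can be made concrete with a toy sketch (illustrative Python, not code from any of the projects described here): each layer applies its neurons' response functions to the previous layer's outputs, so the whole network is literally a composition of those functions:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each row of `weights` feeds one neuron in this layer.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def network(inputs, layers):
    # The whole network is a composite function: the output of each
    # layer becomes the input of the next.
    for weights, biases in layers:
        inputs = layer(inputs, weights, biases)
    return inputs

out = network([1.0, -1.0],
              [([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),  # hidden layer
               ([[1.0, -1.0]], [0.0])])                  # output layer
print(out)
```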

Neurons’ response functions aren’t necessarily fixed; they can change with experience.  For instance, the chemical activity where one neuron fastens to another (i.e., at a synapse[Wikipedia]) may change in strength or kind over time as signals are transmitted there more or less often.  Such experience-driven changes in cells’ responses throughout a network of neurons correspond to the organism’s learning something. 

While computers with standard architectures scarcely resemble a network of neurons, they can be made to simulate such networks, with simulated neurons whose interconnections are adjusted so that each one’s output signals have appropriately stronger or weaker influences, either positive or negative, on the neurons that receive them.  For a network with a given input and set of interconnection strengths, the difference between its actual output and the desired output determines the relative rates at which the interconnection strengths should change to optimally reduce the difference.  Such changes “train” the network in a way that resembles what happens in the nervous system of a learning organism. 
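A minimal sketch of what such training amounts to, using a single linear "neuron" and gradient descent on the squared output error (purely illustrative; the networks in the reports below are far larger and use more elaborate update rules):

```python
# Toy "training" sketch: nudge each interconnection strength in
# proportion to how much the change reduces the output error.
def train_step(w, b, x, target, lr=0.1):
    y = sum(wi * xi for wi, xi in zip(w, x)) + b      # linear neuron
    err = y - target
    w = [wi - lr * err * xi for wi, xi in zip(w, x)]  # squared-error gradient
    b = b - lr * err
    return w, b

w, b = [0.0, 0.0], 0.0
for _ in range(50):
    w, b = train_step(w, b, [1.0, 2.0], 1.0)

y = sum(wi * xi for wi, xi in zip(w, [1.0, 2.0])) + b
print(round(y, 3))  # -> 1.0
```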

The kinds of functions that real or simulated neural networks can perform depend in part on how many layers of neurons they have between their initial input signals and ultimate output signals—the more layers a network has, the more closely it can approximate any given function.[SciTech Connect; p. 59]  The experience-based adjustment of an artificial network with a depth of many layers to perform a specific function is described as “deep learning” of that function by the network, though the depths and total neuron counts of the artificial networks built to date are far smaller than those of the neural networks found in living organisms.[Wikipedia]  Several endeavors funded through the U.S. Department of Energy have explored both the use of deep learning to solve particular computational problems and various improvements to deep learning technology. 



Typical tasks

One type of computation, the calculation of specific numerical values from a set of inputs, is exemplified by an effort to use artificial neural networks to control particle accelerators[Wikipedia], which propel beams of subatomic particles to high energies for various purposes.  In many accelerators, the state of the accelerated subatomic particles is detected at different places along their path and presented to a human controller, who adjusts the accelerator’s electromagnetic fields to keep the particle beam focused along the path intended for it.  Experiments at Fermi National Accelerator Laboratory (Fermilab) that could lead to eventual automatic control of an electron accelerator’s first stage are described in “First Steps Toward Incorporating Image Based Diagnostics Into Particle Accelerator Control Systems Using Convolutional Neural Networks”[SciTech Connect].  Simulated data about the state of the accelerator’s electron source was supplied to a six-layer network, along with simulated images of the electron beam’s cross sections at two different locations beyond the source.  By suitable adjustments of its “neurons’” interconnections, the network was trained to calculate controllable features of the beam—parameters that described the number of particles in the beam, their average energy, and the beam’s physical width and rate of widening.  Such output, when calculated from data about a real electron source and beam, could guide the source to make electron beams with the desired features.  The training involved presenting the network with 1395 simulated sets of electron-source parameters and images of the beam near the source and adjusting the network’s interconnections, and then doing the same with another 894 data sets that included simulated images of a beam further from the source.  Training results—that is, the interconnection adjustments—were checked using another 200 simulated near-source data and image sets and 600 further-along sets.  
Beam parameters calculated by the network were in close agreement with those calculated in a more conventional way from the simulation, with differences between 0.4% and 3.1% of the parameter ranges for all parameters.  At the time of their report, the experimenters were proceeding to train the network with actual accelerator data and have it also predict the electron beam’s alignment along with the other parameters. 

A study of another type of subatomic particle involved a different sort of computation.  Instead of having a network calculate parameters from images, a network was used to classify images.  The images in this case were of paths taken by subatomic particles through a particle detector when they were produced by one of the detector’s particles reacting with a neutrino[Wikipedia].  Neutrinos are very unlikely to react with anything—if a neutrino were to enter a slab of lead that was one light-year thick, it would only have about a 50% chance of being absorbed[HyperPhysics] by one of the particles in the slab.  But whenever a neutrino does react with something in its path, the kind of particle it is most likely to react with depends on how long the neutrino has existed.  The state of the neutrino oscillates[Wikipedia], so that the most likely reactions for it change from moment to moment but repeat themselves in a particular cycle described by seven parameters.   

The experiment described in the dissertation “Muon Neutrino Disappearance in NOvA with a Deep Convolutional Neural Network Classifier”[SciTech Connect] was an effort to measure two of the oscillation parameters more precisely.  Neutrinos produced at Fermilab were sent through two detectors, one placed near the source and one placed 810 kilometers away.  The few neutrinos that reacted with particles in the near detector were likely to undergo different reactions from those in the far detector, with the difference depending on how much the neutrinos’ state had oscillated in transit.  Different reactions produce different kinds of particles, which leave different kinds of tracks in a detector.  While computers can be programmed to distinguish certain track features and match them with the most probable state of the neutrino involved in the reaction, distinguishing some reactions (such as those with overlapping particle activity or particles that don’t travel far enough to make recognizable tracks[SciTech Connect, p. 69]) is problematic enough that programs written to calculate the corresponding neutrino state don’t take those reactions into account, thus leaving significant information out of the calculation.  To avoid such omission, a multilayer network was “trained” to distinguish reactions of several types using millions of example sets of reaction tracks, including some reactions that would present problems for a conventional feature-extraction program.  The subtle distinctions “learned” by the network along with the more obvious ones accounted for in conventional programs did allow a more precise determination of neutrinos’ oscillation parameters. 

Figure 1

Figure 1.  When neutrinos[Wikipedia] or other particles react in various ways with the material in a particle detector, other particles are produced that leave characteristic tracks in the detector.  Different initiating particles result in different types of tracks; if the initiating particles are neutrinos, their state can be determined from the types of tracks produced.  The doctoral dissertation “Muon Neutrino Disappearance in NOvA with a Deep Convolutional Neural Network Classifier”[SciTech Connect] describes how a deep neural network was trained to distinguish various types of particle tracks, with the first layers of the network “learning” to spot different significant features in track images, while further layers learned to extract other significant information from combinations of features found by previous layers.  Each of these examples shows an image of tracks from a single reaction (shown at left) along with two elementary features of each image (on the right) that were flagged by a deep network.  The top feature in each example seems to be associated with “showers” of hadrons—particles that respond to the force that holds protons and neutrons together in atomic nuclei—while the bottom feature appears to be associated with more isolated tracks.  In the examples’ annotations:  “nm” = “muon neutrino”, a neutrino state; “CC” = “charged current”, a type of reaction in which electric charge is transferred; “NC” = “neutral current”, a type of reaction in which electric charge is not transferred; “DIS” = “deep inelastic scattering”, a process in which kinetic energy[Wikipedia] is lost when a particle collides with an atom’s nucleus; “QE” = “quasi-elastic”, a process similar to one in which a particle is simply scattered from a nucleus without a loss of kinetic energy, except that one of the nuclear constituents changes from a proton to a neutron or vice versa.  
(From “Muon Neutrino Disappearance in NOvA with a Deep Convolutional Neural Network Classifier”[SciTech Connect], p. 89.) 



Network training

Three recent reports address the basic problem of how to adjust a network’s interconnections to output the information that people want to extract about its input.  One way involves having human experts label features of interest in images or other sets of training data, so that the network gets tuned to associate features with appropriate labels, even in data sets presented to it later that contain the features but no labels.  If the human expertise is easy to come by, labeling large sets of training data is not a big problem, but in fields that have few human experts some ingenuity is required to maximize the effectiveness of the relatively small amount of labeling that experts can apply in a reasonable time to large data sets.  Expert labeling was successfully minimized when training one deep network to detect nodules in images of lungs, as described in the report “Lung Nodule Detection using 3D Convolutional Neural Networks Trained on Weakly Labeled Data”[SciTech Connect].  This was done by having the human experts avoid detailed descriptions of the nodules’ shape or texture in the training images, and instead just specify single pixels to locate the nodules along with the largest size expected for each one.  Training the network to associate that minimal data with image features enabled it to “learn” what indicates the presence of a lung nodule in any image. 


Figure 2
Figure 2. A deep network’s estimate of the presence of a lung nodule in a CT scan from a label given by a human expert. (From “Lung Nodule Detection using 3D Convolutional Neural Networks Trained on Weakly Labeled Data”[SciTech Connect], p. 4 of 8.) 


Data labeling by human experts ceases to be a training bottleneck if the need for it is eliminated altogether.  The Scientific Reports article “Deep Learning in Label-free Cell Classification”[DoE PAGES] describes a step in that direction.  The term “Label-free” in the title actually refers to the absence of a different sort of labeling—not a labeling of data, but a physical labeling of the actual entities to be distinguished.  Cells are often labeled by staining[Wikipedia] to make certain features more visible in microscopes, but for some cells no known stains will do the job; so when those cells need to be rapidly detected or sorted, some other way to recognize them is needed.  In this project, the cells to be distinguished were run through a flow cytometer[Wikipedia] and imaged to pick up their size, shape, and amounts by which they absorbed, scattered, and distorted the shapes of light pulses sent through them.  There was no need for human experts to look at the images to find what features of interest each one contained, since the different kinds of cells being imaged to train the network were already sorted before being run through the flow cytometer one kind at a time.  The deep network was trained to distinguish T cells[Wikipedia] from colon-cancer cells, and to distinguish cells of the microalga Chlamydomonas reinhardtii[Wikipedia] whose different strains have different lipid content and thus different suitability as biofuel feedstock.  As was the case with the network trained to pick up on features of neutrino reactions, this network learned to distinguish different things by combinations of features; no single feature of flow cytometer images sufficed for the trained network to achieve its high accuracy in distinguishing different cells.  Yet while its success required combining data about both the cells’ forms and their optical effects, the features that contributed most to that success were different for the two pairs of cell types.  
Cell shapes were the most useful single feature for distinguishing T cells from cancer cells, while the different amounts by which low-lipid and high-lipid algae attenuated the light pulses sent through them contributed more than any other feature to distinguishing those cells from each other. 

A further step away from human-attention dependence involves training a network without labeling the training data in any way whatsoever—one of the main topics of a project at Lawrence Livermore National Laboratory to construct visual-feature representations with which deep networks can process any type of visual data.  The tools developed can train neural networks with billions of parameters on large data sets.  “The Livermore Brain:  Massive Deep Learning Networks Enabled by High Performance Computing”[SciTech Connect] describes three different ways to have a network “learn” such features without human data-labeling. 

One way is to construct a network whose first layer has numerous inputs, with decreasing numbers of artificial neurons and outputs in the following layers, so that the outputs of the smallest layer, in the middle of the network, represent an extract of the input data.[Wikipedia]  The aim is to make this extract a summary of significant features that leaves out insignificant detail.  This is done by having the neuron layers in the latter part of the network include more and more neurons, with the final outputs being as numerous as the initial inputs—and by training the network to always try to reproduce its original inputs from the extract in the middle layer.  This training keeps the network from producing some arbitrary random extract of its inputs.  The fact that the middle layer of the network is smaller than the input and output layers forces the network to summarize its inputs instead of just copying them in each succeeding layer, which would be the simplest way to have the output match the input.  The first half of a network so trained is thus an efficient summarizer of the kind of data it was trained on—with no data labeling.  The Livermore project improved on this method by making different portions of the network operate on different subsets of the data, and have different portions compare the actual and intended output for different data subsets to determine how to adjust the network interconnections.   
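The shrinking-then-expanding structure described above can be sketched with hypothetical layer sizes and untrained random weights, just to show the shape of the computation; none of this is the Livermore code itself:

```python
import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical layer sizes: 8 inputs funnel down to a 2-value "extract"
# in the middle, then back out to 8 outputs that try to reproduce the
# original inputs.
sizes = [8, 4, 2, 4, 8]

def init(n_in, n_out):
    return [[random.uniform(-1.0, 1.0) for _ in range(n_in)]
            for _ in range(n_out)]

weights = [init(a, b) for a, b in zip(sizes, sizes[1:])]

def forward(x):
    activations = [x]
    for w in weights:
        x = [sigmoid(sum(wi * xi for wi, xi in zip(row, x))) for row in w]
        activations.append(x)
    return activations

x = [random.random() for _ in range(8)]
acts = forward(x)
code = acts[2]        # the 2-value bottleneck summary of the input
recon = acts[-1]      # the attempted reproduction of the input
loss = sum((a - b) ** 2 for a, b in zip(recon, x))
print(len(code), len(recon))
```

Training would adjust `weights` to shrink `loss` across many inputs, which is what forces the middle layer to become a meaningful summary rather than an arbitrary extract.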

A second way to automate network recognition of data is to train it to predict how different kinds of data go together in actual inputs.  For instance, different parts of a facial image, like ears and eyes, will have certain positions relative to each other; outputs suggesting a relative positioning that doesn’t match what the images indicate will result in network readjustments.  Networks had previously been trained to relate pairs of image segments in this way; in this Livermore project, a network learned spatial relationships for sets of three image segments.  A third way involves training two separate networks, one to generate false output that looks like real data, the other to spot features which indicate that output’s falsity.[Wikipedia]  The Livermore group developed new algorithms that stage this dual training to accomplish it quickly, produce stable results, and keep the networks evenly matched throughout, with neither dominating the other. 
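The adversarial scheme (the third way) can be caricatured in one dimension.  In this toy sketch (entirely illustrative, with none of the Livermore group's actual algorithms), a linear "generator" tries to imitate samples clustered near 3.0 while a logistic "discriminator" tries to tell real samples from generated ones, and the two are updated in alternation:

```python
import math, random

random.seed(1)

def sig(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-x))

# Real "data": samples near 3.0.  Generator g(z) = a*z + b tries to
# imitate them; discriminator D(x) = sig(w*x + c) tries to tell real
# samples from generated ones.
a, b = 1.0, 0.0   # generator parameters
w, c = 0.1, 0.0   # discriminator parameters
lr = 0.05

for _ in range(2000):
    real = 3.0 + random.gauss(0.0, 0.1)
    z = random.gauss(0.0, 1.0)
    fake = a * z + b

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    dr, df = sig(w * real + c), sig(w * fake + c)
    w += lr * ((1.0 - dr) * real - df * fake)
    c += lr * ((1.0 - dr) - df)

    # Generator step: push D(fake) toward 1.
    df = sig(w * fake + c)
    a += lr * (1.0 - df) * w * z
    b += lr * (1.0 - df) * w

print(round(b, 2))  # typically drifts toward the real mean of 3
```

Keeping such a pair of networks "evenly matched", as the report puts it, is the hard part: if either side wins too decisively, the other's gradients vanish and training stalls.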

Figure 3
Figure 3.  “Unsupervised learning of image features via context prediction involves training convolutional neural networks (CNNs)[Wikipedia] to learn how to classify the position of a randomly selected image patch relative to another image patch.  Upon successful training completion, the CNNs will have learned a shared feature representation of images that captures the structure of objects critical for high performance context prediction and is transferrable to other classification tasks.”  (From “The Livermore Brain:  Massive Deep Learning Networks Enabled by High Performance Computing”[SciTech Connect], p. 14 of 17; link added.) 

The project had two other accomplishments: 

  • Linking together two networks that were trained separately, one to associate words and one to associate words with images, and training the combination to find images that could be correctly described by words that were never used to label such images in any of the training. 
  • Helping to produce the largest open dataset of images and videos for multimedia research. 


Results of the project advance both insight and the accomplishment of practical ends, according to the report.  “These capabilities,” it concludes, “lay the foundation for further research on learning universal multimodal feature representations at massive scales and support the development of a new generation of situational awareness, nuclear nonproliferation, and counter-WMD [weapons of mass destruction] programs.” 



Understanding the output

If a person learns something beyond an unconscious absorption of it to the point of articulate insight, he can summarize what he’s learned to pass his understanding along to other people.  The kind of learning performed by a deep artificial network is less readily passed along, but it can still yield human insight if the network is carefully studied.  Two such studies involve networks trained on quite different data:  a set of simulated particle-detector[Wikipedia] images, and actual human speech. 

Theories of how the elementary particles of matter interact can be tested by tracking the products of particle collisions, because those theories imply that the particles produced will have particular distributions of particle types and motions.  If the actual distributions match the theoretical ones, the theory may be accurate, but if the actual and theoretical distributions differ enough, the theory is more likely than not to be wrong in some way.  “Jet-images—deep learning edition”[DoE PAGES], a paper published in the Journal of High Energy Physics, deals with pairs of protons that collide at high energies, as they do in experiments at the Large Hadron Collider[Wikipedia], and afterwards undergo either of two different kinds of processes that produce “jets”—collimated sprays—of subatomic particles whose paths are registered in the pixels of a particle detector.  Each kind of process results in jets whose paths should, according to standard theory[Wikipedia], be characterizable by parameters that follow different distributions.  Some distinguishing parameters were already known; others were found by a deep network trained by the paper’s authors to sort simulated reaction products into two groups according to which kind of process generated them.  Once trained, the network would then be able to infer which one of the two process types was more likely to have produced the results of any real reaction.  By comparing the network’s inferences with other data—such as inferences based only on the discriminatory parameters that were previously known, the frequency with which those parameters turned up in the two groups of reactions sorted out by the network, and how the network’s inferences correlated with the distributions of individual pixel content across the network’s input data—the authors got some insight into what parameters the network had implicitly worked out to distinguish the two kinds.  
They found that the network had effectively learned two of the parameters already known to distinguish the reaction types, and had learned one other significant parameter incompletely, but had worked out enough other previously-unknown parameters (relating, for example, to radiation surrounding the jets) to outperform discriminators based only on the earlier-known ones.  Understanding the significance of the new parameters, and figuring out how to get a deep network to completely learn the parameter that it had learned incompletely, were mentioned as goals of further work. 

Figure 4
Figure 4.  Average “images” of how jets (collimated sprays) of subatomic particles are distributed when the jets are produced by each of two very different processes.  Although the processes differ greatly, the resulting jet distributions look quite similar; the subtle distinctions between them can be picked up by a computer programmed to calculate certain known significant parameters, or by a deep network that has learned some distinguishing features.  The features learned by a network described in “Jet-images—deep learning edition”[DoE PAGES] only partially coincided with the distinguishing parameters previously known to physicists; the network hadn’t completely learned one of the parameters, but other features that the network did find enabled it to more than compensate for that.  (From “Jet-images—deep learning edition”[DoE PAGES], p. 5.) 

The other report, “Analysis of Pre-Trained Deep Neural Networks for Large-Vocabulary Automatic Speech Recognition”[SciTech Connect], notes that in the last few years, while automatic speech recognizers that incorporate deep networks have been shown superior to those that don’t, even deep networks’ speech recognition could be improved in several ways.  The report describes many of these improvements in terms of data preprocessing and training techniques, but some improvements relate specifically to insights about language that have been gained through deep-learning experiments.  Possibly the most significant of these insights come from experiments with diversified training data.  Any diversity of training data is found to correlate positively with improved performance, although the diversity means that it takes longer to train the network.  One useful diversification is as simple as adding noise to the input, but a more interesting improvement comes from training the network on not just one language, but multiple languages in the same language family[Wikipedia].  Such experiments provide evidence that all language has an inherent pattern of significant consistencies “below the surface layer”.  Suitable training data could thus help the network learn the pattern and err less, even without any improvements in computer hardware.  According to the report’s conclusion, “As a deeper understanding of speech is developed, this low-level knowledge in particular could prove revolutionary to the field of ASR [automatic speech recognition].” 



Other advances

If a large deep network is to be used to learn a very complex function, that complexity may warrant implementing the network and its training on a supercomputer.  Supercomputers are also set up to perform multiple computations in parallel, which facilitates deep learning, since presenting input data to a network and adjusting its neurons’ interconnection strengths both involve computational sequences that can be done in parallel.  However, without careful programming, the time different parts of a supercomputer spend working on their sequences of steps can be exceeded by the time they spend communicating their intermediate results to each other for further processing.  One method for reducing the amount of data exchange within the supercomputer is explored in the Lawrence Livermore report “Communication Quantization for Data-parallel Training of Deep Neural Networks”[SciTech Connect].  The “quantization” referred to in the title is a way to compress the information, exchanged among the different parts of the computer during training, about how far off the network’s current outputs are from their intended values.  Instead of communicating these deviations precisely with large numbers of bits, most of the deviations are reported by single bits as being either above or below certain thresholds, thus “quantizing” the continuum of possible deviations to the two or three values “high”, “low”, and possibly “medium”, with less frequent reporting of the average size of batches of high and low deviations.  One of the quantization methods worked faster than several others for network layers larger than a few hundred neurons, and surpassed an otherwise superior nonquantized communication algorithm for layers having several thousand neurons or more.  
Quantizing and averaging the output deviations during training took more time than was saved unless the network layers were very large, but paid off when there were enough neurons for the supercomputer to use a lot of time for internal communication. 
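The thresholding idea can be sketched as follows.  This toy (a generic illustration of one-bit quantization with error feedback, not the report's exact algorithm) reports each gradient entry with a single sign bit plus one shared average magnitude, and carries the discarded remainder forward so it isn't lost:

```python
def quantize(grads, residual):
    # Add back the error left over from the previous round, then send
    # only each entry's sign (one bit) plus the mean magnitude.
    adjusted = [g + r for g, r in zip(grads, residual)]
    mean_mag = sum(abs(a) for a in adjusted) / len(adjusted)
    bits = [1 if a >= 0.0 else 0 for a in adjusted]
    decoded = [mean_mag if bit else -mean_mag for bit in bits]
    # Remember what the quantization threw away, to include next time.
    residual = [a - d for a, d in zip(adjusted, decoded)]
    return bits, mean_mag, residual

grads = [0.3, -0.1, 0.05, -0.4]
bits, mag, res = quantize(grads, [0.0] * len(grads))
print(bits)  # -> [1, 0, 1, 0]
```

Each round thus transmits one bit per entry plus a single float, rather than a full float per entry, which is where the communication savings come from.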

In one obvious sense, artificial neural networks are simulations of real ones.  But many of the deep networks described in the aforementioned reports, including the ones discussed in “Communication Quantization for Data-parallel Training of Deep Neural Networks”, take simulation a step further.  Instead of being sets of hardware that behave similarly to real networks of living neurons, they’re actually software simulations of such hardware, which (like many such simulations) can be easier to build or work with than the actual hardware would be.  They thus exemplify the fact that, with appropriate software, computers can even simulate other computers of a different design—and if that design is an artificial neural network, the result is a simulation of a simulation:  a software imitation of hardware that resembles a network of real neurons.  One reason to just simulate the behavior of an artificial neural network is that most computers’ architecture, which follows the basic design proposed by John von Neumann in 1945[Wikipedia], has little resemblance to a network of neurons.  Still, actual hardware networks have been built that directly simulate neural networks, like the ones described in the report “TrueNorth Ecosystem for Brain-Inspired Computing:  Scalable Systems, Software, and Applications”[SciTech Connect].  These networks are based on million-neuron silicon chips that “have demonstrated orders of magnitude improvement in computational energy-efficiency and throughput” and can classify things accurately when trained, even if their artificial neurons and interconnections have low precision. 

Figure 5
Figure 5.  The logical representation (left) and physical implementation (right) of the TrueNorth processor for deep networks.  
“The scalability of the TrueNorth architecture derives from its modular tiled-core structure.  Each TrueNorth neurosynaptic core represents a tiny fully-connected neural network with 256 output neurons and 256 input axons[Wikipedia], connected by 256 × 256 synapses[Wikipedia], one for every axon-neuron combination—a complete bipartite graph[Wikipedia] [upper left].  When cores are tiled on the chip and connected by a network-on-the-chip, these small bipartite graphs are combined to form a larger neural network [lower left].  
 “The TrueNorth chip consists of 4,096 cores, tiled as a 64 × 64 array [lower right].  Each chip implements over 1 million neurons and over 256 million synapses, using 5.4 billion transistors fabricated in Samsung’s 28nm LPP process technology.  At 0.775V supply, each chip consumes approximately 70mW while running a typical vision application.  Cores are implemented as a fanout crossbar structure [upper right].  An input spike event activates an axon, which drives all connected neurons.  Neurons integrate incoming spikes, weighted by synaptic strength.  When [a neuron’s total input exceeds a certain value], it fires a spike, transmitting it to a preprogrammed target axon on any core in the network.”  (After “TrueNorth Ecosystem for Brain-Inspired Computing:  Scalable Systems, Software, and Applications”[SciTech Connect], pp. 131-132.) 
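The integrate-and-fire behavior quoted above can be sketched in a few lines of Python (a toy illustration of the general idea, not TrueNorth's actual implementation):

```python
def run_neuron(spike_trains, weights, threshold):
    # Accumulate incoming spikes weighted by synaptic strength; fire
    # and reset when the accumulated potential crosses the threshold.
    potential, out_spikes = 0.0, []
    for t, spikes in enumerate(spike_trains):
        potential += sum(w * s for w, s in zip(weights, spikes))
        if potential >= threshold:
            out_spikes.append(t)
            potential = 0.0
    return out_spikes

# Two input axons: axon 0 spikes every step, axon 1 every other step.
trains = [(1, 0), (1, 1), (1, 0), (1, 1), (1, 0)]
print(run_neuron(trains, weights=[0.5, 1.0], threshold=2.0))  # -> [1, 3]
```

Information is carried by the timing of the output spikes rather than by a steady output level, which is the key difference from the artificial neurons of the deep networks described earlier.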

Figure 6
Figure 6.  “To support scaling beyond chip boundaries, [TrueNorth] chips can also be tiled in two dimensions via native event-driven SerDes links, enabling scaling to even larger networks, as shown ….  This makes it possible and relatively simple to tile TrueNorth chips in a two-dimensional array ….”  (After “TrueNorth Ecosystem for Brain-Inspired Computing:  Scalable Systems, Software, and Applications”[SciTech Connect], pp. 131, 132.) 

Achieving the hardware’s full potential requires a “comprehensive ecosystem” of support tools to specify, train, and deploy neural networks on it.  According to the report, an application that took roughly one person-decade to develop without such an ecosystem could be produced with the ecosystem in person-weeks or person-days with better results.  The ecosystem described in the report includes (a) two ways to link sets of 16 chips together to form either a set of neural networks that operate in parallel or a single large neural network, (b) a development workflow[Wikipedia] with several software tools for tasks like training networks and preprocessing data, and (c) an algorithm for minimizing communications within and between chips.  Three example applications are described in the report, which respectively recognize handwritten characters, extract and recognize text appearing in photographs, and detect defects in additively manufactured products. 

The kind of deep learning described in the preceding reports is just one way of computing that more closely resembles what networks of real neurons do.  The resemblance can be made closer by mimicking other neural features.  For example, a real neuron’s output is not a steady voltage or current that changes with appropriate changes of input, but a series of pulses whose timing conveys information.  Real neural systems can also process noisy data[Wikipedia] and learn from much smaller data sets than it takes to train the artificial deep networks described earlier.  A Department of Energy workshop report entitled “Neuromorphic Computing:  Architectures, Models, and Applications—A Beyond-CMOS Approach to Future Computing”[SciTech Connect] surveys the current range of efforts to make computers more like real neural networks in various useful ways, some of which go beyond those of current deep-learning networks.  (The “CMOS” in the title stands for “Complementary Metal Oxide Semiconductor”[Wikipedia], which describes a widely used type of integrated circuit[Wikipedia].)  The report gives several reasons[SciTech Connect, pp. 4-5, 38] for the Energy Department to have a role in developing “a new type of computer that can proactively interpret and learn from data, solve unfamiliar problems using what it has learned, and operate with the energy efficiency of the human brain”: 

  • Fundamental scientific breakthroughs in neuroscience, machine intelligence, and materials science are likely to result from the effort. 
  • Although the payoff would be high, the long lead times for practical product development and marketing may present too great a risk for current commercial investment. 
  • As government uses of neuromorphic computing would mostly differ from commercial ones, strictly commercial neuromorphic products would be inadequate for those uses; also, the Energy Department’s own uses would fundamentally differ from those of other government agencies. 
  • Other government investments would likely provide far less long-term economic return. 
  • “The government’s long history of successful investment in computing technology (probably the most valuable investment in history) is a proven case study that is relevant to the opportunity in neuromorphic computing.” 
  • “The massive, ongoing accumulation of data everywhere is an untapped source of wealth and well-being for the nation.” 
  • The Energy Department’s Office of Science, particularly that office’s Advanced Scientific Computing Research program, addresses significant problems that neuromorphic computing could help solve, and funds world-class facilities with important relevant capabilities as well as researchers in the relevant fields who have experience in building interdisciplinary collaborations. 
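The pulse-based signaling mentioned above can be made concrete with a leaky integrate-and-fire model, one of the simplest spiking-neuron abstractions.  The sketch below is purely illustrative and not drawn from the workshop report; all parameter values are arbitrary.  It shows the essential mechanism: the neuron accumulates input while its potential “leaks” back toward rest, and it emits a discrete pulse whenever a threshold is crossed, so information rides on the timing of the pulses rather than on a steady output level.

```python
# Illustrative leaky integrate-and-fire neuron (assumed parameters, not from the report).
# The neuron integrates its input current, the membrane potential decays ("leaks")
# each step, and a spike (a pulse) is emitted whenever the potential crosses threshold.

def simulate_lif(inputs, leak=0.9, threshold=1.0, reset=0.0):
    """Return the spike train (list of 0s and 1s) for a stream of input currents."""
    v = 0.0                      # membrane potential, starting at rest
    spikes = []
    for current in inputs:
        v = leak * v + current   # leaky integration of the input
        if v >= threshold:       # threshold crossing: emit a pulse
            spikes.append(1)
            v = reset            # potential resets after each spike
        else:
            spikes.append(0)
    return spikes

# A steady weak input yields a periodic spike train; a stronger input would
# make the neuron fire more often, so the rate and timing carry the information.
train = simulate_lif([0.3] * 20)
```

Here the constant input of 0.3 charges the potential past threshold every fourth step, so the output is a regular pulse train rather than a continuously varying level.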


The workshop participants arrived at several conclusions: 

  • Ways to simulate and evaluate different neuromorphic systems are needed.  Benchmarks need to be chosen well:  evaluating the systems by how well they perform a too-small, homogeneous set of applications could restrict how developers think about neuromorphic systems and what they can do. 
  • Theories of learning and intelligence need to be developed, and their application to neuromorphic computers understood. 
  • Large-scale coordination of effort, and a neuromorphic “hub” for sharing successes, failures, supporting software, datasets, application simulations, and hardware designs, would be greatly beneficial. 
  • Emulating the brain is not the goal.  “We should instead take inspiration from biology but not limit ourselves to particular models or algorithms, just because they work differently from their corresponding biological systems, nor should we include mechanisms just because they are present in biological systems.” 


To whatever degree brains are imitated, or surpassed in some respects, it seems that deep learning is likely to be involved in some form.  A system will have to be informed, or “learn” somehow, in order to do its job; unless people find significant informatic features of neuron function that don’t involve the neurons’ interacting in a network, any system that’s at all neuromorphic would seem to require a network architecture; and unless the components process information quite differently from the way neurons do, any network that could execute a complex computation would probably have to be several layers deep.    
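The layered, composite-function view of such networks (each layer computing a function of the previous layer’s outputs) can be sketched in a few lines.  The following toy network uses made-up weights chosen only for illustration; it is not any particular system described above, just the generic structure a multi-layer network shares with them.

```python
# Minimal sketch of a deep network as a composite function (illustrative weights).
# Each layer forms weighted sums of its inputs and passes them through a
# nonlinearity; stacking layers composes those functions, so the network's
# output is a function of a function of its input.
import math

def layer(inputs, weights, biases):
    """One fully connected layer: logistic activation of weighted input sums."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
        for row, b in zip(weights, biases)
    ]

def network(x):
    """Two layers deep: hidden layer feeds the output layer."""
    hidden = layer(x, weights=[[1.0, -1.0], [0.5, 0.5]], biases=[0.0, -0.5])
    return layer(hidden, weights=[[2.0, -2.0]], biases=[0.0])

out = network([1.0, 0.0])   # a single output between 0 and 1
```

Training such a network would amount to adjusting the `weights` and `biases` values to shrink the difference between actual and desired outputs, as described at the start of this article.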







Additional references

  • “Deep learning” by Yann LeCun, Yoshua Bengio & Geoffrey Hinton, Nature, Volume 521, Issue 7553, pp. 436-444 (28 May 2015).  Review article. 
  • Deep Learning (MIT Press), by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.  (“textbook … a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular.  The online version of the book is now complete and will remain available online for free.”) 
  • Neural Networks and Deep Learning by Michael Nielsen.  (“… a free online book.  …  Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing.  This book will teach you many of the core concepts behind neural networks and deep learning.”) 

Reports available in OSTI’s SciTech Connect

Reports available through DoE PAGES

