1. Title of the paper

From Nodes to Networks: Evolving Recurrent Neural Networks

2. Authors

Aditya Rawal and Risto Miikkulainen
Sentient Technologies, Inc., One California St., Suite 2300, San Francisco, CA 94111, +1 415-422-9886, firstname.lastname@sentient.ai.
Also: Department of Computer Science, The University of Texas at Austin, Austin TX 78712, +1 512-471-9571, aditya@cs.utexas.edu, risto@cs.utexas.edu.

3. Corresponding author

Risto Miikkulainen; risto@cs.utexas.edu

4. Abstract

Gated recurrent networks such as those composed of Long Short-Term Memory (LSTM) nodes have recently been used to improve the state of the art in many sequential processing tasks such as speech recognition and machine translation. However, the basic structure of the LSTM node is essentially the same as when it was first conceived 25 years ago. Recently, evolutionary and reinforcement learning mechanisms have been employed to create new variations of this structure. This paper proposes a new method, evolution of a tree-based encoding of the gated memory nodes, and shows that it makes it possible to explore new variations more effectively than other methods. The method discovers nodes with multiple recurrent paths and multiple memory cells, which lead to significant improvement in the standard language modeling benchmark task. The paper also shows how the search process can be sped up by training an LSTM network to estimate the performance of candidate structures, and by encouraging exploration of novel solutions. Thus, evolutionary design of complex neural network structures promises to improve the performance of deep learning architectures beyond human ability to do so.

5. Criteria satisfied

B, D, F, G

6. Justification of criteria satisfied

The entry shows that evolution can design a complex learning architecture, namely a gated recurrent neural network node (LSTM, or Long Short-Term Memory node; Hochreiter and Schmidhuber 1997), better than humans can. LSTMs were proposed in 1997 for problems that require memory, utilizing a memory cell and trainable gates that control how it is used. For 20 years, their structure remained essentially unchanged; the original authors even published a paper in 2017 concluding that the variants proposed up to that point were not significantly better than the standard design (Greff et al. 2017). This entry shows that evolution can discover LSTM architectures that are much more complex than the variations created by humans over the years: in addition to the linear path that retains the memory value, they contain multiple nonlinear paths and multiple memory cells, and they utilize different activation functions. This complexity results in significantly improved performance on standard machine learning benchmark tasks. One of the most challenging such tasks is language modeling, i.e. predicting the next word in the Penn Treebank text corpus (Marcus et al. 1993). On this benchmark, evolution improved perplexity by 7.1 points, i.e. 9%, over human-designed LSTMs. Remarkably, the improvements were not simply due to increased complexity. When the same evolved node design was applied to a second benchmark, music modeling (i.e. predicting the next note; Boulanger-Lewandowski et al. 2012; Ycart and Benetos 2017), it did not perform better than the standard LSTM. But when evolution was again used to optimize the design for this benchmark specifically, it improved performance by 12%. In other words, evolution can identify the requirements of each task and optimize the design to take advantage of them in a way no human can.
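To make the tree-based encoding concrete, the following minimal Python sketch shows how a gated memory node might be represented and mutated. It is reconstructed from the description above, not taken from the authors' code: the class and function names, the operator set, and the mutation scheme are all illustrative assumptions.

    # Hypothetical sketch of a tree-encoded recurrent memory node.
    # Leaves read the node's input x, its previous output h, or one of
    # several memory cells c[i]; internal nodes apply an activation or
    # combine two subtrees (e.g., a sigmoid gate multiplying a path).
    import copy
    import random
    import numpy as np

    ACTIVATIONS = {
        "tanh": np.tanh,
        "sigmoid": lambda v: 1.0 / (1.0 + np.exp(-v)),
        "relu": lambda v: np.maximum(0.0, v),
    }
    BINARY_OPS = {"add": np.add, "mul": np.multiply}

    class NodeTree:
        def __init__(self, op, children=(), leaf=None):
            self.op = op              # "leaf", an activation, or a binary op
            self.children = list(children)
            self.leaf = leaf          # ("x",), ("h",), or ("c", i)

        def evaluate(self, x, h, c):
            if self.op == "leaf":
                if self.leaf[0] == "x":
                    return x
                if self.leaf[0] == "h":
                    return h
                return c[self.leaf[1]]        # one of the memory cells
            if self.op in ACTIVATIONS:
                return ACTIVATIONS[self.op](self.children[0].evaluate(x, h, c))
            return BINARY_OPS[self.op](self.children[0].evaluate(x, h, c),
                                       self.children[1].evaluate(x, h, c))

    def random_subtree(depth, n_cells):
        """Grow a random subtree, for initialization and mutation."""
        if depth == 0 or random.random() < 0.3:
            kind = random.choice(["x", "h", "c"])
            leaf = (kind, random.randrange(n_cells)) if kind == "c" else (kind,)
            return NodeTree("leaf", leaf=leaf)
        if random.random() < 0.5:
            return NodeTree(random.choice(list(ACTIVATIONS)),
                            [random_subtree(depth - 1, n_cells)])
        return NodeTree(random.choice(list(BINARY_OPS)),
                        [random_subtree(depth - 1, n_cells),
                         random_subtree(depth - 1, n_cells)])

    def mutate(tree, n_cells, depth=3):
        """Standard GP point mutation: replace a random subtree."""
        child = copy.deepcopy(tree)
        target = child
        while target.op != "leaf" and random.random() < 0.7:
            target = random.choice(target.children)
        new = random_subtree(depth, n_cells)
        target.op, target.children, target.leaf = new.op, new.children, new.leaf
        return child

A search over such trees can express the standard LSTM (a single cell updated along a linear "add" path, gated by sigmoids) as one point in the space, while also reaching designs with multiple cells and multiple nonlinear recurrent paths like those the paper reports.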
References:

N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent (2012). Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In Proceedings of the 29th International Conference on Machine Learning (ICML 2012).

K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, and J. Schmidhuber (2017). LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems 28:2222-2232.

S. Hochreiter and J. Schmidhuber (1997). Long short-term memory. Neural Computation 9:1735-1780.

M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19:313-330.

A. Ycart and E. Benetos (2017). A study on LSTM networks for polyphonic music sequence modelling. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Suzhou, China, 23-28.

7. Citation

A. Rawal and R. Miikkulainen (2018). From Nodes to Networks: Evolving Recurrent Neural Networks. arXiv:1803.04439.

8. Prize money, if any, will be divided equally among the co-authors.

9. Why this entry is the best

This entry highlights a new genre of human-competitive results in which the object of design is itself a learning system. As Holger Hoos pointed out in his GECCO'16 keynote, many systems have become too complex for humans to optimize; automated methods such as evolution are necessary to obtain the full benefit from them. With recent advances in deep learning (DL), machine learning has also reached this limit. DL systems now have hundreds of layers, repetitive structures consisting of dozens of components, complex connectivities, and many component types, all to be configured with hundreds of hyperparameters. They have reached the limit of human design and optimization. As this entry shows, such systems can be configured successfully by evolutionary algorithms, beyond human ability to do so.

This approach is computationally extremely demanding. Training each network takes days on a state-of-the-art GPU, and over the course of evolutionary optimization, thousands (possibly millions) of such networks need to be evaluated; a sketch of one technique for reducing this cost, the performance predictor mentioned in the abstract, appears at the end of this entry. Such computational power is only now becoming available, and it already makes human-competitive results like the one in this entry possible. Interestingly, as computational power increases further, very few approaches can take advantage of it, but evolution of DL systems can. The entry thus demonstrates that "AI designing AI" is the future of AI, and evolutionary algorithms are essential to it.

10. Type of EC used

Genetic Programming

11. Date of Publication

March 12, 2018.
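To illustrate the speed-up technique mentioned in the abstract, training an LSTM network to estimate the performance of candidate structures, here is a minimal sketch in PyTorch. It is a reconstruction under stated assumptions, not the authors' implementation: the serialization of candidate trees into token sequences, the layer sizes, and the names PerformancePredictor and fit are all illustrative.

    # Hypothetical surrogate model: an LSTM reads a serialized candidate
    # architecture (e.g., a pre-order traversal of its tree mapped to
    # integer tokens) and predicts the validation perplexity it would
    # reach after full training.
    import torch
    import torch.nn as nn

    class PerformancePredictor(nn.Module):
        def __init__(self, vocab_size, embed_dim=32, hidden_dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, 1)

        def forward(self, token_ids):
            # token_ids: (batch, seq_len) integer encoding of each tree
            emb = self.embed(token_ids)
            _, (h_n, _) = self.lstm(emb)
            return self.head(h_n[-1]).squeeze(-1)  # predicted perplexity

    def fit(model, loader, epochs=10, lr=1e-3):
        """Regress predicted perplexity onto measured perplexity, using
        (serialized tree, measured perplexity) pairs from candidates
        that were actually trained earlier in the run."""
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            for tokens, perplexity in loader:
                opt.zero_grad()
                loss = loss_fn(model(tokens), perplexity)
                loss.backward()
                opt.step()

In a setup like this, the predictor would be retrained periodically on all candidates evaluated so far, and new offspring with poor predicted perplexity would be discarded before the expensive full training step, concentrating GPU time on promising structures.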