1. Title of the paper

From Nodes to Networks: Evolving Recurrent Neural Networks

2. Authors

Aditya Rawal and Risto Miikkulainen
Sentient Technologies, Inc., One California St., Suite 2300, San Francisco, CA 94111, +1 415-422-9886, firstname.lastname@sentient.ai.
Also: Department of Computer Science, The University of Texas at Austin, Austin TX 78712, +1 512-471-9571, aditya@cs.utexas.edu, risto@cs.utexas.edu.

3. Corresponding author

Risto Miikkulainen; risto@cs.utexas.edu

4. Abstract

Gated recurrent networks such as those composed of Long Short-Term Memory (LSTM) nodes have recently been used to improve the state of the art in many sequential processing tasks such as speech recognition and machine translation. However, the basic structure of the LSTM node is essentially the same as when it was first conceived 25 years ago. Recently, evolutionary and reinforcement learning mechanisms have been employed to create new variations of this structure. This paper proposes a new method, evolution of a tree-based encoding of the gated memory nodes, and shows that it makes it possible to explore new variations more effectively than other methods. The method discovers nodes with multiple recurrent paths and multiple memory cells, which lead to significant improvement in the standard language modeling benchmark task. The paper also shows how the search process can be sped up by training an LSTM network to estimate the performance of candidate structures, and by encouraging exploration of novel solutions. Thus, evolutionary design of complex neural network structures promises to improve the performance of deep learning architectures beyond human ability to do so.

5. Criteria satisfied

B, D, F, G

6. Justification of criteria satisfied

The entry shows that evolution can design a complex learning architecture, namely a gated recurrent neural network node (LSTM, or Long Short-Term Memory node; Hochreiter and Schmidhuber 1997), better than humans can. LSTMs were proposed in 1997 for problems that require memory, utilizing a memory cell and trainable gates that control how it is used. For 20 years, their structure remained essentially unchanged; the original authors even published a paper in 2017 concluding that the variants proposed up to that point were not significantly better than the standard design (Greff et al. 2017). This entry shows that evolution can discover LSTM architectures that are much more complex than the variations created by humans over the years: in addition to the linear path that retains the memory value, they contain multiple nonlinear paths and multiple memory cells, and they utilize different activation functions. This complexity results in significantly improved performance on standard machine learning benchmark tasks. One of the most challenging such tasks is language modeling, i.e. predicting the next word in the Penn Treebank text corpus (Marcus et al. 1993). On this benchmark, evolution improved perplexity by 7.1 points, i.e. 9%, over human-designed LSTMs. Remarkably, the improvements were not simply due to increased complexity. When the same evolved node design was applied to a second benchmark, music modeling (i.e. predicting the next note; Boulanger-Lewandowski et al. 2012; Ycart and Benetos 2017), it did not perform better than the standard LSTM. But when evolution was again used to optimize the design for this benchmark specifically, it improved performance by 12%. In other words, evolution can identify the requirements of each task and optimize the design to take advantage of them in a way no human can.
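To make the tree-based encoding concrete, the following minimal Python sketch shows how a gated memory node might be represented and mutated. It is reconstructed from the description above, not taken from the authors' code: the class and function names, the operator set, and the mutation scheme are all illustrative assumptions.

    # Hypothetical sketch of a tree-encoded recurrent memory node.
    # Leaves read the node's input x, its previous output h, or one of
    # several memory cells c[i]; internal nodes apply an activation or
    # combine two subtrees (e.g., a sigmoid gate multiplying a path).
    import copy
    import random
    import numpy as np

    ACTIVATIONS = {
        "tanh": np.tanh,
        "sigmoid": lambda v: 1.0 / (1.0 + np.exp(-v)),
        "relu": lambda v: np.maximum(0.0, v),
    }
    BINARY_OPS = {"add": np.add, "mul": np.multiply}

    class NodeTree:
        def __init__(self, op, children=(), leaf=None):
            self.op = op              # "leaf", an activation, or a binary op
            self.children = list(children)
            self.leaf = leaf          # ("x",), ("h",), or ("c", i)

        def evaluate(self, x, h, c):
            if self.op == "leaf":
                if self.leaf[0] == "x":
                    return x
                if self.leaf[0] == "h":
                    return h
                return c[self.leaf[1]]        # one of the memory cells
            if self.op in ACTIVATIONS:
                return ACTIVATIONS[self.op](self.children[0].evaluate(x, h, c))
            return BINARY_OPS[self.op](self.children[0].evaluate(x, h, c),
                                       self.children[1].evaluate(x, h, c))

    def random_subtree(depth, n_cells):
        """Grow a random subtree, for initialization and mutation."""
        if depth == 0 or random.random() < 0.3:
            kind = random.choice(["x", "h", "c"])
            leaf = (kind, random.randrange(n_cells)) if kind == "c" else (kind,)
            return NodeTree("leaf", leaf=leaf)
        if random.random() < 0.5:
            return NodeTree(random.choice(list(ACTIVATIONS)),
                            [random_subtree(depth - 1, n_cells)])
        return NodeTree(random.choice(list(BINARY_OPS)),
                        [random_subtree(depth - 1, n_cells),
                         random_subtree(depth - 1, n_cells)])

    def mutate(tree, n_cells, depth=3):
        """Standard GP point mutation: replace a random subtree."""
        child = copy.deepcopy(tree)
        target = child
        while target.op != "leaf" and random.random() < 0.7:
            target = random.choice(target.children)
        new = random_subtree(depth, n_cells)
        target.op, target.children, target.leaf = new.op, new.children, new.leaf
        return child

A search over such trees can express the standard LSTM (a single cell updated along a linear "add" path, gated by sigmoids) as one point in the space, while also reaching designs with multiple cells and multiple nonlinear recurrent paths like those the paper reports.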
References:

N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent (2012). Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In Proceedings of the 29th International Conference on Machine Learning (ICML 2012).

K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, and J. Schmidhuber (2017). LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems 28:2222-2232.

S. Hochreiter and J. Schmidhuber (1997). Long short-term memory. Neural Computation 9:1735-1780.

M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19:313-330.

A. Ycart and E. Benetos (2017). A study on LSTM networks for polyphonic music sequence modelling. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Suzhou, China, 23-28.

7. Citation

A. Rawal and R. Miikkulainen (2018). From Nodes to Networks: Evolving Recurrent Neural Networks. arXiv:1803.04439.

8. Prize money, if any, will be divided equally among the co-authors.

9. Why this entry is the best

This entry highlights a new genre of human-competitive results in which the object of design is itself a learning system. As Holger Hoos pointed out in his GECCO'16 keynote, many systems have become too complex for humans to optimize; automated methods such as evolution are necessary to obtain the full benefit from them. With recent advances in deep learning (DL), machine learning has also reached this limit. DL systems now have hundreds of layers, repetitive structures consisting of dozens of components, complex connectivities, and many component types, all to be configured with hundreds of hyperparameters. They have reached the limit of human design and optimization. As this entry shows, such systems can be configured successfully by evolutionary algorithms, beyond human ability to do so.

This approach is computationally extremely demanding. Training each network takes days on a state-of-the-art GPU, and over the course of evolutionary optimization, thousands (possibly millions) of such networks need to be evaluated; a sketch of one technique for reducing this cost, the performance predictor mentioned in the abstract, appears at the end of this entry. Such computational power is only now becoming available, and it already makes human-competitive results like the one in this entry possible. Interestingly, as computational power increases further, very few approaches can take advantage of it, but evolution of DL systems can. The entry thus demonstrates that "AI designing AI" is the future of AI, and evolutionary algorithms are essential to it.

10. Type of EC used

Genetic Programming

11. Date of Publication

March 12, 2018.
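To illustrate the speed-up technique mentioned in the abstract, training an LSTM network to estimate the performance of candidate structures, here is a minimal sketch in PyTorch. It is a reconstruction under stated assumptions, not the authors' implementation: the serialization of candidate trees into token sequences, the layer sizes, and the names PerformancePredictor and fit are all illustrative.

    # Hypothetical surrogate model: an LSTM reads a serialized candidate
    # architecture (e.g., a pre-order traversal of its tree mapped to
    # integer tokens) and predicts the validation perplexity it would
    # reach after full training.
    import torch
    import torch.nn as nn

    class PerformancePredictor(nn.Module):
        def __init__(self, vocab_size, embed_dim=32, hidden_dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, 1)

        def forward(self, token_ids):
            # token_ids: (batch, seq_len) integer encoding of each tree
            emb = self.embed(token_ids)
            _, (h_n, _) = self.lstm(emb)
            return self.head(h_n[-1]).squeeze(-1)  # predicted perplexity

    def fit(model, loader, epochs=10, lr=1e-3):
        """Regress predicted perplexity onto measured perplexity, using
        (serialized tree, measured perplexity) pairs from candidates
        that were actually trained earlier in the run."""
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            for tokens, perplexity in loader:
                opt.zero_grad()
                loss = loss_fn(model(tokens), perplexity)
                loss.backward()
                opt.step()

In a setup like this, the predictor would be retrained periodically on all candidates evaluated so far, and new offspring with poor predicted perplexity would be discarded before the expensive full training step, concentrating GPU time on promising structures.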