Entry for the 2010 "Humies" Awards for Human-Competitive Results

--------------------------------------------------------------------------
(1) PAPER TITLE
--------------------------------------------------------------------------

Distilling Free-Form Natural Laws from Experimental Data
and
Symbolic Regression of Implicit Equations

--------------------------------------------------------------------------
(2) AUTHORS
--------------------------------------------------------------------------

Michael Schmidt
100 Fairview Sq. 6D
Ithaca, NY 14850
(608) 385-6363
mds47@cornell.edu

Hod Lipson
216 Upson Hall, Cornell University
Ithaca, NY 14853-7501, USA
(607) 255-1686
hod.lipson@cornell.edu

--------------------------------------------------------------------------
(3) CORRESPONDING AUTHOR
--------------------------------------------------------------------------

Michael Schmidt

--------------------------------------------------------------------------
(4) PAPER ABSTRACT
--------------------------------------------------------------------------

For centuries, scientists have attempted to identify and document analytical laws that underlie physical phenomena in nature. Despite the prevalence of computing power, the process of finding natural laws and their corresponding equations has resisted automation. A key challenge to finding analytic relations automatically is defining algorithmically what makes a correlation in observed data important and insightful. We propose a principle for the identification of nontriviality. We demonstrated this approach by automatically searching motion-tracking data captured from various physical systems, ranging from simple harmonic oscillators to chaotic double-pendula. Without any prior knowledge about physics, kinematics, or geometry, the algorithm discovered Hamiltonians, Lagrangians, and other laws of geometric and momentum conservation.
The discovery rate accelerated as laws found for simpler systems were used to bootstrap explanations for more complex systems, gradually uncovering the "alphabet" used to describe those systems.

--------------------------------------------------------------------------
(5) HUMAN COMPETITIVE CRITERIA
--------------------------------------------------------------------------

(G) The result solves a problem of indisputable difficulty in its field.

--------------------------------------------------------------------------
(6) WHY THE CRITERION IS SATISFIED
--------------------------------------------------------------------------

This research demonstrates, for the first time, the automatic discovery of the fundamental laws of nature and physics, directly from imperfect, experimentally captured data - a feat reserved almost exclusively for human scientists throughout history. The key insight that led to this result is a new principle for identifying intrinsic, non-trivial invariant relationships in noisy data, which makes it possible to use genetic programming to search for the invariant equations. Identifying invariant relationships is known to be a major challenge even for human scientists; a large number of published invariant quantities have turned out to be coincidental (Science 309, p. 1236, "The Illusion of Invariant Quantities…"). Identifying a meaningful invariance turns out to be difficult computationally as well, because so many trivial (and coincidental) invariant equations exist. Genetic programming has allowed us to sift through large datasets efficiently to find only the exact and symbolically meaningful invariant relations. The end result is a system that can tease out scientific laws, in analytical form, merely by observing experimental systems and natural phenomena.
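The difficulty with trivial invariants can be illustrated with a small sketch. It is not the implementation from the cited papers. It assumes simulated, noise-free data from a unit harmonic oscillator rather than motion-tracking data, and it assumes one plausible nontriviality test: comparing the ratio dv/dx estimated from the data with the ratio implied by a candidate equation f(x, v) = const. All function names are hypothetical.

```python
import math

def simulate_oscillator(n=200, dt=0.05):
    """Noise-free unit harmonic oscillator: x(t) = cos t, v(t) = -sin t."""
    ts = [i * dt for i in range(n)]
    return [math.cos(t) for t in ts], [-math.sin(t) for t in ts], dt

def central_diff(ys, dt):
    """Central-difference time derivative at the interior samples."""
    return [(ys[i + 1] - ys[i - 1]) / (2 * dt) for i in range(1, len(ys) - 1)]

def gradient(f, x, v, h=1e-5):
    """Numerical partial derivatives (df/dx, df/dv) of a candidate f(x, v)."""
    fx = (f(x + h, v) - f(x - h, v)) / (2 * h)
    fv = (f(x, v + h) - f(x, v - h)) / (2 * h)
    return fx, fv

def variance_score(f, xs, vs):
    """Naive criterion: variance of f over the data (0 looks 'invariant')."""
    vals = [f(x, v) for x, v in zip(xs, vs)]
    mean = sum(vals) / len(vals)
    return sum((a - mean) ** 2 for a in vals) / len(vals)

def derivative_ratio_error(f, xs, vs, dt, tol=1e-8, eps=1e-3):
    """Assumed criterion: compare dv/dx seen in the data with the value
    -(df/dx)/(df/dv) implied by f(x, v) = const. Returns inf when the
    candidate makes no prediction at any usable sample."""
    dxs, dvs = central_diff(xs, dt), central_diff(vs, dt)
    errors = []
    for i, (dx, dv) in enumerate(zip(dxs, dvs)):
        x, v = xs[i + 1], vs[i + 1]  # interior sample matching dx, dv
        fx, fv = gradient(f, x, v)
        if abs(dx) < eps or abs(fv) < tol:
            continue  # ratio undefined in the data or in the candidate
        errors.append(math.log(1 + abs(dv / dx - (-fx / fv))))
    return sum(errors) / len(errors) if errors else math.inf

candidates = {
    "x^2 + v^2": lambda x, v: x ** 2 + v ** 2,            # energy-like law
    "x^2 - v^2": lambda x, v: x ** 2 - v ** 2,            # not conserved
    "sin^2(x) + cos^2(x)": lambda x, v: math.sin(x) ** 2 + math.cos(x) ** 2,
}

xs, vs, dt = simulate_oscillator()
scores = {
    name: (variance_score(f, xs, vs), derivative_ratio_error(f, xs, vs, dt))
    for name, f in candidates.items()
}
for name, (var, err) in scores.items():
    print(f"{name}: variance={var:.3g}, ratio_error={err:.3g}")
```

In this sketch the variance test alone cannot separate the energy-like expression from the trigonometric identity, since both are constant over the data. The derivative-ratio test rejects the identity because it says nothing about how x and v change together.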
Without any prior knowledge about physics, geometry, or kinematics, the system discovered Hamiltonians, Lagrangians, and energy and momentum conservation laws by observing systems such as a double pendulum and a spring-mass oscillator. For these reasons, we feel this research solves a problem of indisputable difficulty (G) in many scientific fields - identifying invariant relationships that underlie intrinsic laws of physical phenomena.

--------------------------------------------------------------------------
(7) FULL PAPER CITATION
--------------------------------------------------------------------------

Schmidt M., Lipson H. (2010), "Symbolic Regression of Implicit Equations," Genetic Programming Theory and Practice, Vol. 7, Chapter 5, pp. 73-85.

Schmidt M., Lipson H. (2009), "Distilling Free-Form Natural Laws from Experimental Data," Science, Vol. 324, no. 5923, pp. 81-85.

--------------------------------------------------------------------------
(8) STATEMENT ON PRIZE MONEY
--------------------------------------------------------------------------

Prize money, if any, is to be divided equally among the co-authors.

--------------------------------------------------------------------------
(9) STATEMENT ON WHY THIS ENTRY IS THE BEST
--------------------------------------------------------------------------

A unique aspect of this entry is that the ability to search for invariants automatically has broad appeal across a range of fields, from physics to engineering to biology. Mathematical symmetries and invariants are known to underlie nearly all laws of nature. Finding these invariants by hand, however, is becoming exceedingly arduous for human scientists. The ability to search for such relations automatically is human competitive across many domains and could help bring deeper understanding to increasingly complex phenomena where the rules governing their behavior are currently unknown or incomplete.