------------------------------
CRITERIA SATISFIED BY THE WORK

(E) The result is equal to or better than the most recent human-created solution
to a long-standing problem for which there has been a succession of increasingly
better human-created solutions.

(F) The result is equal to or better than a result that was considered
an achievement in its field at the time it was first discovered.

(H) The result holds its own or wins a regulated competition involving human
contestants (in the form of either live human players or human-written computer
programs).


------------------------------------------
STATEMENT OF HUMAN-COMPETITIVENESS (E, F & H)

The attempts to understand the mechanism of protein folding were pioneered by
Christian Anfinsen almost forty years ago. Since the very beginning the main
limitation of this research was the inaccuracy of the molecular dynamics simulations. 
Although the detail of protein structure models has increased along with the
growth in computer performance, molecular simulation at the atomic level is even
now possible only with the use of large distributed systems such as Folding@Home,
the most powerful distributed computing system on Earth, which operates at
performance levels reaching 5 petaFLOPS. Once the community realised that
simulation was not yet practical, the scientific effort shifted towards prediction,
where simplified models and expert-designed statistical potentials are used.
Recent progress in the field of protein structure prediction was achieved thanks
to the use of machine learning techniques to solve the prediction sub-problems,
e.g. solvent accessibility or contact number prediction, providing better building
blocks for the statistical potentials. The formulation of the energy function,
however, has remained unchanged: it is still a linear combination of potentials,
as in folding simulations, even though the statistical potentials do not represent
physical energy.

In this work we take selected statistical potentials used by I-TASSER, the best
predictor of the last three editions of the CASP experiment, and we challenge the
human-made energy function used there. By using genetic programming we allow
a free combination of the energy terms to be evolved and compare its quality
against the I-TASSER approach, which is a weighted sum of terms where the weights
are chosen by a human expert using a non-linear numerical optimisation method as a
decision support tool. The quality of the evolved energy functions is found to be
better, and we therefore believe the work satisfies criteria E, F and H.
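
The difference between the two formulations can be illustrated with a minimal
sketch. It is not the code used in the publications: the term names (t1, t2, t3),
their values, the weights and the function set below are all hypothetical and
chosen only for illustration. A weighted sum is one fixed tree shape, whereas
genetic programming is free to evolve any tree over the same terms.

```python
import math
import operator

# Hypothetical energy-term values for one candidate structure (made-up numbers).
terms = {"t1": 2.0, "t2": -1.0, "t3": 0.5}

def weighted_sum(terms, weights):
    """Linear energy function: sum of weight_i * term_i."""
    return sum(weights[name] * value for name, value in terms.items())

# Hypothetical function set; the set used in the actual experiments may differ.
FUNCTIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "sin": math.sin,
}

def evaluate(tree, terms):
    """Evaluate an expression tree.

    A tree is a term name (str), a numeric constant,
    or a tuple (function_name, child, ...).
    """
    if isinstance(tree, str):
        return terms[tree]
    if isinstance(tree, (int, float)):
        return float(tree)
    op, *children = tree
    return FUNCTIONS[op](*(evaluate(child, terms) for child in children))

# The weighted sum, with assumed weights.
weights = {"t1": 1.0, "t2": 2.0, "t3": 4.0}
linear_energy = weighted_sum(terms, weights)

# The same weighted sum written as a tree: one point in the GP search space.
linear_tree = ("add", ("mul", 1.0, "t1"),
               ("add", ("mul", 2.0, "t2"), ("mul", 4.0, "t3")))

# A freely combined tree that no choice of weights can reproduce.
free_tree = ("sub", ("mul", "t1", "t3"), ("sin", "t2"))
free_energy = evaluate(free_tree, terms)
```

Because the linear form is itself expressible as a tree, the evolved functions
search a strict superset of the weighted-sum formulation.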


----------------------------------
WHY THIS WORK IS WORTH CONSIDERING

The work discussed here deals with the important issue of energy function design.
It proposes a novel use of an automated method to discover the best combination of
the energy terms, instead of the simple weighted sum with hand-picked coefficients
used in the state-of-the-art predictors. The results indicate that the new approach
is more appropriate and leads to higher quality energy functions. The formulation
of the energy function is a key element of a protein structure predictor because it
drives the search for native-like structures, so a better function also means
a higher quality of prediction. Structural models of good quality are
very important in protein research because, since the advance of DNA sequencing
techniques, the gap between the number of known protein sequences and the number
of known structures has been growing; currently only 0.2% of sequences have
a solved structure. We therefore think this work should be considered the best, not
only because it presents an interesting human-competitive improvement to the
solution of a long-standing problem, but also because of the importance and the
long-term effects in protein science that an improvement in prediction quality
could bring.

The results are extremely encouraging considering that, after many years of
research in the protein structure prediction field involving a large community of
experimenters gathered around the CASP experiments (held regularly since 1994),
this is the first time an automated approach has been proposed, and its results
are competitive with the approach used in the state-of-the-art I-TASSER predictor.
Despite all the gradual improvements made in predictors over recent years and
despite the vast amount of research dedicated to the optimisation of structures,
changes in the design of the energy function have so far been limited to the
introduction of new potentials. However, it is the energy function that defines
the search landscape in which the best structure is to be found, and its smoothness
is essential for efficient prediction. Without such functions the only resort is a
random walk over a rugged landscape, which requires vast resources, as in
Folding@Home.

The problem of designing energy functions for protein structure
prediction is also a new and truly difficult challenge for GP. With that in
mind we have made the input data used in our experiments available online
(with detailed annotations) for everyone who would like to take on this challenge,
and we would like to encourage the community to engage in solving this interesting
problem: http://www.infobiotics.org/gpchallange/


------------
PUBLICATIONS 

P. Widera, J.M. Garibaldi, N. Krasnogor,
"GP challenge: evolving the energy function for protein structure prediction",
Genetic Programming and Evolvable Machines 11(1), p.61-88, 2010
DOI: 10.1007/s10710-009-9087-0
publisher's link: http://dx.doi.org/10.1007/s10710-009-9087-0

P. Widera,
"Automated design of energy functions for protein structure prediction by means
of genetic programming and improved structure similarity assessment",
PhD Thesis, University of Nottingham, UK, 2010

P. Widera, J.M. Garibaldi, N. Krasnogor,
"Evolutionary design of the energy function for protein structure prediction",
IEEE Congress on Evolutionary Computation, p.1305-1312, Trondheim, Norway, May 2009
DOI: 10.1109/CEC.2009.4983095
publisher's link: http://dx.doi.org/10.1109/CEC.2009.4983095


---------
ABSTRACTS

1) "GP challenge: evolving the energy function for protein structure prediction"

One of the key elements in protein structure prediction is the ability
to distinguish between good and bad candidate structures. This distinction is
made by estimation of the structure energy. The energy function used in
the best state-of-the-art automatic predictors competing in the most recent CASP
(Critical Assessment of Techniques for Protein Structure Prediction) experiment
is defined as a weighted sum of a set of energy terms designed by experts.
We hypothesised that combining these terms more freely will improve the prediction
quality. To test this hypothesis, we designed a genetic programming algorithm
to evolve the protein energy function. We compared the predictive power of the best
evolved function and a linear combination of energy terms featuring weights
optimised by the Nelder-Mead algorithm. The GP based optimisation outperformed
the optimised linear function.
We have made the data used in our experiments publicly available in order to
encourage others to further investigate this challenging problem by using GP and
other methods, and to attempt to improve on the results presented here.

2) "Automated design of energy functions for protein structure prediction by means
of genetic programming and improved structure similarity assessment"

The process of protein structure prediction is a crucial part of understanding
the function of the building blocks of life. It is based on the approximation
of a protein free energy that is used to guide the search through the
space of protein structures towards the thermodynamic equilibrium of the native
state. A function that gives a good approximation of the protein free energy
should be able to estimate the structural distance of the evaluated candidate
structure to the protein native state. This correlation between the energy and
the similarity to the native is the key to high quality predictions.

State-of-the-art protein structure prediction methods use very simple techniques
to design such energy functions. The individual components of the energy
functions are created by human experts with the use of statistical analysis
of common structural patterns that occur in the known native structures.
The energy function itself is then defined as a simple weighted sum of these
components. Exact values of the weights are set in the process of maximisation
of the correlation between the energy and the similarity to the native
measured by a root mean square deviation between coordinates of the protein
backbone.

In this dissertation I argue that this process is oversimplified and could
be improved on at least two levels. Firstly, a more complex functional
combination of the energy components might be able to reflect the similarity
more accurately and thus improve the prediction quality. Secondly, a more
robust similarity measure that combines different notions of the protein
structural similarity might provide a much more realistic baseline for
the energy function optimisation.

To test these two hypotheses I have proposed a novel approach to the design of
energy functions for protein structure prediction using a genetic programming
algorithm to evolve the energy functions and a structural similarity consensus to
provide a reference similarity measure. The best evolved energy functions were
found to reflect the similarity to the native better than the optimised weighted
sum of terms, thereby opening a new and interesting area of research for
machine learning techniques.

3) "Evolutionary design of the energy function for protein structure prediction"

Automatic protein structure predictors use the notion of energy to guide
the search towards good candidate structures. The energy functions used by the
state-of-the-art predictors are defined as a linear combination of several energy
terms designed by human experts. We hypothesised that the energy based guidance
could be more accurate if the terms were combined more freely. To test this
hypothesis, we designed a genetic programming algorithm to evolve the protein
energy function. Using several different fitness functions we examined the
potential of the evolutionary approach on a set of candidate structures generated
during the protein structure prediction process. Although our algorithms were able
to improve over the random walk, the fitness of the best individuals was far
from the optimum. We discuss the shortcomings of our initial algorithm design
and the possible directions for further research.


----------------------------
AUTHORS' CONTACT INFORMATION

Natalio Krasnogor (corresponding author),  
School of Computer Science
University of Nottingham
Jubilee Campus, Wollaton Road
Nottingham, NG8 1BB, UK
e-mail: nxk@cs.nott.ac.uk
phone: +44 115 8467592

Paweł Widera,
School of Computer Science
University of Nottingham
Jubilee Campus, Wollaton Road
Nottingham, NG8 1BB, UK
e-mail: plw@cs.nott.ac.uk
phone: +44 115 9514234

Jonathan Garibaldi,
School of Computer Science
University of Nottingham
Jubilee Campus, Wollaton Road
Nottingham, NG8 1BB, UK
e-mail: jmg@cs.nott.ac.uk
phone: +44 115 9514216

The prize money, if any, is to be divided equally among the co-authors.