1. the complete title of one (or more) paper(s) published in the open literature describing the work that the author claims describes a human-competitive result;

GADMA: Genetic algorithm for inferring demographic history of multiple populations from allele frequency spectrum data

-------------------------------------------------------------------------------
2. the name, complete physical mailing address, e-mail address, and phone number of EACH author of EACH paper(s);

Name: Ekaterina Noskova
Physical address: ITMO University, 49 Kronverkskiy Pr., St. Petersburg 197101, Russian Federation
E-mail address: ekaterina.e.noskova@gmail.com
Phone number: +7-921-650-81-03

Name: Vladimir Ulyantsev
Physical address: ITMO University, 49 Kronverkskiy Pr., St. Petersburg 197101, Russian Federation
E-mail address: vl.ulyantsev@gmail.com
Phone number: +7-904-646-64-02

Name: Klaus-Peter Koepfli
Physical address: Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, 3001 Connecticut Ave., NW Washington, D.C. 20008, USA
E-mail address: KoepfliK@si.edu
Phone number: +1-310-903-0197

Name: Stephen J O’Brien
Physical address: Guy Harvey Oceanographic Center, Nova Southeastern University Ft. Lauderdale, 8000 North Ocean Drive, Ft. Lauderdale, Florida 33004, USA
E-mail address: lgdchief@gmail.com
Phone number: +1-301-401-6313

Name: Pavel Dobrynin
Physical address: ITMO University, 49 Kronverkskiy Pr., St. Petersburg 197101, Russian Federation
E-mail address: pdobrynin@gmail.com
Phone number: +7-967-522-97-18

-------------------------------------------------------------------------------
3. the name of the corresponding author (i.e., the author to whom notices will be sent concerning the competition);

Ekaterina Noskova

-------------------------------------------------------------------------------
4.
the abstract of the paper(s);

Background
The demographic history of any population is imprinted in the genomes of the individuals that make up the population. One of the most popular and convenient representations of genetic information is the allele frequency spectrum (AFS), the distribution of allele frequencies in populations. The joint AFS is commonly used to reconstruct the demographic history of multiple populations, and several methods based on diffusion approximation (e.g., ∂a∂i) and ordinary differential equations (e.g., moments) have been developed and applied for demographic inference. These methods provide an opportunity to simulate AFS under a variety of researcher-specified demographic models and to estimate the best model and associated parameters using likelihood-based local optimizations. However, there are no known algorithms to perform global searches of demographic models with a given AFS.

Results
Here, we introduce a new method that implements a global search using a genetic algorithm for the automatic and unsupervised inference of demographic history from joint AFS data. Our method is implemented in the software GADMA (Genetic Algorithm for Demographic Model Analysis, https://github.com/ctlab/GADMA).

Conclusions
We demonstrate the performance of GADMA by applying it to sequence data from humans and non-model organisms and show that it is able to automatically infer a demographic model close to or even better than the one that was previously obtained manually. Moreover, GADMA is able to infer multiple demographic models at different local optima close to the global one, providing a larger set of possible scenarios to further explore demographic history.

-------------------------------------------------------------------------------
5.
a list containing one or more of the eight letters (A, B, C, D, E, F, G, or H) that correspond to the criteria (see above) that the author claims that the work satisfies;

(B) The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.
(D) The result is publishable in its own right as a new scientific result independent of the fact that the result was mechanically created.
(E) The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.
(G) The result solves a problem of indisputable difficulty in its field.

-------------------------------------------------------------------------------
6. a statement stating why the result satisfies the criteria that the contestant claims (see examples of statements of human-competitiveness as a guide to aid in constructing this part of the submission);

Methods for demographic inference provide an interface for constructing demographic models with parameters such as population sizes, divergence times, and migration rates, together with optimization routines that find parameter values that “fit” the observed genetic data. This inference is limited to the models that the researcher considers best for the observed populations. In our paper we introduced a new method based on a genetic algorithm for automatic inference of demographic history without any prior knowledge about the data. The genetic algorithm provides a better search of parameter values because it is a global search strategy; moreover, it can infer parameters that were previously held fixed in the analysis, namely the dynamics of population size change. We implemented the method in the software GADMA.

(B) (The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.)
The proposed method based on the genetic algorithm was compared with several existing algorithms for demographic inference on different datasets. We tested our algorithm on simulated data and on real data from three different papers: (i) three populations of modern humans [1]; (ii) two populations of Gillette’s checkerspot butterfly [2]; and (iii) several pairs of Gaboon forest frog populations [3]. On simulated data, our method outperformed the previously existing algorithms. On real data, GADMA produced models with higher likelihood than the models previously published in the original papers.

(D) (The result is publishable in its own right as a new scientific result independent of the fact that the result was mechanically created.)
GADMA is the first software that can automatically infer demographic histories. The results of our research provided new insights into the evolutionary history of species. We confirmed and added detail to the existing demographic history of modern humans [1] and of several other species, including those studied in two published papers [2, 3]. The work and results were published in the journal GigaScience, which focuses on new bioinformatics and computational biology software tools and workflows from 'big data'. The results were also presented at several specialized conferences on population genetics and computational biology. This indicates that our work has been peer reviewed and accepted as a new scientific result independent of our use of automation techniques.

(E) (The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.)
Our work was inspired by the original papers that introduced and further developed the demographic inference method based on the site frequency spectrum [1, 3].
We used the results published in several peer-reviewed papers [1, 2, 3], including the original ones, as the baseline for testing. Our method outperforms all existing solutions for demographic inference that are presented in the original papers. Moreover, GADMA can perform automatic inference and find more reliable results without any prior knowledge about the data. We have automatically produced new, detailed demographic models with higher likelihood for several datasets.

(G) (The result solves a problem of indisputable difficulty in its field.)
Researchers seek to find the model of demographic history that best describes or “fits” their data. Existing software provides an opportunity to run multiple optimizations to fit the parameters of a given demographic model so as to maximize the likelihood. Researchers must construct the demographic models for inference from their own experience and expectations, and, in addition, all existing optimizations are local search algorithms. They find local optima close to the initial values and require many runs from different initial model parameters, most of which are unknown or lack empirical data. Our method changes the process of demographic inference by removing human bias from it.

References:
[1] Gutenkunst, R.N., Hernandez, R.D., Williamson, S.H. and Bustamante, C.D., 2009. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS genetics, 5(10).
[2] McCoy, R.C., Garud, N.R., Kelley, J.L., Boggs, C.L. and Petrov, D.A., 2014. Genomic inference accurately predicts the timing and severity of a recent bottleneck in a nonmodel insect population. Molecular ecology, 23(1), pp.136-150.
[3] Portik, D.M., Leaché, A.D., Rivera, D., Barej, M.F., Burger, M., Hirschfeld, M., Rödel, M.O., Blackburn, D.C. and Fujita, M.K., 2017.
Evaluating mechanisms of diversification in a Guineo‐Congolian tropical forest frog using demographic model selection. Molecular Ecology, 26(19), pp.5245-5263.

-------------------------------------------------------------------------------
7. a full citation of the paper (that is, author names; publication date; name of journal, conference, technical report, thesis, book, or book chapter; name of editors, if applicable, of the journal or edited book; publisher name; publisher city; page numbers, if applicable);

Ekaterina Noskova, Vladimir Ulyantsev, Klaus-Peter Koepfli, Stephen J O’Brien, Pavel Dobrynin, GADMA: Genetic algorithm for inferring demographic history of multiple populations from allele frequency spectrum data, GigaScience, Volume 9, Issue 3, March 2020, giaa005, https://doi.org/10.1093/gigascience/giaa005

-------------------------------------------------------------------------------
8. a statement either that "any prize money, if any, is to be divided equally among the co-authors" OR a specific percentage breakdown as to how the prize money, if any, is to be divided among the co-authors;

Any prize money, if any, is to be awarded to Ekaterina Noskova.

-------------------------------------------------------------------------------
9. a statement stating why the authors expect that their entry would be the "best," and

Our method has already been applied in the Genome Russia Project [1, 2] and in several conservation biology studies of endangered species such as the African cheetah, the dama gazelle, and the red siskin. Our work was presented at three conferences (VoGIS, MCCMB, ProbGen) in 2019 and attracted considerable attention from researchers familiar with demographic inference methods. This interest can be explained by the fact that GADMA is the first software of its kind that can automatically infer demographic histories and is not limited by the researcher’s assumptions.
It solves a long-standing problem in the field of computational biology and population genetics. The main author of the paper, Ekaterina Noskova, was invited to give oral presentations about the proposed method at the Computer Science Center [3] and the Chebyshev Laboratory [4], and she is also the author of a popular science article about the problem and its solution [5].

References:
[1] http://genomerussia.spbu.ru/
[2] Zhernakova, D.V., Brukhin, V., Malov, S., Oleksyk, T.K., Koepfli, K.P., Zhuk, A., Dobrynin, P., Kliver, S., Cherkasov, N., Tamazian, G. and Rotkevich, M., 2020. Genome-wide sequence analyses of ethnic populations across Russia. Genomics, 112(1), pp.442-458.
[3] https://compscicenter.ru/videos/population-history/
[4] https://sites.google.com/view/industrial-math-seminar/past#h.p_-Gsb3_M9HmQF
[5] https://habr.com/ru/company/JetBrains-education/blog/502244/

-------------------------------------------------------------------------------
10. An indication of the general type of genetic or evolutionary computation used, such as GA (genetic algorithms), GP (genetic programming), ES (evolution strategies), EP (evolutionary programming), LCS (learning classifier systems), GE (grammatical evolution), GEP (gene expression programming), DE (differential evolution), etc.

GA (genetic algorithm)

-------------------------------------------------------------------------------
11. The date of publication of each paper. If the date of publication is not on or before the deadline for submission, but instead, the paper has been unconditionally accepted for publication and is “in press” by the deadline for this competition, the entry must include a copy of the documentation establishing that the paper meets the "in press" requirement

29 February 2020