1. the complete title of one (or more) paper(s) published in the open literature describing the work that the author claims describes a human-competitive result;
A series of papers, including:
[1] Maslyaev, M., Hvatov, A., & Kalyuzhnaya, A. V. (2021). Partial differential equations discovery with EPDE framework: application for real and synthetic data. Journal of Computational Science, 101345.
[2] Maslyaev, M., & Hvatov, A. (2021, June). Multi-objective discovery of PDE systems using evolutionary approach. In 2021 IEEE Congress on Evolutionary Computation (CEC) (pp. 596-603). IEEE.
[3] Merezhnikov, M., & Hvatov, A. (2021, November). Multi-objective closed-form algebraic expressions discovery approach application to the synthetic time-series generation. Procedia Computer Science, 193, 285-294.
[4] Maslyaev, M., & Hvatov, A. (2022, July). Solver-Based Fitness Function for the Data-Driven Evolutionary Discovery of Partial Differential Equation. In 2022 IEEE Congress on Evolutionary Computation (CEC) (in press, unconditionally accepted).
2. the name, complete physical mailing address, e-mail address, and phone number of EACH author of EACH paper(s);
Alexander Hvatov
email: alex_hvatov@itmo.ru
phone: +7 952 220 32 76
ITMO University
49 Kronverksky Pr.
St. Petersburg
197101
Russian Federation
Mikhail Maslyaev
email: mikemaslyaev@itmo.ru
phone: +7 915 145 97 25
ITMO University
49 Kronverksky Pr.
St. Petersburg
197101
Russian Federation
Anna Kalyuzhnaya
email: anna.kalyuzhnaya@itmo.ru
phone: +7 911 038 27 68
ITMO University
49 Kronverksky Pr.
St. Petersburg
197101
Russian Federation
Mark Merezhnikov
email: mark.merezhnikov@mail.ru
ITMO University
49 Kronverksky Pr.
St. Petersburg
197101
Russian Federation
3. the name of the corresponding author (i.e., the author to whom notices will be sent concerning the competition);
Alexander Hvatov (AH)
4. the abstract of the paper(s);
[1] Data-driven methods provide model creation tools for systems where the application of conventional analytical methods is restrained. The proposed method involves the data-driven derivation of a partial differential equation (PDE) for process dynamics, helping process simulation and study. The paper describes the methods that are used within the EPDE (Evolutionary Partial Differential Equations) partial differential equation discovery framework. The framework involves a combination of evolutionary algorithms and sparse regression. Such an approach is versatile compared to other commonly used data-driven partial differential derivation methods by making fewer assumptions about the resulting equation. This paper highlights the algorithm features that allow data processing with noise, which is similar to the algorithm's real-world applications. This paper is an extended version of the ICCS-2020 conference paper.
[2] Usually, the data-driven methods of the systems of partial differential equations (PDEs) discovery are limited to the scenarios, when the result can be manifested as the single vector equation form. However, this approach restricts the application to the real cases, where, for example, the form of the external forcing is of interest for the researcher and can not be described by the component of the vector equation. In the paper, a multi-objective co-evolution algorithm is proposed. The single equations within the system and the system itself are evolved simultaneously to obtain the system. This approach allows discovering the systems with the form-independent equations. In contrast to the single vector equation, a component-wise system is more suitable for expert interpretation and, therefore, for applications. The example of the two-dimensional Navier-Stokes equation is considered.
[3] Time-series modeling is a well-studied topic of classical analysis and machine learning. However, large datasets are required to obtain the model with a better prediction quality with the increasing model complexity. Therefore, some applications demand synthetic datasets that are preserving modeling-sensitive properties. Another application of synthetic data is data anonymization. The synthetic data generation algorithm may be split into two parts: the time-series modeling and the synthetic data generation parts. The model must be interpretable to obtain the synthetic data with good quality. The model parameter interpretation allows controlling generation by adding noise to different groups of parameters. In the paper, the evolutionary multi-objective closed-form algebraic expressions discovery approach that allows obtaining the model in the form that may be analyzed using the mathematics is proposed. The analysis allows the interpretation of the model parameters for the controllable generation of the synthetic data. The notion of synthetic data quality is discussed. The examples of the synthetic time-series generation based on two datasets with different properties are shown.
[4] Partial differential equations provide accurate models for many physical processes, although their derivation can be challenging, requiring a fundamental understanding of the modeled system. This challenge can be circumvented with the data-driven algorithms that obtain the governing equation only using observational data. One of the tools commonly used in search of the differential equation is the evolutionary optimization algorithm. In this paper, we seek to improve the existing evolutionary approach to data-driven partial differential equation discovery by introducing a more reliable method of evaluating the quality of proposed structures, based on the inclusion of the automated algorithm of partial differential equations solving. In terms of evolutionary algorithms, we want to check whether the more computationally challenging fitness function represented by the equation solver gives the sufficient resulting solution quality increase with respect to the more simple one. The approach includes a computationally expensive equation solver compared with the baseline method, which utilized equation discrepancy to define the fitness function for a candidate structure in terms of algorithm convergence and required computational resources on the synthetic data obtained from the solution of the Korteweg-de Vries equation.
5. a list containing one or more of the eight letters (A, B, C, D, E, F, G, or H) that correspond to the criteria (see above) that the author claims that the work satisfies;
(B) The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.
(F) The result is equal to or better than a result that was considered an achievement in its field at the time it was first discovered.
(G) The result solves a problem of indisputable difficulty in its field.
6. a statement stating why the result satisfies the criteria that the contestant claims (see examples of statements of human-competitiveness as a guide to aid in constructing this part of the submission);
(B) Partial differential equation (PDE) discovery is a hot topic now. We hear stories that the equation of motion can be extracted from a video recording, and it is true. But one may still doubt and try to wake from the dream.
We know what could be wrong: to extract the equation of motion from the recording, the algorithm must already know such a law and store it. A computer is a very dumb thing; it can only reproduce what it already knows. Does it count as extracting the law from the video? Technically, yes. We acknowledge the massive amount of preliminary work behind such results and fully respect our colleagues.
However, we would like to think a bit further. What if we teach the computer only how to differentiate, without any additional knowledge of physical laws? We should then add some creativity, and for that we use evolutionary algorithms. One more step remains: we build an algorithm that composes differential equations out of the data and its derivatives.
Creativity without boundaries leads to very complicated equations, so we control the growth by adding a regularization operator. It serves as a restriction, removes unnecessary terms, and makes the equation more general. An equation should be as general as the data allow, since for particular models we already have neural networks and other machine learning models. The ability to perform induction is the crucial point in mathematics.
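To make the idea concrete, here is a minimal sketch, not the EPDE implementation. It assumes a toy heat equation u_t = a*u_xx as the hidden law, a fixed library of four candidate terms in place of the evolutionary search, and sequentially thresholded least squares as a simple stand-in for the regularization operator. The function names (build_library, sparse_fit) and the threshold value are illustrative assumptions.

```python
import numpy as np

def build_library(u, dt, dx):
    """Return candidate terms built only from u and its numerical derivatives."""
    u_t = np.gradient(u, dt, axis=0)
    u_x = np.gradient(u, dx, axis=1)
    u_xx = np.gradient(u_x, dx, axis=1)
    terms = {"u": u, "u_x": u_x, "u_xx": u_xx, "u*u_x": u * u_x}
    # Drop points near the boundary, where finite differences are least accurate.
    inner = (slice(2, -2), slice(2, -2))
    names = list(terms)
    X = np.column_stack([terms[n][inner].ravel() for n in names])
    y = u_t[inner].ravel()
    return names, X, y

def sparse_fit(X, y, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares (assumed stand-in regularizer)."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0          # remove unnecessary terms
        active = ~small
        if not active.any():
            break
        # Refit only the surviving terms.
        coef[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
    return coef

# Synthetic data: an exact two-mode solution of u_t = a * u_xx (toy assumption).
a = 0.5
x = np.linspace(0.0, 2.0 * np.pi, 201)
t = np.linspace(0.0, 1.0, 201)
T, Xg = np.meshgrid(t, x, indexing="ij")
u = np.exp(-a * T) * np.sin(Xg) + 0.5 * np.exp(-4.0 * a * T) * np.sin(2.0 * Xg)

names, X, y = build_library(u, t[1] - t[0], x[1] - x[0])
coef = sparse_fit(X, y)
discovered = {n: float(c) for n, c in zip(names, coef) if c != 0.0}
print(discovered)
```

On this noise-free data only the u_xx term is expected to survive, with a coefficient close to a = 0.5. Noisy measurements would need smoothing before differentiation, which this sketch does not attempt.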
We found that the multi-objective approach more closely resembles an expert’s actions. Namely, we may consider the process at different scales. At the global level, the ball is moved by gravity. In more detail, the air pushes the ball back, and if we dig deep enough, electrons are slightly shifted and magnetic fields also affect the ball’s motion. This illustrates that we can consider both the “precision” of the process reproduction and the complexity of the equation, so we already have two criteria. The expert usually uses more than two, and we have no trouble doing that either.
Multi-objectiveness also makes it possible to build systems of PDEs. We take the complexity of each equation as a criterion and may choose the systems from the hypersurface of non-dominated (Pareto-optimal) solutions.
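The choice among non-dominated solutions can be sketched as follows. This is an illustration with two criteria only (error and complexity); the candidate equations and their scores are made-up values, and the names Candidate, dominates and pareto_front are hypothetical, not taken from our framework.

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    equation: str
    error: float      # reproduction error, lower is better
    complexity: int   # number of terms, lower is better

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse in every criterion and better in one."""
    no_worse = a.error <= b.error and a.complexity <= b.complexity
    better = a.error < b.error or a.complexity < b.complexity
    return no_worse and better

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]

# Made-up candidates for illustration only.
candidates = [
    Candidate("u_t = 0.5*u_xx", 0.010, 1),
    Candidate("u_t = 0.5*u_xx + 0.01*u", 0.009, 2),
    Candidate("u_t = 0.3*u_x", 0.400, 1),
    Candidate("u_t = 0.5*u_xx + 0.01*u + 0.002*u*u_x", 0.009, 3),
]
front = pareto_front(candidates)
for c in front:
    print(c.equation, c.error, c.complexity)
```

The front keeps both the one-term and the two-term equation: neither is better in both criteria, so the final choice between them is left to the expert.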
That is the short story. We also have an automated equation solver that helps us control the equation discovery process. That is our way to make equation discovery not a handcraft but a pure computer masterpiece.
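The difference between a discrepancy-based and a solver-based fitness can be shown on a toy case. The sketch below assumes an ordinary differential equation u' = -k*u instead of a PDE and a hand-written RK4 integrator instead of our automated solver; both function names are hypothetical.

```python
import numpy as np

def discrepancy_fitness(k, t, u):
    """Residual of the candidate u' = -k*u evaluated directly on the data."""
    u_t = np.gradient(u, t)
    return float(np.mean((u_t + k * u) ** 2))

def solver_fitness(k, t, u):
    """Solve the candidate u' = -k*u from u[0] with RK4, then compare with the data."""
    def rhs(v):
        return -k * v

    sol = np.empty_like(u)
    sol[0] = u[0]
    for i in range(len(t) - 1):
        h = t[i + 1] - t[i]
        k1 = rhs(sol[i])
        k2 = rhs(sol[i] + 0.5 * h * k1)
        k3 = rhs(sol[i] + 0.5 * h * k2)
        k4 = rhs(sol[i] + h * k3)
        sol[i + 1] = sol[i] + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return float(np.mean((sol - u) ** 2))

# Synthetic observations generated with the "true" rate k = 1.5 (toy assumption).
t = np.linspace(0.0, 2.0, 201)
u = np.exp(-1.5 * t)

scores = {k: (discrepancy_fitness(k, t, u), solver_fitness(k, t, u))
          for k in (1.0, 1.5, 2.0)}
for k, (d, s) in scores.items():
    print(f"k={k}: discrepancy={d:.3e}, solver={s:.3e}")
```

Both scores are lowest for the true candidate here. The solver-based score costs a full integration per candidate, which is the trade-off studied in [4].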
We note that the same is done for algebraic expressions. We are making a co-evolutionary algorithm that combines them all to obtain more and more exciting models soon.
(F) Physicists around the world solve inverse problems. Just imagine how many tools exist to solve such a well-known problem. We try to make “one to rule them all”, and we think it will be possible in the near future. We have already helped some scientists in thermodynamics [TD], electrodynamics (not yet published) and acoustics [AC] (yes, the only acoustician we have helped is AH, but it is a beginning). We think that we will be able to extract equations from experimental data for various scientific areas.
[TD] Bykov, N., Hvatov, A., Kalyuzhnaya, A., & Boukhanovsky, A. (2021, July). A method of generative model design based on irregular data in application to heat transfer problems. In Journal of Physics: Conference Series (Vol. 1959, No. 1, p. 012012). IOP Publishing.
[AC] Hvatov, A. (2022, May). Data-Driven Approach for the Floquet Propagator Inverse Problem Solution. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 3813-3817). IEEE.
(G) It is hard to discover a new law in the form of a differential equation in modern physics. However, we can try to extract one from data without any preliminary assumptions to see whether the computer's way of thinking (modus ponens) agrees with the human one. We teach the computer to discover physics in parallel to the human way. It should inspire scientists to find new laws and new ways to do physics.
7. a full citation of the paper (that is, author names; title, publication date; name of journal, conference, or book in which article appeared; name of editors, if applicable, of the journal or edited book; publisher name; publisher city; page numbers, if applicable);
[1] Maslyaev, M., Hvatov, A., & Kalyuzhnaya, A. V. (2021). Partial differential equations discovery with EPDE framework: application for real and synthetic data. Journal of Computational Science, 101345.
[2] Maslyaev, M., & Hvatov, A. (2021, June). Multi-objective discovery of PDE systems using evolutionary approach. In 2021 IEEE Congress on Evolutionary Computation (CEC) (pp. 596-603). IEEE.
[3] Merezhnikov, M., & Hvatov, A. (2021). Multi-objective closed-form algebraic expressions discovery approach application to the synthetic time-series generation. Procedia Computer Science, 193, 285-294.
[4] Maslyaev, M., & Hvatov, A. (2022, July). Solver-Based Fitness Function for the Data-Driven Evolutionary Discovery of Partial Differential Equation. In 2022 IEEE Congress on Evolutionary Computation (CEC) (in press, unconditionally accepted).
8. a statement either that "any prize money, if any, is to be divided equally among the co-authors" OR a specific percentage breakdown as to how the prize money, if any, is to be divided among the co-authors;
Any prize money, if any, is to be divided equally among all co-authors: Alexander Hvatov (AH), Mikhail Maslyaev (MM), Mark Merezhnikov (MM) and Anna Kalyuzhnaya (AK).
9. a statement stating why the authors expect that their entry would be the "best"
Imagine a world where the equation of motion of a ball is discovered from a video recording. Oh, wait. We already live in this world. Now imagine a world where the equation is found from an arbitrary recording with a single algorithm, without any preliminary assumptions. We are close to it.
10. An indication of the general type of genetic or evolutionary computation used, such as GA (genetic algorithms), GP (genetic programming), ES (evolution strategies), EP (evolutionary programming), LCS (learning classifier systems), GI (genetic improvement), GE (grammatical evolution), GEP (gene expression programming), DE (differential evolution), etc.
GP
11. The date of publication of each paper. If the date of publication is not on or before the deadline for submission, but instead, the paper has been unconditionally accepted for publication and is "in press" by the deadline for this competition, the entry must include a copy of the documentation establishing that the paper meets the "in press" requirement.
[1] 2021, July
[2] 2021, June
[3] 2021, November
[4] Conference date: 18-23 July 2022