HUMIES 2023 Application

Kindly find the requested information in the list below:
 
1. 	The complete title of one (or more) paper(s) published in the open literature describing the work that the author claims describes a human-competitive result;

Fully Autonomous Programming with Large Language Models
 
2. 	The name, complete physical mailing address, e-mail address, and phone number of EACH author of EACH paper(s);

Vadim Liventsev
Address: PO Box 51, 5600 MB Eindhoven, Netherlands
Email: v.liventsev@tue.nl
Tel.: +31639818995


Anastasiia Grishina
Address: Simula Research Laboratory, Kristian Augusts gate 23, 0164 Oslo, Norway
Email: anastasiia@simula.no
Tel.: +47 92 05 65 91


Aki Härmä
Address: High Tech Campus 34, 5656AE, Eindhoven, The Netherlands    
Email: aki.harma@philips.com
Tel.: +31 645792431


Leon Moonen
Address: Simula Research Laboratory, Kristian Augusts gate 23, 0164 Oslo, Norway
Email: leon.moonen@computer.org
Tel.: +47 926 62 474
 
3. 	The name of the corresponding author (i.e., the author to whom notices will be sent concerning the competition);

Anastasiia Grishina
 
4. 	The abstract of the paper(s);

Current approaches to program synthesis with Large Language Models (LLMs) exhibit a "near miss syndrome": they tend to generate programs that semantically resemble the correct answer (as measured by text similarity metrics or human evaluation), but achieve a low or even zero accuracy as measured by unit tests due to small imperfections, such as the wrong input or output format. This calls for an approach known as Synthesize, Execute, Debug (SED), whereby a draft of the solution is generated first, followed by a program repair phase addressing the failed tests. To effectively apply this approach to instruction-driven LLMs, one needs to determine which prompts perform best as instructions for LLMs, as well as strike a balance between repairing unsuccessful programs and replacing them with newly generated ones. We explore these trade-offs empirically, comparing replace-focused, repair-focused, and hybrid debug strategies, as well as different template-based and model-based prompt-generation techniques. We use OpenAI Codex as the LLM and Program Synthesis Benchmark 2 as a database of problem descriptions and tests for evaluation. The resulting framework outperforms both conventional usage of Codex without the repair phase and traditional genetic programming approaches.
 
5. 	A list containing one or more of the eight letters (A, B, C, D, E, F, G, or H) that correspond to the criteria (see above) that the author claims that the work satisfies:

(B) The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.
(D) The result is publishable in its own right as a new scientific result independent of the fact that the result was mechanically created.
(G) The result solves a problem of indisputable difficulty in its field.
 
6. 	A statement stating why the result satisfies the criteria that the contestant claims;

(B): The developed framework, SEIDR, outperforms the PushGP baseline on PSB2 in Python and performs on par with it in C++ experiments. 

(D): By combining large language models with parent selection (beam search in the experiments), SEIDR allows users to strike a balance between repairing unsuccessful programs and replacing them with newly generated ones. The framework thereby enables replace-focused, repair-focused, and hybrid debug strategies, as well as the use of different template-based and model-based prompt-generation techniques. The work received positive feedback from reviewers, who specifically confirmed its novelty: “the core idea is interesting, novel, and has potential to impact this research field.”

(G): SEIDR solves the last mile problem of genetic programming through iterative updates with execution feedback. The last mile problem is that generated programs tend to be superficially similar to correct programs but fail to compile or to pass tests.
 
7. 	A full citation of the paper (that is, author names; title, publication date; name of journal, conference, or book in which article appeared; name of editors, if applicable, of the journal or edited book; publisher name; publisher city; page numbers, if applicable);

Vadim Liventsev, Anastasiia Grishina, Aki Härmä, and Leon Moonen. 2023.
Fully Autonomous Programming with Large Language Models. In Genetic and Evolutionary Computation Conference (GECCO ’23), July 15–19, 2023, Lisbon, Portugal. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3583131.3590481
 
8. 	A statement either that "any prize money, if any, is to be divided equally among the co-authors" OR a specific percentage breakdown as to how the prize money, if any, is to be divided among the co-authors;

The prize money, if any, is to be divided among V. Liventsev, A. Grishina, A. Härmä, and L. Moonen in the following shares (in %), respectively: 35 / 35 / 0 / 30.
 
9. 	A statement stating why the authors expect that their entry would be the "best":

Our submission is a vivid illustration of how genetic programming can meet the latest developments in generative language models. We hope it will encourage an even stronger alliance between the two fields.
Moreover, SEIDR is a versatile tool. Most importantly, it can be used as an add-on to any existing program synthesis method to further improve the initially synthesized program until it satisfies the requirements specified as I/O examples. Users can employ SEIDR in this manner by replacing the SYNTHESIZE block with their own program synthesis algorithm.
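
The add-on usage described above can be illustrated with a minimal Python sketch. It is not the SEIDR implementation: the function names (pass_rate, synthesize_execute_debug), the parameters (beam_width, max_evaluations), and the callables synthesize, debug, and execute are all hypothetical and chosen for illustration only. The sketch assumes that candidates are ranked by the fraction of I/O examples they pass and that the best ones are kept as parents for the next repair round, a simple form of beam search.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple, TypeVar

P = TypeVar("P")                # a candidate program, in whatever form the synthesizer emits
Test = Tuple[object, object]    # an (input, expected output) pair


def pass_rate(program: P, tests: Sequence[Test],
              execute: Callable[[P, object], object]) -> float:
    """Fraction of I/O examples the program satisfies (hypothetical scoring)."""
    passed = 0
    for given, expected in tests:
        try:
            if execute(program, given) == expected:
                passed += 1
        except Exception:
            pass  # a crashing program simply fails this test
    return passed / len(tests)


def synthesize_execute_debug(
    synthesize: Callable[[], Iterable[P]],                 # pluggable: any synthesis method
    debug: Callable[[P, Sequence[Test]], Iterable[P]],     # proposes repairs of one parent
    execute: Callable[[P, object], object],                # runs a program on one input
    tests: Sequence[Test],
    beam_width: int = 2,
    max_evaluations: int = 50,
) -> Optional[P]:
    """Return the first program that passes every test, or None if the budget runs out."""
    candidates: List[P] = list(synthesize())   # initial drafts
    evaluations = 0
    while candidates and evaluations < max_evaluations:
        scored = []
        for program in candidates:
            if evaluations >= max_evaluations:
                break
            rate = pass_rate(program, tests, execute)
            evaluations += 1
            if rate == 1.0:
                return program
            scored.append((rate, program))
        # Parent selection: keep only the best-scoring candidates.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        parents = [program for _, program in scored[:beam_width]]
        # Repair step: every surviving parent yields new candidates.
        candidates = [child for parent in parents for child in debug(parent, tests)]
    return None
```

In this sketch, swapping in a different synthesis method only requires passing a different synthesize callable; the execute and debug steps stay unchanged.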
 
10.  An indication of the general type of genetic or evolutionary computation used, such as GA (genetic algorithms), GP (genetic programming), ES (evolution strategies), EP (evolutionary programming), LCS (learning classifier systems), GI (genetic improvement), GE (grammatical evolution), GEP (gene expression programming), DE (differential evolution), etc.

GP (genetic programming), GI (genetic improvement)


11.  The date of publication of each paper.  If the date of publication is not on or before the deadline for submission, but instead, the paper has been unconditionally accepted for publication and is “in press” by the deadline for this competition, the entry must include a copy of the documentation establishing that the paper meets the "in press" requirement.

July 15–19, 2023 (GECCO 2023)
Please see the confirmation of acceptance in a separate document.