1. Titles of the Publication

Introducing Phylogenetics in Search-based Software Engineering: Phylogenetics-aware SBSE.
ACM Transactions on Software Engineering and Methodology.

-----------------------------------------------------------------------------------------------------------------------
2. The name, complete physical mailing address, e-mail address, and phone number of EACH author of the paper

Daniel Blasco 1 - dblasco@usj.es
Antonio Iglesias 1 - aiglesias@usj.es
Jorge Echeverría 1 - jecheverria@usj.es
Francisca Pérez 2 - franciscaperez@upv.es
Carlos Cetina 2,3 - cetina@upv.es

1. Universidad San Jorge, Campus Universitario Villanueva de Gállego
Autov. A-23 Zaragoza-Huesca, Km. 299
50830 Villanueva de Gállego – Zaragoza
Tel: (+34) 976 060 100

2. Universitat Politècnica de València,
Camino de Vera, s/n
46022 - Valencia
Tel: (+34) 96 387 70 00

3. University College London,
Gower Street, London, WC1E 6BT
Tel: +44 (0) 20 7679 2000

-----------------------------------------------------------------------------------------------------------------------
3. Corresponding author

Carlos Cetina
cetina@upv.es

-----------------------------------------------------------------------------------------------------------------------
4. Abstract of the paper

Phylogenetics studies the relationships, in terms of biological history and kinship, of a set of taxa (e.g., species). We argue that in Search-based Software Engineering (SBSE), the individuals of an evolutionary computation-driven population could be considered as taxa for which the leverage of Phylogenetic Inference might be beneficial. In this work, we present our Phylogenetics-aware SBSE approach. Our approach introduces a novel Phylogenetic Operation to promote results which are sufficiently aligned (in terms of lineage) with a certain reference given by the domain expert.
Our approach is evaluated in two heterogeneous industrial case studies: Procedural Content Generation from Game Software Engineering, and Feature Location from Software Maintenance. The results are analyzed using quality-of-the-solution and acceptance-by-developers measurements. We performed a statistical analysis to determine whether the impact on the results is significant compared to baselines that do not leverage Phylogenetics. The results show that our approach significantly outperforms two baselines in both case studies. Furthermore, two focus groups confirmed the acceptance of our approach and stressed that solution acceptance may make the difference in industrial environments. Our work has the potential to motivate a new breed of research work on Phylogenetics awareness to produce better results in Software Engineering.

-----------------------------------------------------------------------------------------------------------------------
5. Criteria that the author claims that the work satisfies

(B) The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.

(E) The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.

-----------------------------------------------------------------------------------------------------------------------
6. Statement Why the Results Satisfy the Criteria

Our approach achieves the best results in two heterogeneous industrial case studies: Feature Location from Software Maintenance, and Procedural Content Generation from Game Software Engineering.

Feature Location is one of the fundamental tasks performed during the maintenance phase of a software product. It is a prerequisite for most software engineering tasks such as code refactoring, bug fixing, or variability management.
There are many approaches available in the literature to perform feature location that have evolved over the years, which illustrates the complexity of the problem and the need for different approaches to address the various feature location scenarios.

Since our approach outperforms two feature location baselines in terms of both quality and acceptance, we claim criterion (B): the result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.

Our feature location results outperform manual feature location results previously reported in the literature ("Comparing manual and automated feature location in conceptual models: A controlled experiment", Information and Software Technology, 2020). Therefore, we claim criterion (E): the result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.

The complexity of developing video games has reached a point that exceeds human developers' capacity, as endless delays and omnipresent bugs attest. The latest video game blockbuster, Cyberpunk 2077, is a good example of both delays and bugs. The upcoming blockbuster GTA VI has already proven to be a good example of endless delays.

Since our approach outperforms two video-game-content-generation baselines in terms of both quality and acceptance, we claim criterion (B): the result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.

Our content generation results outperform manual content generation results previously reported in the literature ("An evolutionary approach for generating software models: The case of Kromaia in Game Software Engineering", Journal of Systems and Software, 2021).
Therefore, we claim criterion (E): the result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.

-----------------------------------------------------------------------------------------------------------------------
7. Full citation of the paper

Daniel Blasco, Antonio Iglesias, Jorge Echeverría, Francisca Pérez, and Carlos Cetina. 2025. Introducing Phylogenetics in Search-based Software Engineering: Phylogenetics-aware SBSE. ACM Trans. Softw. Eng. Methodol. Just Accepted (January 2025). https://doi.org/10.1145/3715002

-----------------------------------------------------------------------------------------------------------------------
8. Prize Money statement

Prize money, if any, is to be divided equally among the co-authors.

-----------------------------------------------------------------------------------------------------------------------
9. A Statement Indicating Why this Entry Could Be the "Best"

For decades, Search-Based Software Engineering (SBSE) has advanced the state of the art in software engineering. SBSE has shown that many traditional software engineering problems can be reformulated as search problems in which, for example, an evolutionary algorithm is guided by objectives to find an optimal (or near-optimal) solution. Many SBSE works acknowledge that the key ingredient to achieving the best results is the set of objectives guiding the search. However, in many cases, these objectives may not be sufficiently expressive, or adequately represented, to lead to results that are accepted by developers. These “hard-to-encode” optimization goals may be implicitly present in references that can be used as inputs. One approach, therefore, is to evolve individuals with the additional goal of not being too different from the reference. Our work shows how to leverage phylogenetics from biology to achieve this.
Our introduction of phylogenetics into SBSE leads to significantly better results not only in terms of quality but also in terms of developer acceptance. In fact, developers emphasize that the improvement in acceptance achieved through phylogenetics is the key to having results adopted in the industry. Developers explicitly state that they would not use results from state-of-the-art approaches, but they would use the results obtained through phylogenetics. It turns out that phylogenetics manages to capture relevant aspects for acceptance that have not been successfully captured by the state of the art over decades for essential software engineering tasks.

Our work has the potential to benefit multiple domains and tasks in software engineering. In fact, our approach is evaluated in two heterogeneous industrial case studies: Procedural Content Generation (PCG) from Game Software Engineering, and Feature Location (FL) from Software Maintenance. PCG is one of the hot topics in video game development. FL can arguably be seen as one of the most frequent maintenance tasks undertaken by software developers (e.g., before making any modification to the software, they must locate the most relevant part to make the change).

Our work is grounded in scientific rigor. We analyzed the results of each case study using quality and acceptance measurements. In the PCG case study, the quality measurements used are those accepted in Game Software Engineering research: Completion, Duration, Uncertainty, Killer Moves, Permanence, and Lead Change. In the FL case study, the quality measurements are retrieval measures accepted in Software Engineering research: Recall, Precision, and F-measure. In both case studies, acceptance by eight domain experts in total was analyzed using the Theory of Planned Behavior (TPB) questionnaire. TPB covers three dimensions: Attitude, Subjective Norm, and Perceived Behavioral Control.
This questionnaire is well-suited to acceptance studies in the context of the case studies covered. The results obtained by our approach and the baselines are then compared using statistical analysis (p-values, Â12, ANOVA, Kruskal-Wallis, or Eta-squared depending on the data). Finally, we conducted a focus group interview with the domain experts of each case study.

We believe our idea of using phylogenetics could be a game-changer for SBSE and therefore for software engineering. By leveraging the vast body of knowledge accumulated in biological phylogenetics, we contribute to the main challenge of objective definition and achieve the most important aspect in terms of industry adoption: developer acceptance. Phylogenetics can help us better capture objectives that we have not been able to fully express during decades of research.

-----------------------------------------------------------------------------------------------------------------------
10. Evolutionary Computation Type

Evolutionary Programming

-----------------------------------------------------------------------------------------------------------------------
11. Publication Date

Online: 27 January 2025
Accepted: 16 January 2025