Anikó Ekárt
Ph.D. Theses
Supervisor: Dr. András Márkus
Eötvös Loránd University
PhD Program in Informatics
Program Director: Prof. János Demetrovics
Computer and Automation Research Institute
Hungarian Academy of Sciences
Budapest 2001
1. Introduction
Nowadays we can witness the more and more growing interconnection between the research fields of genetics and computer science. The amazing results on human genome sequencing could not have been achieved without today's effective computers, the internet and the advanced data mining techniques. At the same time, both the concept and the working of artificial evolutionary systems imitate natural evolution.
Accordingly, on one side computers help natural genetics to develop, and on the other side, the lessons learned from natural evolution are applied in artificial evolutionary systems.
The field of evolutionary computation was formed in the 1970's when three independent research groups began to investigate the possibilities of creating artificial systems that solve certain problems through artificial evolution. However, the research in this area made a very rapid progress only in the last decade. Genetic programming is the newest form of evolutionary computation that was conceived in the late 1980's as a possible means for automatic programming. Genetic programming performs an evolutionary search in the space of computer programs and selects the program that solves a given task according to certain criteria.
The possibility of automatic programming has been a central question in computer science from the beginning. Genetic programming provides a partial answer. Genetic programming is an extension of genetic algorithms, where the structures undergoing adaptation are not strings but hierarchical computer programs of varying size and shape. Genetic programs are built from functions and terminals, referred together as nodes. Thus, the search space of genetic programming is the space of all trees that can be formed using the chosen function set and terminal set. Genetic programs are evaluated on a set of so-called fitness cases consisting of (input, output) pairs. Each program is tested against all the fitness cases and is assigned a fitness value accordingly. In a genetic programming system (like in every evolutionary method) one must carefully tune the representation, the evaluation method and the parameters in order to obtain worthwhile results.
Genetic programming raises many intriguing questions: How can one assure that evolution explores the best regions of the program space? How can one define and measure efficiently the diversity of programs? We do not know in advance the size and form of a good program. Then how can we restrict or guide the genetic search in order to keep it off from unnecessary extensive calculations? The benchmark problems seem quite simple. How can one apply the genetic programming paradigm to difficult, real world problems? In the case of complex tasks, one can encounter difficulties that did not cause much trouble in the case of simple tasks. How can the results on test problems be transmitted to more complex problems (i.e., is the method scalable)? Can genetic programming improve the performance of a system when added as a learning component?
In this
work we try to provide answers to the above questions.
2.
Outline of the dissertation
The dissertation is structured in three parts. In the first part we give an overview of evolutionary computation and in particular genetic programming. In Chapter 2 we give short descriptions for the different forms of evolutionary computation. Then we show in Chapter 3 the specific features of genetic programming. We raise key issues for genetic programming: code growth, diversity, real world applications.
In the second part we present our contribution to the theory of genetic programming. In Chapter 4 we demonstrate two methods for limiting the code growth. The first method consists in applying an additional mutation operator that simplifies the structure of a genetic program without altering its behavior. The second method applies multiobjective optimization for the objectives of fitness and program size. We show that both methods are successful in reducing code growth without significant loss of accuracy. In Chapter 5 we define a distance metric for genetic programs and use it for applying the fitness sharing technique.1 We propose a simple diversity measure based on our metric and study the effects of fitness sharing with the help of this diversity measure.
In the third part we show the application of genetic programming in two complex real world problems. The first problem comes from mechanical engineering. Four bar mechanisms play a very important role in practical mechanism design.2 We describe in Chapter 6 our four bar mechanism design system. We demonstrate how genetic programming can be a vital component of a complex design system. We integrate genetic programming with decision trees3 into a powerful learning machine.
The second
problem belongs to the decision support domain of economics.
The decision-makers have to make many subjective decisions. Consequently,
the final decision is sensitive to even small changes in these subjective
values. We present in Chapter 7 our genetic programming system that
helps the decision-makers to arrive at stable decisions. That is, for small
variations in the values of the involved variables, the final decision
remains unchanged.
3.
New results
The contributions of this work to the advance of genetic programming are two-fold: we present here both theoretical and application results. Our results can be grouped in four distinct categories, the first two categories summing up theoretical developments and the last two categories showing application results.
Thesis I
Two new methods for moderating code growth in genetic programming have been developed: the simplifying mutation method and the Pareto selection method.
The new methods have been compared with original genetic programming. Both methods reduced code growth and produced solutions of similar accuracy as those of original genetic programming. In the simplification method the mechanism for controlling code growth has been independent of the fitness-based selection mechanism. The Pareto method has been very simple and it has not included program size in fitness computation.
By using tournament selection and fitness sharing, we have maintained the diversity of the population at a certain level depending on the size of program neighborhood.
A. Ekárt: Machine Learning Methods in Synthesis of Four Bar Linkages. Conference of PhD Students in Computer Science, Szeged, 18-22 July 1998, pp. 38-39.
A. Ekárt: Controlling Code Growth in Genetic Programming by Mutation. W. B. Langdon, R. Poli, P. Nordin, T. Fogarty (Eds.). Late Breaking Papers of EUROGP'99, Göteborg, 26-27 May 1999, pp. 3-12.
A. Ekárt; A. Márkus: Decision Trees and Genetic Programming in Synthesis of Four Bar Mechanisms. Life Cycle Approaches to Production Systems, Proceedings of the Advanced Summer Institute-ASI'99, Leuven, 22-24 September 1999, pp. 201-208.
A. Ekárt: A General Framework for Obtaining Useful Design Features of Mechanisms. B. Katalinic (Editor). Annals of DAAAM for 1999, Proceedings of the 10th International DAAAM Symposium, Vienna, 21-23 October 1999, pp. 149-150.
A. Ekárt: Shorter Fitness Preserving Genetic Programs. C. Fonlupt, J.-K. Hao, E. Lutton, E. Ronald, M. Schoenauer, (Eds.). Artificial Evolution. 4th European Conference, AE'99 Dunkerque, 3-5 November 1999. Selected Papers, LNCS volume 1829, pp. 73-83 , 2000.
A. Ekárt; S. Z. Németh: A Metric for Genetic Programs and Fitness Sharing. R. Poli, W. Banzhaf, W. B. Langdon, J. Miller, P. Nordin, T. Fogarty (Eds.). Genetic Programming, Proceedings of EUROGP'2000, Edinburgh, 15-16 April 2000, LNCS volume 1802, pp. 259-270.
A. Ekárt; S. Z. Németh: Evolving Decision Principles for Multicriteria Decision Problems. Working Paper of the Laboratory of Operation Research and Decision Systems, WP 2000-9, November 2000.
A. Ekárt; S. Z. Németh: Selection Based on the Pareto Nondomination Criterion for Controlling Code Growth in Genetic Programming. Journal of Genetic Programming and Evolvable Machines, volume 2, issue 1, 2001, pp. 61-73.
A. Ekárt; S. Z. Németh: A Noncontinuous Generalization of the Arithmetic-Geometric Mean. Applied Mathematics and Computation, 2001 (In print).