Dynamics and Performance of a Linear Genetic Programming System
Genetic Programming (“GP”) is a machine learning algorithm. Typically, Genetic Programming is a supervised learning algorithm, which trains on labeled training examples
provided by the user. The solution output by GP maps known attributes to the known labels.
Genetic Programming is distinctive from other machine learning algorithms in that its output is typically a computer program—hence “Genetic Programming.” The GP system
documented here conducts learning with a series of very simple selection and transformation steps—modeled loosely on biological evolution—repeated over-and-over on a population of evolving computer programs. The selection step mimics natural selection. The transformation steps—crossover and mutation—loosely mimic biological eucaryotic reproduction. Although the individual steps are simple, the dynamics of a GP run are complex.
This thesis traces key research elements in the design of a widely-used GP system. It also presents empirical comparisons of the GP system that resulted from these design
elements to other leading machine learning algorithms. Each of the issues addressed in this thesis touches on what was, at the time of publication, a key—and not well understood—
issue regarding the dynamics and behavior of GP runs. In particular, the design issues addressed here are threefold: (1) The emergence in GP runs of “introns” or “code bloat.” Introns in GP are segments of code that have no effect on the output of the program in which they appear. Introns are an emergent phenomenon in GP. This thesis reports results that support the hypothesis that Introns emerge as a means of protecting evolving programs against the destructive effect of the traditional GP crossover transformation
operator. (2) Mutation in biological reproduction is rare and usually destructive. However, we present results which establish that, in GP, using the mutation transformation
operator at a high level, generates better and more robust evolved programs than using the mutation transformation operator at levels more typical of biological evolution.
(3) Finally, we return to the GP crossover operator and present results that suggest that a “homologous” crossover operator produces better and more robust results than the
traditional GP crossover operator.
The GP system that resulted from the above research has been publicly available since 1998. It has been extensively tested and compared to other machine learning paradigms. This thesis presents results that demonstrate that the resulting GP system produces consistently high-quality and robust solutions when compared to Vapnick statistical regression, decision trees, and neural networks over a wide range of problem domains and problem types.
Linear Genetic Programming