Genetic Programming, Validation Sets, and Parsimony Pressure

No Author Given

No Institute Given

Abstract. Fitness functions based on test cases are very common in Genetic Programming (GP). This process can be assimilated to a learning task, with the inference of models from a limited number of samples. This paper is an investigation of two methods to improve generalization in GP-based learning: 1) the selection of the best-of-run individuals using a three data sets methodology, and 2) the application of parsimony pressure in order to reduce the complexity of the solutions. Results using GP in a binary classification setup show that while the accuracy on the test sets is preserved, with less variance compared to baseline results, the mean tree size obtained with the tested methods is significantly reduced.

This paper is an experimental study of methodologies for Evolutionary Computation (EC) inspired by common practices in the Machine Learning (ML) and Pattern Recognition (PR) communities. More specifically, using Genetic Programming (GP) for supervised learning, we aim at evaluating both the effect of using a three data sets methodology (training, validation, and test sets) and the effect of minimizing the classifiers' complexity. Our experiments show that these approaches preserve the performance of GP while significantly reducing the size of the best-of-run solutions, which is in accordance with Occam's Razor principle.

The structure of the paper goes as follows. Section 1 starts with a high-level description of the tested approaches and their justifications. A presentation of relevant work follows in Section 2. Thereafter, the methodology used in the experiments is detailed in Section 3. Finally, Section 4 presents the experimental results obtained on six binary classification data sets, and Section 5 concludes the paper.

1 Introduction

GP is particularly suited for problems that can be assimilated to learning tasks, with the minimization of the error between the obtained and desired outputs for a limited number of test cases – the training data, in ML terminology. Indeed, the classical GP examples of symbolic regression, boolean multiplexer, and artificial ant [1] are simple instances of well-known learning problems (respectively regression, binary classification, and reinforcement learning). In the early years of GP, these problems were tackled using a single data set, reporting results on the same data set that was used to evaluate fitnesses during the evolution. This was justifiable by the fact that these are toy problems used only to illustrate the potential of GP. In the ML community, it is recognized that such a methodology is flawed, given that the learning algorithm can overfit the data used during training and perform poorly on unseen data of the same application domain [2,3]. Hence, it is important to report results on a set of data that was not used during the learning stage. This is what we call in this paper a two data sets methodology, with a training set used by the learning algorithm and a test set used to report the performance of the algorithm on unseen data, which is a good indicator of the algorithm's generalization (or robustness) capability. Even though this methodology has been widely accepted and applied in the ML and PR communities for a long time, the EC community still lags behind by publishing papers that report results on data sets that were used during the evolution (training) phase. This methodological problem has already been spotted (see [4]) and should become less and less common in the future.

The two data sets methodology prevents reporting flawed results from learning algorithms that overfit the training set, but it does not by itself prevent overfitting. A common approach is to add a third data set – the validation set – which helps the learning algorithm measure its generalization capability. This validation set is useful to interrupt the learning algorithm when overfitting occurs and/or to select a configuration of the learning machine that maximizes generalization performance. This third data set is commonly used to train classifiers such as back-propagation neural networks and can be easily applied to EC-based learning. But this approach has an important drawback: it removes a significant amount of data from the training set, which can be harmful to the learning process. Indeed, the richer the training set, the more representative it can be of the real data distribution, and the more the learning algorithm can be expected to converge toward robust solutions. In light of these considerations, an objective of this paper is to investigate the effect of a validation set used to select the best-of-run individuals in a GP-based learning application.

Another concern of the ML and PR communities is to develop learning algorithms that generate simple solutions. One argument behind this is Occam's Razor principle, which states that between solutions of comparable quality, the simplest solution should be preferred. Another argument is the minimum description length principle [5], which states that the "best" model is the one that minimizes the amount of information needed to encode the model and the data given the model. Preference for simpler solutions and overfitting avoidance are closely related: a complex solution is more likely to incorporate specific information from the training set, and thus to overfit it, than a simpler solution. But, as mentioned in [6], this argument should be taken with care, as too much emphasis on minimizing complexity can prevent the discovery of more complex yet more accurate solutions.
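For concreteness, the minimum description length criterion can be written in its standard two-part form (a textbook formulation; the paper itself does not spell out the equation):

$$ M^{*} = \arg\min_{M} \bigl[\, L(M) + L(D \mid M) \,\bigr], $$

where $L(M)$ is the number of bits needed to encode the model $M$ and $L(D \mid M)$ is the number of bits needed to encode the data $D$ with the help of $M$.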

There is a strong link between the minimization of complexity in GP-based learning and the control of code bloat [1,7], that is, an exaggerated growth of program size in the course of GP runs. Even though complexity and code bloat are not exactly the same phenomenon (some kinds of bloat are generated by neutral pieces of code that have no effect on the actual complexity of the solutions), most of the mechanisms proposed to control bloat [8,9,10,11] can also be used to minimize the complexity of solutions obtained by GP-based learning.

This paper is a study of GP viewed as a learning algorithm. More specifically, we investigate two techniques to increase the generalization performance and decrease the complexity of the models: 1) the use of a validation set to select best-of-run individuals that generalize well, and 2) the use of lexicographic parsimony pressure [10] to reduce the complexity of the generated models. These techniques are tested using a GP encoding for binary classification problems, with vectors taken from the learning sets as terminals, and mathematical operations to manipulate these vectors as branches. This approach is tested on six different data sets from the UCI ML repository [12]. Even if the proposed techniques are tested in a specific context, we argue that they can be extended to the frequent situations where GP is used as a learning algorithm.

2 Related Work

Some GP learning applications [13,14,15] have made use of a three data sets methodology, but without a thorough analysis of its effects. Panait and Luke [16] conducted experiments on different approaches to increase the robustness of the solutions generated by GP, using a three data sets methodology to evaluate the efficiency of each approach. Rowland [17] and Kushchu [18] conducted studies on generalization in EC and GP. Both of their argumentations converge toward the testing of solutions in previously unseen situations as a way to improve robustness.

Because of the bloat phenomenon, typical in GP, parsimony pressure has been more widely studied [9,19,20,21]. In particular, several papers [22,23,24] have produced interesting results around the idea of using parsimony pressure to increase the generalization capability of GP-evolved solutions. However, a counter-argument is given in [25], where solutions biased toward low complexity showed, in some circumstances, increased generalization error. This is in accordance with the argument given in [6], which states that less complex solutions are not always more robust.

3 Methodology

The experiments conducted in this work are based on a GP setup specialized for binary classification problems. The data processed by the primitives are vectors of two possible sizes: either of size one (a scalar value), or of size n, the feature set size. Table 1 presents the set of primitives used to build the programs.

Three main families of primitives were used: the mathematical function primitives (ADD, SUB, MUL, DIV, MXF, MNF, ABS, and SLN), the vector-to-scalar primitives (SUM, MEA, MXV, MIV, and L2), and the vectorial terminals (E and X).

Table 1. GP primitives used to build the classifiers.

  Name  #args  Description
  ADD     2    Addition, f_ADD(x1, x2) = x1 + x2.
  SUB     2    Subtraction, f_SUB(x1, x2) = x1 - x2.
  MUL     2    Multiplication, f_MUL(x1, x2) = x1 * x2.
  DIV     2    Protected division, f_DIV(x1, x2) = 1 if |x2| < 0.001, x1/x2 otherwise.
  MXF     2    Maximum value, f_MXF(x1, x2) = max(x1, x2).
  MNF     2    Minimum value, f_MNF(x1, x2) = min(x1, x2).
  ABS     1    Absolute value, f_ABS(x) = |x|.
  SLN     1    Saturated symmetric linear function, f_SLN(x) = 1 if x > 1, -1 if x < -1, x otherwise.
  SUM     1    Sum of the vector's components, f_SUM(x) = sum_i x_i.
  MEA     1    Mean of the vector's components, f_MEA(x) = (sum_i x_i) / n.
  MXV     1    Maximum of the vector's components, f_MXV(x) = max_i x_i.
  MIV     1    Minimum of the vector's components, f_MIV(x) = min_i x_i.
  L2      1    L2 norm of the vector, f_L2(x) = sqrt(sum_i x_i^2).
  E       0    Ephemeral random vector, generated by copying the value of a randomly selected training set datum.
  X       0    Vector with the value of the data to classify.

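As an illustration, the two piecewise primitives of Table 1 can be written directly from their definitions. This is a minimal Python sketch with hypothetical function names; the authors' actual implementation uses the Open BEAGLE C++ framework [26].

```python
def f_div(x1, x2):
    # Protected division (DIV): return 1 when the denominator is
    # close to zero, x1 / x2 otherwise (Table 1).
    return 1.0 if abs(x2) < 0.001 else x1 / x2

def f_sln(x):
    # Saturated symmetric linear function (SLN): clamp the input
    # to the interval [-1, 1].
    return 1.0 if x > 1 else (-1.0 if x < -1 else x)
```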
The mathematical function primitives with two arguments (ADD, SUB, MUL, DIV, MXF, and MNF) are defined to deal with arguments of different sizes by applying the function to each component of the n-sized arguments, when necessary repeatedly using the value of the scalar arguments. More formally, if f(x1, x2) denotes the function associated with the primitive presented in Table 1, the output of these primitives is (a code sketch of this rule follows the list below):

– A scalar [f(x1(1), x2(1))], if both arguments are scalars;
– A size-n vector [f(x1(1), x2(1)) f(x1(1), x2(2)) ... f(x1(1), x2(n))]^T, if the first argument is a scalar and the second a vector;
– A size-n vector [f(x1(1), x2(1)) f(x1(2), x2(1)) ... f(x1(n), x2(1))]^T, if the first argument is a vector and the second a scalar;
– A size-n vector [f(x1(1), x2(1)) f(x1(2), x2(2)) ... f(x1(n), x2(n))]^T, if both arguments are vectors.

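The following Python sketch mirrors these four cases, representing scalars as length-1 arrays; apply_binary_primitive is a hypothetical name, not taken from the paper.

```python
import numpy as np

def apply_binary_primitive(f, x1, x2):
    # Apply a two-argument primitive under the size rules above:
    # a scalar argument is reused against every component of a
    # size-n vector argument.
    x1, x2 = np.atleast_1d(x1), np.atleast_1d(x2)
    if x1.size == 1 and x2.size == 1:
        return np.array([f(x1[0], x2[0])])             # scalar result
    if x1.size == 1:
        return np.array([f(x1[0], b) for b in x2])     # scalar vs vector
    if x2.size == 1:
        return np.array([f(a, x2[0]) for a in x1])     # vector vs scalar
    return np.array([f(a, b) for a, b in zip(x1, x2)])  # component-wise
```

For example, apply_binary_primitive(f_div, 2.0, np.array([1.0, 0.0001, -4.0])) reuses the scalar 2.0 against each of the three components.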
On the other hand, the vector-to-scalar primitives are defined to convert a vector argument of size n into a scalar output. When the argument is a scalar, it is returned as output value as is, without modification, except for the L2 primitive, which returns the absolute value of the input scalar. Finally, the vectorial terminals are always vectors of size n, with either randomly selected data of the training set (terminal E), used as constants, or the value of the data to classify (terminal X), used as the variable of the problem.

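A corresponding sketch of the vector-to-scalar rule, again with hypothetical names:

```python
import numpy as np

def apply_vector_to_scalar(name, f, x):
    # Vector-to-scalar primitives reduce a size-n vector to a scalar;
    # a scalar input passes through unchanged, except for L2, which
    # returns the absolute value of the scalar (see text above).
    x = np.atleast_1d(x)
    if x.size == 1:
        return np.array([abs(x[0])]) if name == "L2" else x
    return np.array([f(x)])
```

For instance, apply_vector_to_scalar("SUM", np.sum, np.array([1.0, -2.0, 3.0])) returns array([2.0]).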
Table 2. Description of the binary classification data sets used for the experiments.

  Set  Size  Features  Application domain
  bcw   699      9     Wisconsin's breast cancer; 65.5% benign and 34.5% malignant.
  cmc  1473      9     Contraceptive method choice; 42.7% not using contraception and 57.3% using contraception.
  ger  1000     24     German credit approval; 70% approved and 30% not approved.
  ion   351     34     Ionosphere radar signal; 35.9% without structure detected and 64.1% with a structure detected.
  pid   768      8     Pima Indians diabetes; 65.1% tested negative and 34.9% tested positive for diabetes.
  spa  4601     57     Spam e-mail; 60.6% non-junk e-mail and 39.4% junk e-mail.

The data evaluated is classified according to the output of the GP tree: it is assigned to the first class for an output value positive or zero, and otherwise assigned to the second class. If necessary, the output of the GP program is first converted into a scalar by summing the vector's components, as done by the primitive SUM.

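The decision rule above amounts to a sign test on the (possibly summed) tree output. A minimal sketch, with hypothetical names:

```python
import numpy as np

def classify(tree_output):
    # Reduce a vector output to a scalar by summing its components
    # (as the SUM primitive does), then assign the first class to
    # outputs >= 0 and the second class to negative outputs.
    s = float(np.sum(np.atleast_1d(tree_output)))
    return 0 if s >= 0 else 1
```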
In order to test the effect of using a validation set and of applying some parsimony pressure, GP is tested on common binary classification data sets taken from the Machine Learning Repository at UCI [12]. The selected data sets are presented in Table 2. The selection of these data sets was guided by the following main criteria: 1) select appropriate sets for binary classification, 2) select appropriate sets for 10-fold cross-validation (see below), that is, data sets without predefined separate training and testing sets, and 3) select sets of relatively large size or high dimensionality. The first two criteria were chosen in order to fit our general methodology and avoid special data manipulations, while the last criterion was postulated in an attempt to avoid overly easy data sets, which should help to generate discriminant results.

Before the experiments, each data set was randomly divided into 10 folds of equal size, taking care to balance the number of data of each class between the folds. A 10-fold cross-validation [2] has been conducted, using the data in 9 folds as the training set for an evolution and reporting the test set error rate on the remaining fold. For each tested configuration, the process is repeated 10 times for each fold, for a total of 100 evolutions per configuration. The reported results consist in the means over these 100 evolutions.

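This protocol can be sketched as follows, assuming a feature matrix X, labels y, and an evolve(X, y) routine returning a best-of-run classifier with an error_rate method; all of these names are assumptions, with scikit-learn's StratifiedKFold standing in for the class-balanced fold construction.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def run_configuration(X, y, evolve, n_reps=10, seed=0):
    # 10 class-balanced folds, fixed once; 10 runs per fold gives the
    # 100 evolutions over which the means are reported.
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    test_errors = []
    for rep in range(n_reps):
        for train_idx, test_idx in skf.split(X, y):
            best = evolve(X[train_idx], y[train_idx])  # one GP evolution
            test_errors.append(best.error_rate(X[test_idx], y[test_idx]))
    return np.mean(test_errors), np.std(test_errors)
```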
Our experiments are conducted with four different configurations:

1. Baseline: The fitness measure consists in minimizing the error rate on the complete training set. The best-of-run individual is simply the individual of the evolution with the lowest error rate on the training set, with the smallest individual selected in case of ties.

2. With validation: For each evolution, the training set is randomly divided into two data sets: the fitness evaluation data set, with 67% of the training data, and the validation set, with the remaining 33%. The class distribution of the data is well balanced between the sets. The fitness measure consists in minimizing the error rate on the fitness evaluation set. At each generation, a two-objective sort is conducted in order to extract a set of non-dominated individuals (the Pareto front) with regard to the lowest fitness evaluation set error rate and the smallest individual size. These non-dominated individuals are then evaluated on the validation set, with the best-of-run individual selected as the one with the smallest error rate on the validation set, ties being solved by choosing the smallest individual.

3. With parsimony pressure: A lexicographic parsimony pressure [10] is applied to the evolution by minimizing the error rate on the complete training set, using the individual size as a second point of comparison in case of identical error rates. As with the baseline configuration, the best-of-run individual is the individual of the evolution with the lowest error rate on the training set, with the smallest individual selected in case of ties (strict equality).

4. With validation and parsimony pressure: A mix of the two previous configurations, separating the training set into two sets, the fitness evaluation set (67% of the data) and the validation set (33% of the data), and making use of the lexicographic parsimony pressure. The fitness evaluation set is used to compute the error rate that guides the evolution, while the validation set is used only to select the best-of-run individual. The selection of this best-of-run individual is identical to the with validation configuration, by extracting a Pareto front of the non-dominated individuals of the current generation (fitness evaluation set error rate vs. individual size). At each generation, all these non-dominated individuals are tested on the validation set. The best-of-run individual is selected as the solution that minimizes the validation error rate, breaking ties by preferring the smallest individuals. (A sketch of this selection procedure is given after this list.)

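A sketch of the best-of-run selection used in configurations 2 and 4. Individual objects carrying error (fitness evaluation set error rate) and size attributes, together with a valid_error callable, are assumptions for illustration, not the authors' API.

```python
def pareto_front(population):
    # Non-dominated individuals w.r.t. (error rate, tree size).
    def dominates(a, b):
        return (a.error <= b.error and a.size <= b.size and
                (a.error < b.error or a.size < b.size))
    return [a for a in population
            if not any(dominates(b, a) for b in population)]

def update_best_of_run(best, population, valid_error):
    # At each generation, only the Pareto front is scored on the
    # validation set; keep the candidate with the lowest validation
    # error, breaking ties by preferring the smaller individual.
    for ind in pareto_front(population):
        key = (valid_error(ind), ind.size)
        if best is None or key < (valid_error(best), best.size):
            best = ind
    return best
```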
Thus, for the second and fourth settings, the Pareto front is extracted at each generation for testing against the validation set. This is motivated by two main reasons: 1) it is important to reduce the number of solutions tested against the validation set, in order not to select best-of-run solutions that just perform well on the validation set "by chance", and 2) it is desirable to test on the validation set a range of solutions with different accuracy/size trade-offs. It should be stressed that tournament selection is used in all evolutions, with lexicographic ranking for the third and fourth configurations (see the sketch below). Strictly speaking, this is not a Pareto domination-based multi-objective selection algorithm.
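A lexicographic tournament fits in a few lines; the k=2 participants match Table 3, and the Individual attributes are the same assumptions as above.

```python
import random

def lexicographic_tournament(population, k=2):
    # Tournament selection with lexicographic parsimony pressure [10]:
    # the lowest error rate wins, and ties go to the smaller tree.
    contestants = random.sample(population, k)
    return min(contestants, key=lambda ind: (ind.error, ind.size))
```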

Table 3 presents the GP parameters used during the experiments. No special tweaking of these parameter values was done; they correspond in most cases to the default values of the software tool used. The experiments have been implemented using the GP facilities of the Open BEAGLE framework [26].

4 Results

Table 4 presents the detailed results obtained by testing the four configurations presented in the previous section, using the six data sets of Table 2.

Table 3. Tableau of the GP evolutions.

  Parameter               Description and parameter values
  Terminals and branches  See Table 1.
  Population size         One panmictic population of 1000 individuals.
  Stop criterion          Evolution ends after 100 generations.
  Replacement strategy    Genetic operations applied following a generational scheme.
  Selection               Tournament selection with 2 participants (relative ranking).
  Fitness measure         Without parsimony pressure: minimize the error rate. With parsimony pressure: minimize the error rate and, in case of ties, select the smallest individuals (lexicographic ranking).
  Crossover               Classical subtree crossover [1] (probability 0.7).
  Standard mutation       Replace a subtree with a new randomly generated one (probability 0.05).
  Swap mutation           Exchange a primitive with another of the same arity (probability 0.05).
  Shrink mutation         Replace a branch with one of its children and remove the mutated branch and the other children's subtrees (if any) (probability 0.05).
  Ephemerals mutation     Randomly select a new ephemeral random vector (probability 0.05).
  Reproduction            Copy without modification an existing individual (probability 0.1).
  Data normalization      The data of the different sets are scaled in [-1, 1] along the different dimensions.

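For reference, the Table 3 settings can be collected in a single configuration object; this is a hypothetical Python dictionary, not the Open BEAGLE configuration format.

```python
GP_PARAMS = {
    "population_size": 1000,             # one panmictic population
    "generations": 100,                  # stop criterion
    "tournament_size": 2,                # selection (relative ranking)
    "p_crossover": 0.70,                 # classical subtree crossover [1]
    "p_standard_mutation": 0.05,         # replace a subtree
    "p_swap_mutation": 0.05,             # exchange same-arity primitives
    "p_shrink_mutation": 0.05,           # replace a branch by a child
    "p_ephemeral_mutation": 0.05,        # resample an ephemeral vector
    "p_reproduction": 0.10,              # copy an individual unchanged
    "normalization_range": (-1.0, 1.0),  # per-dimension data scaling
}
```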
The error rates and tree sizes reported consist in the mean and standard deviation values of the best-of-run individuals over the 100 runs (10 different runs for each fold). The effort¹ consists in a measure of the computations done during the evolutions, calculated by summing the number of GP primitives evaluated during the runs. More precisely, for configurations without validation, the effort is computed as the number of primitives in each individual times the training set size, summed over all individuals evaluated during the run. For configurations with validation, the size of the individuals on the Pareto front times the validation set size is also taken into account. Italic results in Table 4 are not statistically different from the corresponding baseline result; hence all other results are statistically distinct from the baseline.

¹ Note that the notion of "effort" presented here is different from the one defined by Koza in [1].
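Under these definitions, the effort reduces to a sum of (primitive count × fitness cases) products over every evaluation performed during a run; a minimal sketch with a hypothetical record structure:

```python
def compute_effort(evaluations):
    # Each record is (tree_size, n_cases): tree_size primitives
    # evaluated on n_cases fitness cases (the training set size, or
    # the validation set size for Pareto-front individuals).
    return sum(tree_size * n_cases for tree_size, n_cases in evaluations)
```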

Table 4. Error rates, tree sizes and effort for the evolution of GP-based classifiers using the UCI data sets. Results in italic are not statistically different from those of the baseline configuration, according to a 95% confidence two-tailed Student's t-test. Results in bold are more than 50% smaller than the corresponding baseline results. (A "?" marks digits lost in the source.)

             Train set rate  Valid. set rate  Test set rate   Tree size    Effort (x10^9)
  Approach    Mean    Std.    Mean    Std.     Mean    Std.   Mean   Std.   Mean   Std.
  bcw
  Baseline    1.7%    0.5%     –       –       3.4%    2.3%   83.4   55.2   4.92   1.5
  Validation  2.3%    0.7%    2.3%    0.8%     3.3%    2.3%   34.2   38.8   4.08   1.2
  Parsimony   2.1%    0.5%     –       –       3.5%    2.3%   22.0   18.9   1.10   0.8
  Both        2.8%    0.7%    2.7%    1.0%     3.3%    2.1%    6.5   11.2   0.72   0.6
  cmc
  Baseline   26.4%    2.4%     –       –      32.0%    4.3%  175.2   65.6  11.3    3.6
  Validation 28.9%    3.2%   30.9%    3.0%    32.9%    4.1%  101.9   ?.1    8.30   2.6
  Parsimony  27.3%    3.0%     –       –      32.4%    4.7%  150.0   68.6   9.65   4.0
  Both       29.5%    3.0%   29.5%    3.1%    32.6%    4.3%   59.4   36.5   6.07   1.9
  ger
  Baseline   22.7%    1.6%     –       –      29.4%    3.8%  170.2   73.0   7.33   2.5
  Validation 25.3%    2.6%   27.1%    1.5%    29.4%    3.2%   82.5   73.4   5.24   1.7
  Parsimony  22.5%    1.6%     –       –      28.8%    4.0%  141.8   66.7   5.66   2.2
  Both       25.5%    2.9%   26.6%    1.7%    29.3%    3.0%   57.4   48.4   3.72   1.9
  ion
  Baseline    4.1%    1.2%     –       –      10.4%    5.6%  147.3   55.1   2.76   0.8
  Validation  5.5%    2.5%    7.2%    2.8%    10.6%    6.6%   95.0   55.4   2.05   0.6
  Parsimony   4.3%    1.4%     –       –       9.8%    6.2%   82.3   35.8   1.88   0.6
  Both        7.7%    3.0%    7.4%    2.7%    10.8%    6.3%   41.9   30.5   1.10   0.4
  pid
  Baseline   19.8%    1.2%     –       –      25.3%    4.7%  151.3   58.3   5.55   1.6
  Validation 22.2%    2.1%   23.0%    2.2%    25.2%    4.3%   57.2   55.4   4.22   1.2
  Parsimony  20.1%    1.1%     –       –      24.8%    4.2%  100.2   50.9   3.90   1.2
  Both       23.6%    2.0%   22.4%    2.0%    25.1%    4.6%   25.3   18.5   2.46   0.9
  spa
  Baseline   12.5%    2.1%     –       –      13.2%    2.7%  170.7   66.8  34.?    ?.2
  Validation 12.7%    2.4%   13.5%    2.8%    14.0%    2.7%  152.4   57.5  22.1    5.7
  Parsimony  12.9%    2.2%     –       –      14.0%    2.7%  147.2   54.9  29.0    9.0
  Both       13.2%    2.2%   13.5%    2.2%    13.8%    2.6%  105.4   48.6  18.9    7.5

Fig. 1. One-way analysis of variance (ANOVA) box plots of the best-of-run solutions' test set error rates. The center box is bounded by the first and third quartiles of the data distribution, with the median as the central line in the box. The notches surrounding the median show the 95% confidence interval of this median. The whiskers above and below the boxes represent the spread of the data values within 1.5 times the interquartile range, with the + symbol showing outliers.

Figure 1 presents the box plots that stem from a one-way analysis of variance (ANOVA) conducted on the test set error rates. Looking at the results, it seems that no approach is clearly superior to the others in terms of test set accuracy. But, taking a closer look, we can see that the approach using both the validation set and parsimony pressure reduces the variance of the test set error rates (first to third quartile range) for the bcw, ger, pid and spa data sets, having a comparable or slightly worse variance for the two other sets. This is an important result, as getting replicable and stable solutions is often more interesting than only infrequently finding a marginally better individual.

Taking a closer look at the error rates on the different sets in Table 4, important differences can be noted between the train and validation set rates, on one hand, and the test set rates on the other hand. The differences between the train and test rates can be explained by an overfitting of the training data. But it is surprising to see the magnitude of the differences between the validation and test rates. This may indicate that, because too many solutions are still tested against the validation set at each generation, the risk of selecting solutions that fit the validation set "by chance" is not negligible.

Fig. 2. One-way analysis of variance (ANOVA) box plots of the best-of-run solutions' tree sizes.

Figure 2 presents the one-way ANOVA box plots for the best-of-run tree sizes. This time, it seems clear that the tested methods significantly reduce the best-of-run individual tree sizes for all tested data sets. It is interesting to note that the configurations with a validation set have generated significantly smaller best-of-run individual tree sizes compared with the parsimony-pressure-only approach. This is expected, given that the validation set is directly used in the best-of-run individual selection process, while the parsimony pressure is used only to limit the tree sizes during the runs. Also, the important size reduction of the best-of-run solutions, especially noticeable with the combination of validation and parsimony pressure, is valuable when simplicity or comprehensibility is necessary for the application at hand. Finally, looking at the mean effort in Table 4, the reduction goes up to 50% with the validation and parsimony pressure approach, compared to the baseline effort.

5 Conclusion

In this paper, methodologies were investigated to improve GP as a learning algorithm. More specifically, using a GP-based setup for binary classification, the use of a validation set for selecting best-of-run individuals was tested, in order to pick solutions that generalize well. The effect of a lexicographic parsimony pressure was also tested, in order to avoid unnecessary complexity in the evolved solutions. Experimental results indicate that the use of a validation set improves the stability of the best-of-run solutions on the test sets, by maintaining accuracy while reducing variance. This is an important result given the stochastic nature of GP, which can introduce important variations of the results from one run to another. Moreover, it was shown that a mild parsimony pressure applied during evolutions can sustain performance in general, while effectively reducing both solution size and effort. The combination of these two approaches apparently gives the best of both worlds, by reducing the variance of test set errors, drastically simplifying the complexity of best-of-run solutions, and cutting down effort by half.

As future work, still using a GP-based learning setup, it is planned to develop new stopping criteria based on the difference between training and validation set error rates. It is also planned to study the effect of changing the test cases during the course of the evolution for GP-based learning, using methods such as competitive co-evolution and boosting.

References

1. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (MA), USA (1992)
2. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Second edn. John Wiley & Sons, Inc., New York (NY), USA (2001)
3. Mitchell, T.M.: Machine Learning. McGraw-Hill (1997)
4. Eiben, A.E., Jelasity, M.: A critical note on experimental research methodology in EC. In: Proceedings of the 2002 Congress on Evolutionary Computation (CEC 2002), Honolulu (HI), USA, IEEE Press (2002) 582–587
5. Rissanen, J.: Modeling by shortest data description. Automatica 14 (1978) 465–471
6. Domingos, P.: The role of Occam's razor in knowledge discovery. Data Mining and Knowledge Discovery 3(4) (1999) 409–425
7. Banzhaf, W., Langdon, W.B.: Some considerations on the reason for bloat. Genetic Programming and Evolvable Machines 3(1) (2002) 81–91
8. Langdon, W.B.: Size fair and homologous tree genetic programming crossovers. Genetic Programming and Evolvable Machines 1(1/2) (2000) 95–119
9. Ekárt, A., Németh, S.Z.: Selection based on the Pareto nondomination criterion for controlling code growth in genetic programming. Genetic Programming and Evolvable Machines 2(1) (2001) 61–73
10. Luke, S., Panait, L.: Lexicographic parsimony pressure. In: Proceedings of the 2002 Genetic and Evolutionary Computation Conference (GECCO 2002), New York (NY), USA, Morgan Kaufmann Publishers (2002) 829–836
11. Silva, S., Almeida, J.: Dynamic maximum tree depth. In: Proceedings of the 2003 Genetic and Evolutionary Computation Conference (GECCO 2003), Chicago (IL), USA, Springer-Verlag (2003) 1776–1787
12. Newman, D., Hettich, S., Blake, C., Merz, C.: UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html (1998)
13. Sherrah, J., Bogner, R.E., Bouzerdoum, A.: The evolutionary pre-processor: Automatic feature extraction for supervised classification using genetic programming. In: Genetic Programming 1997: Proceedings of the Second Annual Conference, Stanford University (CA), USA, Morgan Kaufmann (1997) 304–312
14. Brameier, M., Banzhaf, W.: Evolving teams of predictors with linear genetic programming. Genetic Programming and Evolvable Machines 2(4) (2001) 381–407
15. Yu, T., Chen, S.H., Kuo, T.W.: Discovering financial technical trading rules using genetic programming with lambda abstraction. In: Genetic Programming Theory and Practice II, Ann Arbor (MI), USA (2004) 11–30
16. Panait, L., Luke, S.: Methods for evolving robust programs. In: Proceedings of the 2003 Genetic and Evolutionary Computation Conference (GECCO 2003). Volume 2724 of LNCS., Chicago (IL), USA, Springer-Verlag (2003) 1740–1751
17. Rowland, J.J.: Generalisation and model selection in supervised learning with evolutionary computation. In: EvoWorkshops 2003. Volume 2611 of LNCS., University of Essex, UK, Springer-Verlag (2003) 119–130
18. Kushchu, I.: Genetic programming and evolutionary generalization. IEEE Transactions on Evolutionary Computation 6(5) (2002) 431–442
19. Nordin, P., Banzhaf, W.: Complexity compression and evolution. In: Proceedings of the Sixth International Conference on Genetic Algorithms, Pittsburgh (PA), USA, Morgan Kaufmann (1995) 310–317
20. Soule, T., Foster, J.A.: Effects of code growth and parsimony pressure on populations in genetic programming. Evolutionary Computation 6(4) (1998) 293–309
21. Gustafson, S., Ekart, A., Burke, E., Kendall, G.: Problem difficulty and code growth in genetic programming. Genetic Programming and Evolvable Machines 5(3) (2004) 271–290
22. Iba, H., de Garis, H., Sato, T.: Genetic programming using a minimum description length principle. In: Advances in Genetic Programming. Complex Adaptive Systems, Cambridge (MA), USA, MIT Press (1994) 265–284
23. Zhang, B.T., Mühlenbein, H.: Balancing accuracy and parsimony in genetic programming. Evolutionary Computation 3(1) (1995) 17–38
24. Rosca, J.: Generality versus size in genetic programming. In: Genetic Programming 1996: Proceedings of the First Annual Conference, Stanford University (CA), USA (1996) 381–387
25. Cavaretta, M.J., Chellapilla, K.: Data mining using genetic programming: The implications of parsimony on generalization error. In: Proceedings of the 1999 Congress on Evolutionary Computation (CEC 1999), Washington (DC), USA (1999) 1330–1337
26. Gagné, C., Parizeau, M.: Open BEAGLE: A new versatile C++ framework for evolutionary computation. In: Late-Breaking Papers of the 2002 Genetic and Evolutionary Computation Conference (GECCO 2002), New York (NY), USA (2002)
