Difficult first strategy GP: an inexpensive sampling technique to improve the performance of genetic programming

  • PDF / 1,392,387 Bytes
  • 13 Pages / 595.276 x 790.866 pts Page_size
  • 34 Downloads / 156 Views

DOWNLOAD

REPORT


RESEARCH PAPER

Difficult first strategy GP: an inexpensive sampling technique to improve the performance of genetic programming Muhammad Quamber Ali1 · Hammad Majeed1  Received: 21 January 2019 / Revised: 16 December 2019 / Accepted: 25 January 2020 © Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Genetic programming (GP) is a top performer in solving classification and clustering problems, in general and symbolic regression problems, in particular. GP has produced impressive results and has outperformed human generated results for 76 different problems taken from 22 different fields. There remain a number of significant open issues despite its impressive results. Among them are high computational cost, premature convergence and high error rate. These issues must be addressed for GP to realise its full potential. In this paper a simple and cost effective technique called Difficult First Strategy-GP (DFSGP) is proposed to address the aforementioned problems. The proposed technique involves pre-processing and sampling steps. In the pre-processing step, difficult to evolve data points by GP from the given data set are marked and in the sampling step, they are introduced in the evolutionary run by using two newly defined sampling techniques, called difficult points first and difficulty proportionate selection. These techniques are biased towards selecting difficult data points during the initial stage of a run and of easy points in the latter stage of a run. This ensures that GP does not ignore difficult-to-evolve data points during a run. Experiments have shown that GP coupled with DFS avoids premature convergence and attained higher fitness than standard GP using same fitness evaluations. Performance of the proposed technique was evaluated on three commonly known metrics, which are convergence speed, fitness and variance in the best results. Our results have shown that the proposed setups had achieved 10–15% better fitness values than Standard GP. Furthermore, the proposed setups had consistently generated better quality solutions on all the problems and utilized 30–50% less computations to match the best performance of Standard GP. Keywords  Difficult first strategy · Genetic programming · Pre-processing · Sampling techniques · Dataset sampling · Machine learning

1 Introduction Genetic programming (GP) is an evolutionary algorithm that is used to evolve expressions by exploring and picking the best relation between the functions and the terminals. The exploration is a supervised learning process and is guided by the difference between the expected output and the output of the evolved expressions (error). Generally, this error is computed on complete input data set. Some of the data points are easy to evolve by GP, such as points on a straight line as * Hammad Majeed [email protected] Muhammad Quamber Ali [email protected] 1



Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad, Pakistan

compared to data points on a curved line. In this wor