Building predictive models for direct mail: A framework for choosing training and test data
Tom Breur is senior consultant at the database marketing centre of Postbank in the Netherlands. He is professionally engaged in matters involving data mining, customer segmentation and database marketing methodology.
Leonard Paas worked on issues such as data mining, predictive modelling, controlling data quality, customer segmentation and credit scoring when he was a senior consultant at the database marketing centre of Postbank. Currently he is working on similar issues as a database marketing consultant at the Da Vinci Group. He is also working on a doctoral thesis at Tilburg University.
Abstract
Predictive data mining models can be used to increase the return on investment of direct mail campaigns. In this paper the authors present a framework for choosing data sources for building and validating predictive data mining models. They propose a hierarchy which can be used to decide which behaviour is to be modelled when building and testing models. Choices made within this hierarchy depend on the cost and availability of relevant data and on campaign constraints.
Tom Breur, Adelaarshorst 47, 5042 XH Tilburg, the Netherlands; e-mail: [email protected].
© Henry Stewart Publications 1350-2328 (2000). Journal of Database Marketing, Vol. 8, 1, 9–16
INTRODUCTION
When applying predictive data mining models to direct mail, the goal is to select from the population the segment of customers that is considered the most desirable target group. The operational definition of desirability depends on the target measure used. Ideally, a measure close to bottom-line profitability is used. In practical business settings a perfect measure of profitability is generally not feasible, but less sophisticated measures are valuable as well.1 An example of a simple measure is the probability that a subject will respond to a direct mail offer. A somewhat more advanced measure also includes a prediction of expected spending.
Much has been written about different techniques for determining the characteristics of the best target group.2,3 Possible techniques include recursive partitioning algorithms, neural network computing, logit analysis and genetic algorithms. As the field of data mining has matured over the past decade, a wide choice of commercial software implementing these techniques has become available. These data mining tools require a data set consisting of two types of variables: (1) a dependent variable, the target variable, which represents the behaviour that is to be predicted (for our purposes, response behaviour); and (2) an array of independent variables, the explanatory variables, which represent clients' characteristics and are used to predict the values subjects take on the target variable. Interestingly, little attention has been given to the development of a methodology for establishing which data can be used as the target variable in the model-building process. An outstanding book on data preparation was published in 1999,4 but it does not cove