Comparing unsupervised probabilistic machine learning methods for market basket analysis


Harald Hruschka¹

Received: 26 September 2018 / Accepted: 19 August 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract

We compare several unsupervised probabilistic machine learning methods for market basket analysis, namely binary factor analysis, two topic models (latent Dirichlet allocation and the correlated topic model), the restricted Boltzmann machine and the deep belief net. After an overview of previous applications of unsupervised probabilistic machine learning methods to market basket analysis, we briefly present the methods we investigate and outline their estimation. Performance is measured by tenfold cross-validated log likelihood values. Binary factor analysis vastly outperforms topic models. The restricted Boltzmann machine attains a similar performance advantage over binary factor analysis. Overall, a deep belief net with 45 variables in the first and 15 variables in the second hidden layer turns out to be the best model. We also compare the investigated machine learning methods with respect to ease of interpretation and runtimes. In addition, we show how to interpret the relationships between hidden variables and observed category purchases. To demonstrate managerial implications, we estimate the effect of promoting each category both on purchase probability increases of other product categories and on the relative increase of basket size. Finally, we indicate several possibilities to extend restricted Boltzmann machines and deep belief nets for market basket analysis.

Keywords  Machine learning · Market basket analysis · Factor analysis · Topic models · Restricted Boltzmann machine · Deep learning

JEL Classification  M31 · L81 · D12 · C45 · C89

* Harald Hruschka [email protected]

1 University of Regensburg, Universitätsstrasse 31, 93040 Regensburg, Germany


1 Introduction

Discovering cross-effects between purchases of different categories constitutes an important task of market basket analysis. Cross-effects may be caused by joint consumption (e.g., cake and whipping cream, sausage and rolls). Cross-effects may also be due to shopping habits. In this case categories are purchased together, though they are consumed independently (Betancourt and Gautschi 1990). Let us give two examples. Beer and bottled water may be consumed at different times or by different household members. Photos and tropical fruit may be purchased together simply to save time.

Without cross-effects it is sufficient to analyze purchases by one-category choice models and to infer decisions for each category in an independent manner. But if sizable cross-effects exist, making decisions in each category without taking other categories into account is sub-optimal. Considering cross-category effects turns out to be especially relevant for promotion, assortment and store layout decisions. We adhere to a broad definition of promotion which is not restricted to price discount