Learning with Kernels and Logical Representations


Abstract. Choosing an appropriate kernel function is a fundamental step for the application of many popular statistical learning algorithms. Kernels are actually the natural entry point for inserting prior knowledge into the learning process. Inductive logic programming (ILP), on the other hand, offers a powerful and flexible framework for describing existing background knowledge and extracting additional knowledge from the data. It therefore seems natural to explore the synergy between these two important paradigms of machine learning. In this extended abstract (see [1] for a longer version), I briefly review some of our recent work on statistical learning with kernel machines in the ILP setting.

P. Frasconi. In: H. Blockeel et al. (Eds.): ILP 2007, LNAI 4894, pp. 1–3, 2008. © Springer-Verlag Berlin Heidelberg 2008

1 Motivations

Statistical and logical approaches to machine learning offer complementary advantages. Logic allows us to represent domain knowledge in a natural and expressive way, and ILP can generate theories and explanations. Statistical learning, on the other hand, allows us to deal with uncertainty and noise in the data. Probabilistic inductive logic programming (PILP), also called statistical relational learning, is a very active area of research, and several representational frameworks and models have been proposed during the last few years (see, e.g., [2,3] for an overview). It essentially relies on the combined use of logic and probabilities in the learning process. One interesting distinction that is often made in statistical supervised learning is between generative and discriminant classifiers. In the former case, we typically attempt to model class-conditional densities and use Bayes’ theorem to obtain the conditional probability of the output label given the input. In the latter case, one attempts to model conditional probabilities directly or, even more simply, to learn a discriminant function that consistently approximates the optimal decision function as the number of training examples grows to infinity. Several PILP approaches are based on generative learning. For example, stochastic logic programs are a generalization of probabilistic context-free grammars that assign a probability to each definite clause in a logic program and allow us to infer the probability that a given goal is refuted. The approaches briefly reviewed here take the discriminant direction and exploit classic statistical supervised learning algorithms based on kernel machines. Although several kernels have been defined on discrete data structures like strings, trees, and graphs, there are several motivations for studying the combination of kernels with logic:

– Improving and facilitating kernel design. Background knowledge is usually plugged in via the kernel function. We can use background knowledge expressed by logic programs and convert it into a kernel, thus embedding it into a statistical learning algorithm in a principled and flexible way.
– Improving the accuracy and the efficiency of existing ILP systems, for ex