User-System Interaction for Redundancy-Free Knowledge Discovery in Data

From the decision maker's point of view, a classical limitation of association rules lies in their combinatorial nature, which results in a large number of rules. As the overall quality of an association rule set can be considered as insight into the studied domain

Introduction

The amount of collected data grows continuously. Decision tasks must take this growth into account, whether they deal with prediction, action evaluation or validation, across a wide variety of application fields such as management, profit optimization or analysis. The KDD (Knowledge Discovery in Databases) area covers this range of applications, with the goal of providing automated tools and adapted data representations that help an expert user find the evidence needed for decision tasks. This assumes a human-centered KDD process. As a human-centered process involving automated procedures, it needs targeted problem representations that are both realistic from the user's point of view and computable from the machine's point of view.

R. Lehn et al.: User-System Interaction for Redundancy-Free Knowledge Discovery in Data, Studies in Computational Intelligence (SCI) 127, 463–479 (2008) www.springerlink.com © Springer-Verlag Berlin Heidelberg 2008

Among KDD techniques, association rules [2] capture and represent implicative patterns that tolerate a small set of counterexamples, e.g. birds that cannot fly or sports cars that are not red. Association rules can be enhanced with statistical evaluations and filters such as the Intensity of Implication family of indices. Association rule discovery is motivated by the exploitation of operational databases to discover new knowledge, that is, knowledge that was unknown before the discovery and that is potentially exploitable in a decision-making process [19]. Many efficient algorithms have been published to optimize the association rule search [8, 16], but they mainly focus on algorithmic optimization rather than on knowledge usability. One of the fundamental hypotheses of association rule discovery is that the user does not specify the goal of the search. Because of the intrinsically combinatorial nature of the search and the lack of goals, the classical use of these algorithms (chaining data selection, data formatting, frequent itemset induction, rule computation and rule presentation to the user) generally outputs large quantities of unordered rules, which contradicts the principle of knowledge readability and usability for a decision process. Experiments applying association rule algorithms such as Apriori directly resulted in thousands of rules. We can then seriously question the quality of the view of the studied domain that association rules provide if the user has to explore thousands of rules. We can also question the quality of the induction itself if the effort the user must invest to interpret the association rules is nearly the same as the effort needed to reach the same understanding of the domain by browsing the database directly.
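To make the notions of counterexample tolerance and Intensity of Implication more concrete, the following Python sketch computes support, confidence, the number of counterexamples and an intensity of implication for rules over a small transaction set. It assumes the classical Poisson formulation of the index, φ(a → b) = 1 − P(Poisson(λ) ≤ n_{a∧¬b}) with λ = n_a · n_{¬b} / n; the chapter does not say which member of the family it relies on. The names `rule_stats`, `all_single_item_rules`, `intensity_of_implication` and the `TRANSACTIONS` data are hypothetical, and only rules with a single-item antecedent and a single-item consequent are enumerated.

```python
import math
from itertools import permutations

# Hypothetical toy data: each transaction is a set of items.
# The fourth record (a non-flying bird) is a counterexample to bird -> flies.
TRANSACTIONS = [
    {"bird", "flies", "feathers"},
    {"bird", "flies", "feathers"},
    {"bird", "flies", "feathers"},
    {"bird", "feathers"},
    {"bat", "flies"},
    {"fish"},
    {"fish"},
    {"fish"},
]


def poisson_cdf(k: int, lam: float) -> float:
    """Return P(X <= k) for X ~ Poisson(lam), summing the terms iteratively."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def intensity_of_implication(n: int, n_a: int, n_b: int, n_a_not_b: int) -> float:
    """Poisson-model intensity of implication for a -> b (assumed variant).

    Under independence, the expected number of counterexamples is
    lam = n_a * n_not_b / n; the index is the probability that a random
    counterexample count would exceed the observed one.
    """
    lam = n_a * (n - n_b) / n
    return 1.0 - poisson_cdf(n_a_not_b, lam)


def rule_stats(transactions, antecedent: frozenset, consequent: frozenset) -> dict:
    """Support, confidence, counterexamples and intensity for antecedent -> consequent."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_b = sum(1 for t in transactions if consequent <= t)
    n_ab = sum(1 for t in transactions if (antecedent | consequent) <= t)
    n_a_not_b = n_a - n_ab  # transactions matching a but not b
    return {
        "support": n_ab / n,
        "confidence": n_ab / n_a if n_a else 0.0,
        "counterexamples": n_a_not_b,
        "intensity": intensity_of_implication(n, n_a, n_b, n_a_not_b),
    }


def all_single_item_rules(transactions):
    """Enumerate every rule x -> y where x and y are distinct single items."""
    items = sorted(set().union(*transactions))
    return [
        (x, y, rule_stats(transactions, frozenset({x}), frozenset({y})))
        for x, y in permutations(items, 2)
    ]


if __name__ == "__main__":
    rules = all_single_item_rules(TRANSACTIONS)
    print(f"{len(rules)} candidate single-item rules")
    # Rank by intensity; keep rules with confidence >= 0.7 (an arbitrary threshold).
    for x, y, s in sorted(rules, key=lambda r: r[2]["intensity"], reverse=True):
        if s["confidence"] >= 0.7:
            print(f"{x} -> {y}: conf={s['confidence']:.2f}, "
                  f"counterexamples={s['counterexamples']}, phi={s['intensity']:.3f}")
```

Even this restricted enumeration produces m(m − 1) candidate rules for m items; allowing arbitrary itemsets on both sides makes the count grow exponentially with m, which is the combinatorial explosion described above.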
A classical answer to this problem is to set high thresholds on the quality indices that evaluate individual rules, so as to eliminate the rules these indices rate as least pertinent. But there are cases where this strategy cannot be