Predictive Model Based on the Evidence Theory for Assessing Critical Micelle Concentration Property



1 Sorbonne University, Université de technologie de Compiègne, CNRS, UMR 7338 Biomechanics and Bioengineering, Compiègne, France
2 Sorbonne University, Université de technologie de Compiègne, EA 4297 Transformations Intégrées de la Matière Renouvelable, Compiègne, France
3 Université de Picardie Jules Verne, CNRS, FRE 3517 Laboratoire de Glycochimie, des Antimicrobiens et des Agroressources, Amiens, France

Abstract. In this paper, we introduce a model driven by uncertain data mining for knowledge discovery in chemical databases. We aim to discover relationships between molecular characteristics and properties using uncertain data mining tools. Specifically, we intend to predict the Critical Micelle Concentration (CMC) property from a molecule's characteristics. To do so, we develop a likelihood-based belief function modelling approach to construct an evidential database. A mining process is then developed to discover valid association rules. The prediction is performed using an association rule fusion technique. Experiments were conducted on a real-world chemical database. Performance analysis showed better prediction outcomes for the proposed approach than for several methods from the literature.

Keywords: Evidential data mining · Chemical database · Association rule · Associative classifier

1 Introduction

© Springer International Publishing Switzerland 2016. J.P. Carvalho et al. (Eds.): IPMU 2016, Part I, CCIS 610, pp. 510–522, 2016. DOI: 10.1007/978-3-319-40596-4_43

Data mining is generally regarded as a discipline within the field of Knowledge Discovery, or Knowledge Discovery in Databases (KDD). It is usually defined as the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large collections of data. Causal rules are then derived from those patterns. Frequent patterns and valid rules can be used to test hypotheses (verification goals) or to autonomously find entirely new patterns (discovery goals) [1]. Discovery goals can be predictive, requiring predictions to be made from the data in the database [2]. On the other hand, there has been an explosion in the availability of publicly accessible chemical information, including chemical structures of small molecules, structure-derived properties, and associated biological activities in a variety of assays [3,4]. These data sources provide a significant opportunity to develop and apply computational tools to extract and understand the underlying structure-activity relationships. Such techniques, however, remain sensitive to the presence of imperfect data [5]. In recent years, uncertain data mining tools have emerged [6–8] that help uncover hidden pertinent information in the presence of uncertainty and imprecision. However, to the best of our knowledge, uncertain data mining tools have not yet been used to discover pertinent knowledge or to make predictions in chemical databases. I