Strongly Possible Keys for SQL

  • PDF / 711,854 Bytes
  • 15 Pages / 595.276 x 790.866 pts Page_size
  • 7 Downloads / 203 Views

DOWNLOAD

REPORT


ORIGINAL ARTICLE

Strongly Possible Keys for SQL Munqath Alattar1,3 · Attila Sali1,2 Received: 8 February 2019 / Revised: 14 October 2019 / Accepted: 8 August 2020 © The Author(s) 2020

Abstract Missing data value is an extensive problem in both research and industrial developers. Two general approaches are there to deal with the problem of missing values in databases; they could be either ignored (removed) or imputed (filled in) with new values (Farhangfar et al. in IEEE Trans Syst Man Cybern-Part A: Syst Hum 37(5):692–709, 2007). For some SQL tables, it is possible that some candidate key of the table is not null-free and this needs to be handled. Possible keys and certain keys to deal with this situation were introduced in Köhler et al. (VLDB J 25(4):571–596, 2016). In the present paper, we introduce an intermediate concept called strongly possible keys that is based on a data mining approach using only information already contained in the SQL table. A strongly possible key is a key that holds for some possible world which is obtained by replacing any occurrences of nulls with some values already appearing in the corresponding attributes. Implication among strongly possible keys is characterized, and Armstrong tables are constructed. An algorithm to verify a strongly possible key is given applying bipartite matching. Connection between matroid intersection problem and system of strongly possible keys is established. For the cases when no strongly possible keys hold, an approximation notion is introduced to calculate the closeness of any given set of attributes to be considered as a strongly possible key using the g3 measure, and we derive its component version g4 . Analytical comparisons are given between the two measures. Keywords Possible and certain keys · Strongly possible keys · Database · Null values · Data imputation · Armstrong tables · Implication problem

1 Introduction

Research was partially supported by the National Research, Development and Innovation Office (NKFIH) (Grants K-116769 and K-132696). This work is also supported by the National Research, Development and Innovation Fund (TUDFO/51757/2019-ITM, Thematic Excellence Program), the BME NC TKP2020 grant of NKFIH Hungary and it is also connected to the scientific program of the “Development of quality-oriented and harmonized R+D+I strategy and functional model at BME” project, supported by the New Hungary Development Plan (Project ID: TAMOP-4.2.1/B-09/1/KMR-2010-0002).

B

Attila Sali [email protected] Munqath Alattar [email protected]

1

Department of Computer Science and Information Theory, Budapest University of Technology and Economics, Budapest, Hungary

2

Alfréd Rényi Institute of Mathematics, Budapest, Hungary

3

ITRDC, University of Kufa, Kufa, Iraq

Keys have always been fundamental for database management, in particular for understanding the structure and semantics of data. For a given collection of entities, a key is a set of attributes whose values enable us to uniquely identify each entity. A standard example is a relational