CityPlot: Colored ER Diagrams to Visualize Structure and Contents of Databases


Technical Contribution

Martin Dugas · Gottfried Vossen

Received: 7 December 2011 / Accepted: 3 September 2012 / Published online: 13 September 2012 © Springer-Verlag 2012

Abstract CityPlot generates an extended version of a traditional entity-relationship (ER) diagram for a database. It is intended to provide a combined view of database structure and contents. The graphical output resembles the metaphor of a city. Data points are visualized according to their data type and completeness. An open-source reference implementation is available from http://cran.r-project.org/.

Keywords Entity-relationship model · Database contents · Database visualization · Data completeness
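Visualizing data points by data type and completeness rests on two simple per-table figures: the number of records and, for each attribute, the share of non-NULL values. The following minimal sketch computes both for a SQLite database using Python's standard `sqlite3` module. It is not the CityPlot reference implementation (which is distributed via CRAN). The choice of SQLite, the function name `table_profile`, and the example table `patient` are assumptions made for illustration only. Note that SQLite reports the *declared* column type, which need not match the values actually stored.

```python
import sqlite3


def _quote(identifier: str) -> str:
    """Quote an SQL identifier (table or column name) for SQLite."""
    return '"' + identifier.replace('"', '""') + '"'


def table_profile(conn: sqlite3.Connection) -> dict:
    """Hypothetical helper: per table, the record count and, per column,
    (declared type, completeness), where completeness is the fraction of
    non-NULL values. Empty tables get completeness 0.0 by convention."""
    profile = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        qtable = _quote(table)
        n_rows = conn.execute(f"SELECT COUNT(*) FROM {qtable}").fetchone()[0]
        columns = {}
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
        for _cid, col, decl_type, *_rest in conn.execute(f"PRAGMA table_info({qtable})"):
            # COUNT(col), unlike COUNT(*), counts only non-NULL values
            n_filled = conn.execute(
                f"SELECT COUNT({_quote(col)}) FROM {qtable}").fetchone()[0]
            completeness = n_filled / n_rows if n_rows else 0.0
            columns[col] = (decl_type, completeness)
        profile[table] = {"rows": n_rows, "columns": columns}
    return profile


if __name__ == "__main__":
    # Hypothetical example table with partially missing measurements
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, weight REAL)")
    conn.executemany("INSERT INTO patient (name, weight) VALUES (?, ?)",
                     [("A", 70.5), ("B", None), ("C", None), ("D", 82.0)])
    profile = table_profile(conn)
    print(profile["patient"]["rows"])               # 4
    print(profile["patient"]["columns"]["weight"])  # ('REAL', 0.5)
```

Counting non-NULL values with `COUNT(column)` keeps the computation inside the database, which matters for tables of the size mentioned in the introduction; a visualization layer could then map record counts to the size of a "building" and completeness to its coloring.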

M. Dugas
Institute of Medical Informatics, University of Münster, Albert-Schweitzer-Campus 1, Gebäude A11, 48149 Münster, Germany
e-mail: [email protected]

G. Vossen
Department of Information Systems, University of Münster, Leonardo-Campus 3, 48149 Münster, Germany
e-mail: [email protected]

1 Introduction

Entity-Relationship (ER) models have long provided an abstract, conceptual representation of data [1]. ER models and the ER diagrams derived from them are commonly used in database modeling [2, 3] because of their intuitive visualization and their independence from implementation details. Moreover, many extensions of the basic ER model have been proposed over the years, and methods and tools have been developed to exploit these models in practical database design [3–6]. ER diagrams generally show the conceptual schema of a database, but they reveal nothing about its actual data points (records, tuples, objects), such as the number of entries or the completeness of the data. The goal of this paper is to change this.

During the software life cycle, databases can change considerably: records are inserted, updated, and deleted; software updates may bring major changes to the structure of the underlying database, e.g., new attributes and entities (and possibly new relationships). In addition, databases in various application domains can be very complex: single tables with 100+ attributes and 100,000+ records are common in real-world databases, so methods and tools that provide an overview of such systems are needed. From a data analysis perspective, identifying data points suitable for further processing is of key importance. As a first step, data completeness needs to be assessed [7]. Especially when data are entered manually, as in many clinical database applications, the presence of an attribute in the database schema does not guarantee that values are actually entered for it. For reasons of user acceptance, enforcing NOT NULL constraints is often impossible, since not all values may be known when a tuple is inserted. Similarly, in many automated data collection systems, the underlying schema contains considerably more attributes than are actually assigned values by, say, sensors or measuring instruments. Second