Data Warehouse Concepts

This chapter introduces the basic concepts of data warehouses. A data warehouse is a particular database targeted toward decision support. It takes data from various operational databases and other data sources and transforms it into new structures that f

  • PDF / 1,180,442 Bytes
  • 35 Pages / 439.36 x 666.15 pts Page_size
  • 33 Downloads / 230 Views

DOWNLOAD

REPORT


Data Warehouse Concepts

This chapter introduces the basic concepts of data warehouses. A data warehouse is a particular database targeted toward decision support. It takes data from various operational databases and other data sources and transforms it into new structures that fit better for the task of performing business analysis. Data warehouses are based on a multidimensional model, where data are represented as hypercubes, with dimensions corresponding to the various business perspectives and cube cells containing the measures to be analyzed. In Sect. 3.1, we study the multidimensional model and present its main characteristics and components. Section 3.2 gives a detailed description of the most common operations for manipulating data cubes. In Sect. 3.3, we present the main characteristics of data warehouse systems and compare them against operational databases. The architecture of data warehouse systems is described in detail in Sect. 3.4. As we shall see, in addition to the data warehouse itself, data warehouse systems are composed of back-end tools, which extract data from the various sources to populate the warehouse, and front-end tools, which are used to extract the information from the warehouse and present it to users. In Sect. 3.5, we introduce the design methodology we will use throughout the book. We finish by describing in Sect. 3.6 two representative business intelligence suite of tools, SQL Server and Pentaho.

3.1

Multidimensional Model

The importance of data analysis has been steadily increasing from the early 1990s, as organizations in all sectors are being required to improve their decision-making processes in order to maintain their competitive advantage. Traditional database systems like the ones studied in Chap. 2 do not satisfy the requirements of data analysis. They are designed and tuned to support the daily operations of an organization, and their primary concern is to ensure A. Vaisman and E. Zim´ anyi, Data Warehouse Systems, Data-Centric Systems and Applications, DOI 10.1007/978-3-642-54655-6 3, © Springer-Verlag Berlin Heidelberg 2014

53

54

3

Data Warehouse Concepts

fast, concurrent access to data. This requires transaction processing and concurrency control capabilities, as well as recovery techniques that guarantee data consistency. These systems are known as operational databases or online transaction processing (OLTP) systems. The OLTP paradigm is focused on transactions. In the Northwind database example, a simple transaction could involve entering a new order, reserving the products ordered, and, if the reorder point has been reached, issuing a purchase order for the required products. Eventually, a user may want to know the status of a given order. If a database is indexed following one of the techniques described in the previous chapter, a typical OLTP query like the above would require accessing only a few records of the database (and normally will return a few tuples). Since OLTP systems must support heavy transaction loads, their design should prevent update anom