Extraction, Transformation, and Loading
Extraction, transformation, and loading (ETL) processes are used to extract data from internal and external sources of an organization, transform these data, and load them into a data warehouse. Since ETL processes are complex and costly, it is important
- PDF / 3,604,772 Bytes
- 43 Pages / 439.36 x 666.15 pts Page_size
- 86 Downloads / 227 Views
Extraction, Transformation, and Loading
Extraction, transformation, and loading (ETL) processes are used to extract data from internal and external sources of an organization, transform these data, and load them into a data warehouse. Since ETL processes are complex and costly, it is important to reduce their development and maintenance costs. Modeling ETL processes at a conceptual level is a way to achieve this goal. However, existing ETL tools, like Microsoft Integration Services or Pentaho Data Integration (also known as Kettle), have their own specific language to define ETL processes. Further, there is no agreed-upon conceptual model to specify such processes. In this chapter, we study the design of ETL processes using a conceptual approach. The model we use is based on the Business Process Modeling Notation (BPMN), a de facto standard for specifying business processes. The model provides a set of primitives that cover the requirements of frequently used ETL processes. Since BPMN is already used for specifying business processes, users already familiar with BPMN do not need to learn another language for defining ETL processes. Further, BPMN provides a conceptual and implementation-independent specification of such processes, which hides technical details and allows users and designers to focus on essential characteristics of such processes. Finally, ETL processes expressed in BPMN can be translated into executable specifications for ETL tools. We start this chapter with a brief introduction of BPMN, which we give in Sect. 8.1. Then, in Sect. 8.2, we explain how we can use BPMN for conceptual modeling of ETL processes. In Sect. 8.3, we apply these concepts to the Northwind case study. We design a conceptual model for the ETL process that loads the Northwind data warehouse used in the previous chapters with data extracted from the Northwind operational database and other sources. Finally, after providing in Sect. 8.4 a brief overview of Microsoft Integration Services and Pentaho Kettle, we show in Sects. 8.5 and 8.6 how the ETL conceptual model can be implemented in both tools. A detailed specification of the process is provided, and the differences between both implementation platforms are highlighted. A. Vaisman and E. Zim´ anyi, Data Warehouse Systems, Data-Centric Systems and Applications, DOI 10.1007/978-3-642-54655-6 8, © Springer-Verlag Berlin Heidelberg 2014
285
286
8.1
8
Extraction, Transformation, and Loading
Business Process Modeling Notation
A business process is a collection of related activities or tasks in an organization whose goal is to produce a specific service or product. A task can be performed by software systems, humans, or a combination of these. Business process modeling is the activity of representing the business processes of an organization so that the current processes may be analyzed and improved. Many techniques to model business processes have been proposed over the years. Traditional techniques include Gantt charts, flowcharts, PERT diagrams, and data flow diagrams. However, the p
Data Loading...