Initial History Building Algorithm for Slowly Changing Dimensions

DW Design , ETL Design

Building initial history for a Data Warehouse is a complex and time consuming task. It involve taking into account of all the date intervals from different source tables during which the source system’s representation of data in any of the tables feeding into the Dimension Tables. So we can imagine the history building complexity and the need of a reusable algorithm.

In this article lets see a history building algorithm, which can take care of the different history building scenarios.

History Building Date Scenarios.

Lets see the different date scenarios.

Single Source Scenario : The Dimension Table needs only one source table to populate all its data elements.
Multiple Source Scenarios : The Dimension Table needs multiple source tables to populate its data elements. Join conditions and ‘where’ conditions need to be applied to construct the Dimension Table row.
Start & End Date Scenarios : Source table records have both start and end date to represent the active time period of any particular row.
Start Date Scenarios : Source table records have only start date to represent the active time period of a particular row. This record is assumed to be active from the start date till the end of the source system existence.
No Date Scenarios : Source table records have no start or end date. This record is assumed to be active for the entire life of the source system..

The history building process involves the identification of date column interpret as system changes like create,update,delete dates. Those dates are the input, for the algorithm which we are discussing here.

The figure below illustrates how history dates in a potential scenario of multiple input tables and how the end result of the constructed record into Dimension Table will contain history date intervals that are more detailed than each individual source table.

Note : Points on each line shows different date buckets.

History Building Algorithm.

Lets understand the history building algorithm with a real time example.

Step 1 : Gather all the dates from different source tables and tag as S if the date is a start date or E if the date is a end date.
Note : Add a high date 12/31/2099 for each data source, which represents the still active record.

Step 2 : Sort Dates by date and type. Date on ascending and Type on descending.

Step 3 : Remove Duplicate Dates. Consider both date and type column to identify the duplicate rows.

Step 4 : Set End Date to next start, Use next rows date to build the end date for the current row.

Step 5 : Remove adjacent pairs. Remove start and end date pair, which is only one day apart.

Step 6 : Revise Dates. If End date = Next Start date, set it to next Start Date - 1

With this step we have all the time buckets created.

We can have this algorithm build into a reusable component, which can be used across different ETL Process.

Hope you enjoyed this tutorial, Please let us know if you have any questions and subscribe to the mailing list to get the latest tutorials in your mail box.

Popular Posts

Random Posts

Posts Being Viewed

Posts Category

Initial History Building Algorithm for Slowly Changing Dimensions

History Building Date Scenarios.

History Building Algorithm.

Share with your friends

Readers Comments