Initial History Building Algorithm for Slowly Changing Dimensions
Building initial history for a Data Warehouse is a complex and time consuming task. It involve taking into account of all the date intervals from different source tables during which the source system’s representation of data in any of the tables feeding into the Dimension Tables. So we can imagine the history building complexity and the need of a reusable algorithm.

In this article lets see a history building algorithm, which can take care of the different history building scenarios.

History Building Date Scenarios.

Lets see the different date scenarios. 
  1. Single Source Scenario : The Dimension Table needs only one source table to populate all its data elements. 
  2. Multiple Source Scenarios : The Dimension Table  needs multiple source tables to populate its data elements. Join conditions and ‘where’ conditions need to be applied to construct the Dimension Table row.
  3. Start & End Date Scenarios : Source table records have both start and end date to represent the active time period of any particular row.
  4. Start Date Scenarios : Source table records have only start date to represent the active time period of a particular row. This record is assumed to be active from the start date till the end of the source system existence.
  5. No Date Scenarios : Source table records have no start or end date. This record is assumed to be active for the entire life of the source system..
The history building process involves the identification of date column interpret as system changes like create,update,delete dates. Those dates are the input, for the algorithm which we are discussing here. 

The figure below illustrates how history dates in a potential scenario of multiple input tables and how the end result of the constructed record into Dimension Table will contain history date intervals that are more detailed than each individual source table.
History Building Algorithm
Note : Points on each line shows different date buckets. 

History Building Algorithm.

Lets understand the history building algorithm with a real time example.

Step 1Gather all the dates from different source tables and tag as S if the date is a start date or E if the date is a end date.
Note : Add a high date 12/31/2099 for each data source, which represents the still active record.
History Building Algorithm
Step 2 : Sort Dates by date and type. Date on ascending and Type on descending.
History Building Algorithm
Step 3 : Remove Duplicate Dates. Consider both date and type column to identify the duplicate rows.
History Building Algorithm
Step 4 : Set End Date to next start, Use next rows date to build the end date for the current row.
History Building Algorithm
Step 5 : Remove adjacent pairs. Remove start and end date pair, which is only one day apart.
History Building Algorithm
Step 6 : Revise Dates. If End date = Next Start date, set it to next Start Date - 1
History Building Algorithm
With this step we have all the time buckets created.

We can have this algorithm build into a reusable component, which can be used across different ETL Process.

Hope you enjoyed this tutorial, Please let us know if you have any questions and subscribe to the mailing list to get the latest tutorials in your mail box.

Share with your friends

Readers Comments