SCD Type 6 Implementation using Informatica PowerCenter

Johnson Cyriac Apr 24, 2013
|

SCD Type 6 Implementation using Informatica PowerCenter
In one of our prior articles we described the SCD Type 6 dimensional modeling technique. This technique is the combination of SCD Type1, Type 2 and Type 3, which gives much more flexibility in terms of the number of queries it can answer. But off course at the cost of complexity. In this article lets discuss the step by step implementation of SCD Type 6 using Informatica PowerCenter.
Slowly Changing Dimension Series
Part I : SCD Type 1.
Part II : SCD Type 2.
Part III : SCD Type 3.
Part IV : SCD Type 4.
Part V : SCD Type 6.

Understand the Staging and Dimension Table.

Staging Table

In our staging table, we have all the columns required for the dimension table attributes. So no other source tables other than Dimension table will be involved in the mapping. Below is the structure of our staging table.
SCD Type 6 Implementation using Informatica PowerCenter

Key Points :

  1. Staging table will have only one days data.
  2. Data is uniquely identified using CUST_ID.
  3. All attribute required by Dimension Table is available in the staging table.

Dimension Table

Here is the structure of our Dimension table.
image
Note : Be sure to define key for CUST_ID

Key Points :

  1. CUST_KEY is the surrogate key.
  2. CUST_ID, AS_OF_END_DT is the unique record identifier.
  3. Record versions are kept using Time Range using AS_OF_START_DT, AS_OF_END_DATE
  4. Active record will have an AS_OF_END_DATE value 12-31-4000

Mapping Building and Configuration

Now we understand the Staging Table, Dimension Table and the design considerations, we can go to the mapping development. We are splitting the mapping development into seven different steps.
  1. Join Staging Table and Dimension Table Using Lookup
  2. Data Transformation
    • Generate Surrogate Key
    • Other Calculations
  3. Identify Insert/Update
  4. Insert the new Records
  5. Update(Expire) the Old Version
  6. Insert the new Version of Updated Record
  7. Update the Type 6 Attribute 
Note : All the steps except, Step 7 will be same as SCD Type 2 Design

1. Join Staging Table and Dimension Table Using Lookup

We are going to join both the Staging (CUST_STAGE) table and the Dimension (T_DIM_CUST) table using Lookup Transformation. if it is there is no corresponding Customer in the Dimension table, it returns NULL values for the Dimension table column CUST_KEY.
  • LookUp Condition : IN_CUST_ID = CUST_ID
  • LookUp Filter ConditionAS_OF_END_DATE = TO_DATE('12-31-4000','MM-DD-YYYY')
Note : LookUp filter condition will fetch the latest version record from the dimension table.
SCD Type 6 Implementation using Informatica PowerCenter

2.  Data Transformation

Now map the columns from the Source Qualifier to an Expression Transformation.

Additionally map CUST_KEY from LookUp to OLD_CUST_KEY and map NEXTVAL from the sequence generator transformation to CUST_KEY

And add below expressions in the Expression Transformation
  • Generate Surrogate Key : A surrogate key will be generated for each and every record inserted in to the Dimension table
    • CUST_KEY : Is the surrogate key, This will be generated using a Sequence Generator Transformation
  • Other Calculations :
    • Effective Start Date : Effective start date of the Record
      • AS_OF_START_DT :  TRUNC(SYSDATE)
    • Effective end date  : Effective end date of the Record,   
      • AS_OF_END_DT : TO_DATE('12-31-4000','MM-DD-YYYY')
    • Record creation date : Record creation timestamp, this will be used for the records inserted
      • CREATE_DT :  TRUNC(SYSDATE)
    • Record updating date : Record updating timestamp, this will be used for records updated.
      • UPDATE_DT :  TRUNC(SYSDATE)
SCD Type 6 Implementation using Informatica PowerCenter

3. Identify Insert/Update

In this step we will identify the records for INSERT and UPDATE.
  • INSERT : A record will be set for INSERT if the record is not exist in the Dimension Table, We can identify the New records if  OLD_CUST_KEY is NULL, which is the column from the Dimension table
  • UPDATE : A record will be set for UPDATE, if the record is already existing in the Dimension table. If the column OLD_CUST_KEY is not null, the record will be set for UPADTE
    • Following expression will be used in the Expression Transformation port INS_UPD shown in the previous step
    • INS_UPD : IIF(ISNULL(OLD_CUST_KEY), 'I', IIF(NOT ISNULL(OLD_CUST_KEY), 'U'))
Now map all the columns from the Expression Transformation to a Router and add two groups as below
    • INSERT : IIF(INS_UPD_FLG = 'I', TRUE, FALSE)
    • UPDATE : IIF(INS_UPD_FLG = 'U', TRUE, FALSE)
SCD Type 6 Implementation using Informatica PowerCenter

4. Insert The new Records

Now map all the columns from the ‘INSERT’ group to the Dimension table instance T_DIM_CUST. While mapping the columns, we don’t need any column named OLD_, which is pulled from the Dimension table.

For all the inserted records both INCOME_GROUP and CURR_INCOME_GROUP (Type 6) will have the same value.
SCD Type 6 Implementation using Informatica PowerCenter

5. Update(Expire) the Old Version

The records which are identified for UPDATE will be expired by setting the AS_OF_END_DATE.

We will be mapping below columns from ‘UPDATE’ group of the Router Transformation to the target table. To update(expire) the old record we just need the columns below list.
    • OLD_CUST_KEY : To uniquely identify  the Dimension Record.
    • UPDATE_DATE : Audit column to know the record update date.
    • AS_OF_END_DT : Record will be expired with previous days date.
While we map the columns, AS_OF_END_DT will be calculated as ADD_TO_DATE(TRUNC(SYSDATE),'DD',-1) in an Expression Transformation. Below image gives the picture of the mapping.
SCD Type 6 Implementation using Informatica PowerCenter

6. Insert the new Version of Updated Record

The records which are identified as UPDATE will have to have a new(active) version inserted.  Map all the ports from the ‘UPDATE’ group of the Router Transformation to target instance T_DIM_CUST. While mapping the columns, we don’t need any column named OLD_, which is pulled from the Dimension table.

Both INCOME_GROUP and CURR_INCOME_GROUP (Type 6) will have the same value.
SCD Type 6 Implementation using Informatica PowerCenter

7. Update the Type 6 Attribute

Now CURR_INCOME_GROUP column needs to be updated with the same value for all the versions of Customer or CUST_ID with the latest value.

We will be mapping below columns from ‘UPDATE’ group of the Router Transformation to the target table.
  • CUST_ID : To identify all the Dimension records of a Customer .
  • INCOME_GROUP : Map to CURR_INCOME_GROUP.
  • UPDATE_DT : Audit column to know the record update date.
SCD Type 6 Implementation using Informatica PowerCenter
Note : Update will be based on the CUST_ID column, so that all the versions of Customer will get updated with same value.

We have completed the mapping development and below is the structure of the completed mapping.
SCD Type 6 Implementation using Informatica PowerCenter

Workflow and Session Creation

There is not any specific properties required to be given during the session configuration.
informatica worklet
Now lets look at the data, see how it looks from the below image. We can see that CURR_INCOME_GOUP is same for all the versions of customer Will Smith (CUST_ID = 1001).
imageHope you guys enjoyed this. Please leave us a comment in case you have any questions of difficulties implementing this.




About US Contact US Advertise Guest Post Terms and Conditions Privacy Policy Disclaimer

© 2012-2013 Data Intelligence Solution, All Rights Reserved
The contents in this site is copyrighted to Data intelligence Solution and may not be reproduced on other websites.