Re-Keying Surrogate Key For Dimension & Fact Tables. Need, Impact and Fix
A surrogate key is an artificial key that is used as a substitute for a natural key. Every surrogate key points to a dimension record, which represent the state of the dimension record at a point in time. We join between dimension tables and fact tables using surrogate keys to get the factual information at a point in time. In this article lets see the need of surrogate key re-keying, the impact of re-keying and possible fix.

Need and Impact of Surrogate Key Re-Keying

Typically we never re-generate or re-key surrogate key, just because of the fact that these keys links between dimension and fact records to represent the state of factual data at a point in time. At times we come across situations which can not avoid re-keying.

Lets consider an SCD Type 1 customer dimension, which stores the basic customer information  and customer income group. And a Fact table sales fact. 
Re-Keying Surrogate Key For Dimension and Fact Tables. The Need, Impact and Fix
Here CUST_DIM is not keeping the historical changes of customer attributes. From this data we cannot do an analysis no how the sales per customer changed, when the income group is changed. So business users decided to keep track of the customer attributes historical changes in an SCD Type 2.

This change in turn creates more records for each customers by adjusting the as of start and as of end date for many customer records. Here the CUST_ID 672 changed his income group from MEDIUM to HIGH, so we have two records, with surrogate key CUST_SKEY 101 and 301. One  (301) effective till 25-July-12 and the other (101) is still active.

Changed values for both records of CUST_ID 672 is high lighted in red.
Re-Keying Surrogate Key For Dimension and Fact Tables. The Need, Impact and Fix
This change alone for the Dimension table will not give the capability of historical analysis. We will have to update the Fact  table to refer the correct historical Dimension record. Below shown is the correct reference from Fact table to the Dimension record.

We can imagine how painful it will be to adjust the surrogate keys for a Fact or Dimension table having millions of records. Corrected surrogate Keys are highlighted in red in below image.
Re-Keying Surrogate Key For Dimension and Fact Tables. The Need, Impact and Fix

Fix for Surrogate Key Re-Keying

By now we know the complexity involved in the re-keying of surrogate key. Lets try to find the high level steps involved in fixing the issue.

Dimension Table

We are not left with not much option other than recreating the Dimension table, which will involve the history building retro effectively.  To reduce the impact of Dimension rebuilding, we can build the dimension into a temporary table and finally convert the temporary table to the actual Dimension table.

Fact Table

Fact table can be rebuild from the source tables as long as the historical source data is available. Special care should be given to make sure that each fact record is pointing to the surrogate key, which is in effect for the time period of fact creation.

If the historical source data is not available, we can use the existing data from Fact table to derive the new re-keyed fact table. Along with the existing Fact table join with the existing Dimension table to get the natural key and in turn join it with the new re-keyed dimension table to get the surrogate key.

To reduce the impact of Fact rebuilding, we can build the Fact into a temporary table and finally convert the temporary table to the actual Fact table.

Hope you enjoyed this tutorial, Please let us know if you if you have experienced re-keying crisis and how you handled the situation. We are happy to hear from you.

Share with your friends

Readers Comments