Change Data Capture (CDC) Implementation Using CHECKSUM Number
Typically we use a date column or a flag column to identify the change record for change data capture implementation. But there can be scenarios where you source do not have any columns to identify the changed records, especially when working with legacy systems. Today in this article lets see how to implement Change Data Capture or CDC for such scenarios using checksum number.

What is Checksum

A checksum is a value used to verify the integrity of a file or a data. Checksums are typically used to compare two sets of data to make sure they are the same. If the checksums don't match those of the original file or data, the data may have been altered.

How to find Checksum

Informatica provides the function MD5() for Checksum generation. This function returns a unique 32-character string of hexadecimal digits 0-9 and a-f.

Syntax : MD5( value )
Return : Unique 32-character string of hexadecimal digits 0-9 and a-f.

Informatica Implementation

Design Scenario

Lets consider a workflow to load CUSTOMER table from a  flat file, which is generated from legacy mainframe system. Any new customer information will be inserted and any changed customer information will be updated else rejected to bad file. Note that the source file do not have any indicator to identify the changed record.

Datamodel Needs

Apart from the customer attributes columns, we need to create an additional database table column to store the CHECKSUM number. It is 32 character hexadecimal value. So add the column to the CUSTOMER table and below is the target table definition.
Informatica PowerCenter Target Definition

Informatica Mapping

Lets start with the mapping. Create CHECKSUM using MD5() function in the expression as sown in below image.
    • MD5(CUST_NAME || ADDRESS1 || ADDRESS2 || CITY || STATE || TO_CHAR(ZIP))
Informatica PowerCenter CheckSum MD5
Now create a LookUp transformation to get CUST_ID and CHK_SUM_NB from the target table. Use the LookUp Condition IN_CUST_ID = CUST_ID
Informatica PowerCenter LookUpTransformation
Now find out the records for INSERT, UPDATE using the columns from the lookUp Transformation with the expressions below.

INSERT : ISNULL(LKP_CUST_ID)
If the Customer is not existing in the target table, set the record for INSERT.

UPDATE : NOT ISNULL(LKP_CUST_ID) AND CHK_SUM_NB <> LKP_CHK_SUM_NB
If the Customer is existing in the target table and the Checksum of the source record is different from the lookup, set the record for UPDATE.

REJECT : Any other records not satisfying the above conditions will be passed on to the DEFAULT group and ignored.

Now add the Router Transformation with two groups with the expressions explained above.
Informatica PowerCenter Router Transformation
After the Router Transformation is added the mapping looks like in below image.
Informatica PowerCenter Router Transformation
Map the columns from INSERT and UPDATE group to both target instances, including CHK_SUM_NB to the target table.
Change Data Capture (CDC) Implementation Using CHECKSUM Number
Note : CHK_SUM_NB is Inserted and Updated into the target table, this value is used by the lookup to determine the insert and update.

All you left now is to create and run the workflow. Hope you enjoyed this tutorial. Please let me know if you have any questions or comments.

Share with your friends

Readers Comments