Data Integration Solutions: ETL Design

SCD Type 4, a Solution for Rapidly Changing Dimension

SCD Type 2 is designed to generate a new record for every change of a dimension attribute, so that the complete history of changes can be tracked correctly. When dimension attributes change very frequently, the dimension grows very rapidly, causing considerable performance and maintenance issues. In this article, let's see how we can handle this rapidly changing dimension issue using SCD Type 4.

SCD Type 6 Implementation using Informatica PowerCenter

In one of our prior articles we described the SCD Type 6 dimensional modeling technique. This technique is a combination of SCD Type 1, Type 2 and Type 3, which gives much more flexibility in terms of the number of queries it can answer, but of course at the cost of added complexity. In this article, let's discuss the step-by-step implementation of SCD Type 6 using Informatica PowerCenter.

Use Informatica Persistent Lookup Cache and Reduce Fact Table Load Time

In a mature data warehouse environment, it is very common to see fact tables with dozens of dimension tables linked to them. If we are using Informatica to build this ETL process, we would expect to see dozens of Lookup transformations as well, unless other design techniques are used. Since Lookup is the predominant transformation, tuning it will help us gain some performance. Let's see how we can use a persistent lookup cache for this performance improvement.

SCD Type 6, a Combination of SCD Type 1, 2 and 3

In a couple of our previous articles, we discussed how to design and implement SCD Type 1, Type 2 and Type 3. We cannot always fulfill all the business requirements with these basic SCD types alone. So here, let's see what SCD Type 6 is and what it offers beyond the basic SCD types.

Re-Keying Surrogate Key For Dimension & Fact Tables. Need, Impact and Fix

A surrogate key is an artificial key that is used as a substitute for a natural key. Every surrogate key points to a dimension record, which represents the state of the dimension record at a point in time. We join dimension tables and fact tables using surrogate keys to get the factual information at a point in time. In this article, let's see the need for surrogate key re-keying, the impact of re-keying, and a possible fix.

Change Data Capture (CDC) Implementation for Multi Sourced ETL Processes

We have discussed a couple of different options for Change Data Capture, including a Change Data Capture framework, in our prior discussions. Implementing change capture for an ETL process that involves multiple data sources needs special care, so that changes from any of the data sources are captured. In this article, let's see a CDC implementation for an ETL process that involves multiple data sources.

Initial History Building Algorithm for Slowly Changing Dimensions

Building the initial history for a data warehouse is a complex and time-consuming task. It involves taking into account all the date intervals from the different source tables that feed the dimension tables, that is, the intervals covered by the source system's representation of the data in any of those tables. So we can imagine the complexity of history building and the need for a reusable algorithm.

5 Restartability Design Pattern for Different Type ETL Loads

Restartable ETL jobs are crucial to failure recovery, supportability and data quality in any ETL system. A well-designed ETL system should therefore be built around the ability to recover from an abnormal job ending and restart cleanly. In this article, let's discuss ETL restartability approaches that support different types of ETL jobs, such as dimension loads and fact loads.

SCD Type 1 Implementation using Informatica PowerCenter

Unlike SCD Type 2, Slowly Changing Dimension Type 1 does not preserve any historical versions of data. This methodology overwrites old data with new data, and therefore stores only the most current information. In this article, let's discuss the step-by-step implementation of SCD Type 1 using Informatica PowerCenter.
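To make the overwrite behavior concrete, here is a minimal Python sketch of Type 1 logic. It is not the PowerCenter mapping from the article; the function name `apply_scd_type1`, the `customer_id` key and the sample rows are all hypothetical.

```python
def apply_scd_type1(dimension, incoming, key="customer_id"):
    """Overwrite rows whose key already exists; insert rows with new keys."""
    by_key = {row[key]: row for row in dimension}
    for record in incoming:
        existing = by_key.get(record[key])
        if existing is None:
            # Unseen key: insert a new dimension row.
            new_row = dict(record)
            dimension.append(new_row)
            by_key[record[key]] = new_row
        else:
            # Known key: overwrite in place, so the old values are lost.
            existing.update(record)
    return dimension


# Hypothetical sample data.
dim = [{"customer_id": 1, "city": "Austin"}]
apply_scd_type1(dim, [
    {"customer_id": 1, "city": "Dallas"},
    {"customer_id": 2, "city": "Boston"},
])
```

After the call, customer 1 carries only the new city and no trace of the old one remains, which is the trade-off Type 1 accepts.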

Design approach to Update Huge Tables Using Oracle MERGE

One of the issues we come across during ETL design is updating large tables. This is a very common ETL scenario, especially when you deal with large volumes of data, such as when loading an SCD Type 2 dimension. We discussed a design approach for this scenario in one of our prior articles. In this updated article, let's discuss a different approach to updating large tables using an Informatica mapping.

SCD Type 3 Implementation using Informatica PowerCenter

Unlike SCD Type 2, Slowly Changing Dimension Type 3 preserves only a few historical versions of data, most of the time the 'Current' and 'Previous' versions. The 'Previous' version value is stored in additional columns within the same dimension record. In this article, let's discuss the step-by-step implementation of SCD Type 3 using Informatica PowerCenter.
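The 'Current' and 'Previous' column pair can be illustrated with a short Python sketch. The function name `apply_scd_type3`, the `previous_` column prefix and the sample record are assumptions made for illustration, not the design used in the article.

```python
def apply_scd_type3(row, attribute, new_value):
    """Move the current value into previous_<attribute>, then overwrite it."""
    if row[attribute] != new_value:
        row["previous_" + attribute] = row[attribute]
        row[attribute] = new_value
    return row


# Hypothetical dimension record with one tracked attribute.
customer = {"customer_id": 1, "city": "Austin", "previous_city": None}
apply_scd_type3(customer, "city", "Dallas")
apply_scd_type3(customer, "city", "Boston")
```

After the second change the record holds "Boston" as the current city and "Dallas" as the previous one; "Austin" is gone, which shows why Type 3 keeps only limited history.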

Change Data Capture (CDC) Implementation Using CHECKSUM Number

Typically we use a date column or a flag column to identify changed records in a change data capture implementation. But there can be scenarios where your source does not have any column that identifies the changed records, especially when working with legacy systems. In this article, let's see how to implement Change Data Capture (CDC) for such scenarios using a checksum number.
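The general idea of checksum-based change detection can be sketched in Python as follows. This is an illustration under stated assumptions, not the article's implementation: the MD5 hash, the `|` delimiter, the helper names `row_checksum` and `detect_changes`, and the sample rows are all hypothetical choices.

```python
import hashlib


def row_checksum(row, columns):
    """Hash the tracked columns of a row into a single checksum string."""
    joined = "|".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def detect_changes(source_rows, stored_checksums, key, columns):
    """Split source rows into inserts and updates by comparing checksums."""
    inserts, updates = [], []
    for row in source_rows:
        checksum = row_checksum(row, columns)
        previous = stored_checksums.get(row[key])
        if previous is None:
            inserts.append(row)   # key never seen before
        elif previous != checksum:
            updates.append(row)   # key seen, but tracked columns changed
        stored_checksums[row[key]] = checksum
    return inserts, updates


# Hypothetical loads from a source with no date or flag column.
COLUMNS = ["name", "city"]
stored = {}
detect_changes([
    {"id": 1, "name": "Ann", "city": "Austin"},
    {"id": 2, "name": "Bob", "city": "Boston"},
], stored, "id", COLUMNS)

inserts, updates = detect_changes([
    {"id": 1, "name": "Ann", "city": "Dallas"},
    {"id": 2, "name": "Bob", "city": "Boston"},
    {"id": 3, "name": "Cy", "city": "Chicago"},
], stored, "id", COLUMNS)
```

On the second load only row 1 is reported as an update and row 3 as an insert; the unchanged row 2 is skipped because its checksum matches the stored one.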

Data Cleansing and Standardization Using Regular Expression

Data quality is one of the major priorities of any data warehouse or data integration project. We use different tools to implement data quality and data standardization. But such tools may not be the right solution for small projects that involve only a couple of data feeds. Regular expressions are an alternative approach for such small projects. In this article, let's discuss data quality implementation using regular expressions (RegEx) in Informatica PowerCenter.
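As a rough illustration of regex-based cleansing and standardization, here is a small Python sketch. It uses Python's `re` module rather than PowerCenter expressions, and the phone format, the simplified email pattern and the function names are assumptions chosen for the example.

```python
import re

NON_DIGITS = re.compile(r"\D")
# Deliberately simple pattern; real email validation is more involved.
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def standardize_phone(raw):
    """Return a 10-digit phone number as NNN-NNN-NNNN, or None if invalid."""
    digits = NON_DIGITS.sub("", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code of 1
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def is_valid_email(raw):
    """Check the value against the simplified email pattern."""
    return EMAIL_PATTERN.match(raw.strip()) is not None


# Hypothetical raw values from a data feed.
cleaned = [standardize_phone(p) for p in ["(512) 555-0199", "+1 512.555.0199", "555-0199"]]
```

Values that cannot be standardized come back as `None`, so a load could route them to a reject file instead of the target table.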

ACTIVE LookUp, To Unlock the Limitations of JOINER Transformation

The Joiner transformation can be used to achieve the functionality of a SQL join operation, including a full outer join. Additionally, we can use a Joiner to join data from heterogeneous data sources. But it is limited in the operators it accepts in the join condition: only the 'equal to' operator can be used. In this article, let's see how we can unlock this limitation using the Informatica PowerCenter Active LookUp transformation.
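To show what a join condition beyond 'equal to' looks like, here is a minimal Python sketch of a range-based match. It only illustrates the concept of a non-equi join; the function name `range_lookup`, the band table and the order rows are hypothetical and do not come from the article.

```python
def range_lookup(orders, bands):
    """Attach every band whose [low, high) range contains the order amount."""
    matched = []
    for order in orders:
        for band in bands:
            # A range condition (>= and <) instead of a plain equality test.
            if band["low"] <= order["amount"] < band["high"]:
                matched.append({**order, "band": band["name"]})
    return matched


# Hypothetical sample data.
orders = [{"order_id": 1, "amount": 40}, {"order_id": 2, "amount": 250}]
bands = [
    {"name": "small", "low": 0, "high": 100},
    {"name": "large", "low": 100, "high": 1000},
]
result = range_lookup(orders, bands)
```

Because every matching band is emitted, one input row can produce several output rows when the ranges overlap, and none when no range matches.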

11 Ways to Make Informatica PowerCenter Code Reusable

Reusability is a valuable feature of Informatica PowerCenter. Its general purpose is to reduce unnecessary coding, which ultimately reduces development time and increases supportability. In this article, let's see the different options available in Informatica PowerCenter to make your code reusable.
