Data Integration Solutions: ETL Design

An ETL Parameter Framework to Deal with all sorts of Parametrization Needs


We covered several ETL frameworks in our prior articles. In this article, let's talk about an ETL framework for handling the parameters we normally use across different ETL jobs and use cases. Parametrization in ETL code increases reusability and maintainability, is critical to code quality, and reduces development cycle time.

Informatica Incremental Aggregation Implementation and Business Use Cases

Informatica PowerCenter Incremental Aggregation

Incremental aggregation is an effective performance improvement technique when you have to do aggregate calculations on incrementally changing source data. Rather than forcing the session to process the entire source and recalculate the same data each time it runs, incremental aggregation persists the aggregated values and adds the incremental changes to them. Let's see more details in this article.
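To make the idea concrete, here is a minimal sketch in plain Python, not PowerCenter itself. The function name `apply_increment` and the store keys are hypothetical, and the persisted aggregate is modeled as an ordinary dictionary rather than a cache file.

```python
from collections import defaultdict


def apply_increment(persisted_totals, new_rows):
    """Add only the newly arrived rows to the previously persisted totals."""
    totals = defaultdict(float, persisted_totals)
    for key, amount in new_rows:
        totals[key] += amount
    return dict(totals)


# Run 1: the initial load builds the persisted aggregate.
totals = apply_increment({}, [("store_a", 100.0), ("store_b", 50.0)])

# Run 2: only the incremental rows are read; earlier rows are not reprocessed.
totals = apply_increment(totals, [("store_a", 25.0), ("store_c", 10.0)])
```

The second run touches two rows instead of four, which is where the saving comes from as the source history grows.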

How to Use Error Handling Options and Techniques in Informatica PowerCenter

Data quality is critical to the success of every data warehouse project, so ETL architects and data architects spend a lot of time defining the error handling approach. Informatica PowerCenter provides a set of options to take care of error handling in your ETL jobs. In this article, let's see how to leverage the PowerCenter options to handle exceptions.

How to Avoid The Usage of SQL Overrides in Informatica PowerCenter Mappings

Many Informatica PowerCenter developers tend to use SQL Override during mapping development because they find it easy and productive. At the same time, ETL architects do not like SQL Overrides because they hide the ETL logic from Metadata Manager. In this article, let's see the options available to avoid SQL Override in different transformations.

Informatica PowerCenter Design Best Practices and Guidelines

A high-level, systematic ETL design helps build efficient and flexible ETL processes, so special care should be given to the design phase of your project. In this article we cover the key points to keep in mind while designing an ETL process. These recommendations can be integrated into your ETL design and development processes to simplify the effort and improve the overall quality of the finished product.

Design Approach to Handle Late Arriving Dimensions and Late Arriving Facts

In a typical data warehouse, dimensions are processed first and facts are loaded later, on the assumption that all required dimension data is already in place. This may not hold in all cases because of the nature of your business process or the behavior of the source application. Fact data can also be sent from the source application to the warehouse much later than it was actually created. In this article, let's discuss several options for handling late arriving dimensions and facts.

SOFT and HARD Deleted Records and Change Data Capture in Data Warehouse


In a couple of our prior articles we spoke about change data capture, different techniques to capture changed data, and a change data capture framework. In this article we will take a deep dive into different aspects of change data in a data warehouse, including soft and hard deletions in source systems.

Surrogate Key Generation Approaches Using Informatica PowerCenter

Different Approaches to Generate Surrogate Key in Informatica PowerCenter

A surrogate key is a sequentially generated unique number attached to each record in a dimension table in a data warehouse. We discussed surrogate keys in detail in our previous article. In this article we will concentrate on different approaches to generating surrogate keys for different types of ETL processes.
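As a rough illustration of "sequentially generated", the sketch below assigns keys to incoming dimension rows by continuing from the highest key already in the table. It is plain Python rather than a PowerCenter transformation, and `assign_surrogate_keys`, `customer_sk` and `customer_id` are hypothetical names chosen for the example.

```python
def assign_surrogate_keys(new_rows, last_key=0):
    """Attach the next sequential surrogate key to each new dimension row."""
    keyed_rows = []
    for offset, row in enumerate(new_rows, start=1):
        # The surrogate key carries no business meaning; it is just the next number.
        keyed_rows.append({"customer_sk": last_key + offset, **row})
    return keyed_rows


# Assume the dimension table currently ends at surrogate key 41.
incoming = [{"customer_id": "C-100"}, {"customer_id": "C-200"}]
keyed = assign_surrogate_keys(incoming, last_key=41)
```

In a real load the starting value would come from the target table or a sequence object; here it is simply passed in as an argument.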

Surrogate Key in Data Warehouse, What, When and Why

Surrogate Key in Data Warehouse, What, When, Why and Why Not

Surrogate keys are a widely used and accepted design standard in data warehouses. A surrogate key is a sequentially generated unique number attached to each record in a dimension table. It joins the fact and dimension tables and is necessary to handle changes in dimension table attributes.

Informatica PowerCenter on Grid for Greater Performance and Scalability

Informatica PowerCenter Workflows on Grid for Performance and Scalability

Informatica has developed a solution that leverages the power of grid computing for greater data integration scalability and performance. The grid option delivers load balancing, dynamic partitioning, parallel processing and high availability to support scalability, performance and reliability. In this article, let's discuss how to set up an Informatica workflow to run on a grid.

Time Zones Conversion and Standardization Using Informatica PowerCenter

When your data warehouse sources data from systems in multiple time zones, it is recommended to capture a universal standard time as well as the local times. The same goes for transactions involving multiple currencies. This design enables analysis on the local time along with the universal standard time. The time standardization is done as part of the ETL that loads the warehouse. In this article, let's discuss the implementation using Informatica PowerCenter.
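The idea of storing both times can be sketched in a few lines of Python. This is only an illustration, not the PowerCenter implementation: the function name `standardize` is hypothetical, and the source system's zone is assumed to be a fixed UTC offset, which ignores daylight saving changes.

```python
from datetime import datetime, timedelta, timezone


def standardize(local_naive, utc_offset_hours):
    """Return the local timestamp together with its UTC equivalent."""
    # Assumption: the source time zone is a fixed offset from UTC (no DST).
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_time = local_naive.replace(tzinfo=local_tz)
    return {
        "local_time": local_time,
        "utc_time": local_time.astimezone(timezone.utc),
    }


# A transaction recorded at 09:30 by a source system five hours behind UTC.
row = standardize(datetime(2024, 1, 15, 9, 30), utc_offset_hours=-5)
```

Keeping both columns lets reports answer "what time was it for the customer" and "what happened first across regions" from the same row.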

Dynamically Changing ETL Calculations Using Informatica Mapping Variable

Quite often we deal with ETL logic that is very dynamic in nature, such as a discount calculation that changes every month or special weekend-only logic. Pushing such frequent ETL changes into a production environment is difficult in practice. The best option for this dynamic scenario is parametrization. In this article, let's discuss how we can make ETL calculations dynamic.
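A minimal sketch of the principle, in plain Python rather than PowerCenter: the calculation reads its rate from a parameter set instead of hard-coding it, so next month's change is a parameter update and not a code deployment. The names `apply_discount` and `discount_rate` are hypothetical, and the parameters are modeled as a dictionary of strings, much as values read from a parameter file would be.

```python
def apply_discount(amount, params):
    """Compute the discounted amount from a rate supplied as a parameter."""
    # Parameter values are assumed to arrive as text, so convert explicitly.
    rate = float(params.get("discount_rate", "0"))
    return round(amount * (1 - rate), 2)


# This month's parameters.
january = apply_discount(200.0, {"discount_rate": "0.10"})

# Next month only the parameter changes; the calculation code is untouched.
february = apply_discount(200.0, {"discount_rate": "0.15"})
```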

Informatica Workflow Recovery with High Availability for Auto Restartable Jobs

Restartable ETL jobs are crucial to job failure recovery, supportability and data quality in any ETL system. In one of our prior articles we discussed different design techniques for ETL restartability, independent of the ETL tool used. We can also implement restartability in an ETL job using Informatica PowerCenter workflow recovery capabilities. In this article, let's see what is required to set up an Informatica workflow for recovery.

User Defined Functions in Informatica PowerCenter


Reusability is a great feature of Informatica PowerCenter. Its general purpose is to reduce unnecessary coding, which ultimately reduces development time and increases supportability. User Defined Functions are one of the reusability features provided by Informatica PowerCenter. In this article, let's understand User Defined Functions in detail.

Re-Runnability for Informatica ETL Processes Which Uses Mapping Variables

An Informatica PowerCenter mapping variable can be used effectively to implement change data capture logic. The mapping variable is stored in the repository, and its value is set to the new value only when the session execution is successful. This property makes it easy to restart the ETL process. However, since the last success point (the variable value) is stored in the repository, we cannot simply go back and reprocess an already processed data set. In this article, let's see how we can override the mapping variable value.
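The behavior described here can be simulated in plain Python. This is a sketch of the concept only and does not use any PowerCenter API: the repository is modeled as a dictionary, and `run_session`, `last_id` and `override_last_id` are hypothetical names.

```python
def run_session(source_rows, repository, override_last_id=None):
    """Pick up rows newer than the stored high-water mark, then advance it."""
    if override_last_id is None:
        last_id = repository.get("last_id", 0)
    else:
        # An explicit override lets us reprocess data that was already loaded.
        last_id = override_last_id
    changed = [row for row in source_rows if row["id"] > last_id]
    # The stored value moves forward only after the rows were picked up successfully.
    if changed:
        repository["last_id"] = max(row["id"] for row in changed)
    return changed


repository = {}
source = [{"id": 1}, {"id": 2}, {"id": 3}]

first_run = run_session(source, repository)    # loads ids 1, 2 and 3
second_run = run_session(source, repository)   # nothing new, so nothing is loaded
rerun = run_session(source, repository, override_last_id=1)  # reloads ids 2 and 3
```

Without the override argument, the second and every later run would keep skipping ids 2 and 3, which is exactly the re-runnability problem the paragraph describes.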

SCD Type 4, a Solution for Rapidly Changing Dimension

SCD Type 2 is designed to generate a new record for every change of a dimension attribute, so that the complete history of changes can be tracked correctly. When dimension attributes change very frequently, the dimension grows very rapidly, causing considerable performance and maintenance issues. In this article, let's see how we can handle this rapidly changing dimension issue using SCD Type 4.

SCD Type 6 Implementation using Informatica PowerCenter

In one of our prior articles we described the SCD Type 6 dimensional modeling technique. This technique is a combination of SCD Type 1, Type 2 and Type 3, which gives much more flexibility in terms of the number of queries it can answer. That flexibility comes, of course, at the cost of complexity. In this article, let's discuss the step-by-step implementation of SCD Type 6 using Informatica PowerCenter.

Use Informatica Persistent Lookup Cache and Reduce Fact Table Load Time

Use Informatica Persistent Cache and Reduce Fact Table Load Time

In a mature data warehouse environment, it is very common to see fact tables with dozens of dimension tables linked to them. If we use Informatica to build this ETL process, we would expect to see dozens of lookup transformations as well, unless other design techniques are used. Since lookup is the predominant transformation, tuning it will help us gain some performance. Let's see how we can use a persistent lookup cache for this performance improvement.

SCD Type 6, a Combination of SCD Type 1, 2 and 3

Slowly Changing Dimension Type 6 a Combination of SCD Type 1, 2 & 3

In a couple of our previous articles, we discussed how to design and implement SCD Type 1, Type 2 and Type 3. We cannot always fulfill all the business requirements with these basic SCD types alone. So here, let's see what SCD Type 6 is and what it offers beyond the basic SCD types.
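For readers who want a feel for the combination before reading the full article, here is a minimal sketch in plain Python. It reflects one common reading of Type 6 and is an assumption rather than the article's exact design: each change adds a new row (Type 2), a current-value column is overwritten on every version of the record (Type 1), and the historical and current values sit side by side in the same row (Type 3). All column and function names are hypothetical.

```python
def apply_type6_change(dimension, natural_key, new_city, change_date):
    """Apply one attribute change to a dimension using a Type 6 style update."""
    for row in dimension:
        if row["customer_id"] == natural_key:
            # Type 1 behavior: overwrite the current value on every version.
            row["current_city"] = new_city
            if row["is_current"]:
                # Type 2 behavior: expire the version that was active until now.
                row["is_current"] = False
                row["end_date"] = change_date
    next_key = max((row["customer_sk"] for row in dimension), default=0) + 1
    # Type 2 behavior: add a new version; Type 3 behavior: both columns in one row.
    dimension.append({
        "customer_sk": next_key,
        "customer_id": natural_key,
        "historical_city": new_city,
        "current_city": new_city,
        "start_date": change_date,
        "end_date": None,
        "is_current": True,
    })
    return dimension


dimension = [{
    "customer_sk": 1, "customer_id": "C-100",
    "historical_city": "Austin", "current_city": "Austin",
    "start_date": "2023-01-01", "end_date": None, "is_current": True,
}]
apply_type6_change(dimension, "C-100", "Boston", "2024-06-01")
```

After the change, the old row still says the customer was in Austin at the time, while also reporting Boston as the current city, so both "as it was" and "as it is" queries can be answered from the same table.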

Re-Keying Surrogate Key For Dimension & Fact Tables. Need, Impact and Fix

A surrogate key is an artificial key that is used as a substitute for a natural key. Every surrogate key points to a dimension record, which represents the state of that record at a point in time. We join dimension tables and fact tables using surrogate keys to get the factual information at a point in time. In this article, let's see the need for surrogate key re-keying, the impact of re-keying and a possible fix.
