Informatica PowerCenter Partitioning for Parallel Processing and Faster Delivery

Johnson Cyriac Jul 16, 2013

Beyond a good ETL design, a session needs to be free of bottlenecks to deliver its best performance. Once the session itself is optimized, we can improve performance further by exploiting underutilized hardware. This means parallel processing, which we can achieve in Informatica PowerCenter using session partitioning.

What is Session Partitioning

Partition Tutorial Series
Part I : Partition Introduction.
Part II : Partition Implementation.
Part III : Dynamic Partition.
The Informatica PowerCenter Partitioning Option increases the performance of PowerCenter through parallel data processing. It lets you split a large data set into smaller subsets that can be processed in parallel for better session performance.
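The idea of splitting a data set into subsets and processing each subset in its own thread can be sketched in a few lines of Python. This is only a conceptual illustration, not PowerCenter code: the function names (`split_into_partitions`, `transform`, `run_partitioned`) and the doubling "transformation" are hypothetical stand-ins for whatever work a real session does.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, num_partitions):
    """Split rows into num_partitions contiguous subsets of near-equal size."""
    size, extra = divmod(len(rows), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional row each.
        end = start + size + (1 if i < extra else 0)
        partitions.append(rows[start:end])
        start = end
    return partitions


def transform(partition):
    """Hypothetical stand-in for the work done on one partition."""
    return [amount * 2 for amount in partition]


def run_partitioned(rows, num_partitions):
    """Process each partition in its own thread and return both views."""
    partitions = split_into_partitions(rows, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # pool.map returns results in the same order as the input partitions.
        results = list(pool.map(transform, partitions))
    return partitions, results


if __name__ == "__main__":
    parts, out = run_partitioned(list(range(10)), 3)
    print(parts)
    print(out)
```

In a real session the Integration Service manages the threads for you; the sketch only shows why more partitions can mean more work done at the same time, provided the hardware has spare capacity.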

Partitioning Terminology

Let's understand some partitioning terminology before we get into more detail.
  • Partition : A partition is a subset of the data that executes in a single thread.
  • Number of partitions : We can divide the data set into smaller subsets by increasing the number of partitions. When we add partitions, we increase the number of processing threads, which can improve session performance.
  • Stage : A stage is a portion of a pipeline that is implemented at run time as a thread.
  • Partition Point : A partition point is the boundary between two stages; partition points divide the pipeline into stages. A partition point is always associated with a transformation.
  • Partition Type : A partition type is an algorithm for distributing data among partitions and is always associated with a partition point. It controls how the Integration Service distributes data among partitions at that partition point.
The image below shows the points discussed above. The demo session has three partitions and three partition points.

Types of Session Partitions

Several partition algorithms are available.
  • Database Partitioning : The Integration Service queries the database system for table partition information. It reads partitioned data from the corresponding nodes in the database.
  • Round-Robin Partitioning : With this algorithm, the Integration Service distributes data evenly among all partitions. Use round-robin partitioning when you need to distribute rows evenly and do not need to group data among partitions.
  • Hash Auto-Keys Partitioning : The Integration Service uses a hash function to group rows of data among partitions, using all grouped or sorted ports as a compound partition key. You can use hash auto-keys partitioning at or before Rank, Sorter, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.
  • Hash User-Keys Partitioning : The Integration Service uses a hash function to group rows of data among partitions based on a user-defined partition key. You choose the ports that define the partition key.
  • Key Range Partitioning : With this type of partitioning, you specify one or more ports to form a compound partition key for a source or target. The Integration Service then passes data to each partition depending on the ranges you specify for each port.
  • Pass-through Partitioning : In this type of partitioning, the Integration Service passes all rows at one partition point to the next partition point without redistributing them.
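To make the difference between the distribution algorithms concrete, here is a minimal Python sketch of round-robin, hash-key, and key-range distribution. It is an illustration under stated assumptions, not how the Integration Service is implemented: the function names and the sample rows are hypothetical, CRC32 is simply a convenient deterministic hash, and the key-range boundaries are treated as start-inclusive and end-exclusive for the purposes of the sketch. Pass-through partitioning is not shown because it leaves rows in the partition they are already in.

```python
from zlib import crc32


def round_robin(rows, num_partitions):
    """Deal rows out one at a time so partition sizes stay even."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def hash_keys(rows, key_ports, num_partitions):
    """Send rows with the same key values to the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Build a compound key from the chosen ports (assumed hash scheme).
        key = "|".join(str(row[port]) for port in key_ports).encode()
        partitions[crc32(key) % num_partitions].append(row)
    return partitions


def key_range(rows, port, ranges):
    """Route each row by the range its key falls in.

    ranges holds one (start, end) pair per partition; in this sketch
    start is inclusive and end is exclusive.
    """
    partitions = [[] for _ in ranges]
    for row in rows:
        for idx, (start, end) in enumerate(ranges):
            if start <= row[port] < end:
                partitions[idx].append(row)
                break
        else:
            # Sketch-only choice: reject rows that match no range.
            raise ValueError(f"no range covers {row[port]!r}")
    return partitions


if __name__ == "__main__":
    rows = [{"cust_id": i, "region": r}
            for i, r in enumerate(["EAST", "WEST", "EAST", "NORTH", "WEST", "EAST"], 1)]
    print(round_robin(rows, 3))
    print(hash_keys(rows, ["region"], 3))
    print(key_range(rows, "cust_id", [(1, 3), (3, 5), (5, 7)]))
```

Round-robin gives the most even spread but scatters related rows, while hashing on the grouping ports keeps every row of a group in one partition, which is why the hash types suit Rank, Sorter, and Aggregator transformations.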

Setting Up Session Partitions

Let's see what is required to set up a session with partitioning enabled.

We can open the session partitioning interface, shown in the image below, from the session using Mapping -> Partitions.
The interface lets you add or modify partitions and partition points and choose the type of partition algorithm. Select any transformation in the mapping, and the "Add Partition Point" button lets you add a partition point there.
Select any transformation in the mapping, and the "Delete Partition Point" or "Edit Partition Point" button lets you delete or modify its partition point.
The "Add/Delete/Edit Partition Point" buttons open an additional window that lets you modify the partitions and choose the type of partition algorithm, as shown in the image below.

Hope this article is informative and useful for your projects. Please leave your comments and feedback.



