Informatica Performance Tuning Guide, Resolve Performance Bottlenecks - Part 3

Johnson Cyriac Oct 8, 2013

In the previous article in this performance tuning series, we covered approaches to identify performance bottlenecks. In this article, we cover methods to resolve them. We will discuss session memory, cache memory, source, target, and mapping performance tuning techniques in detail.

I. Buffer Memory Optimization

When the Integration Service initializes a session, it allocates blocks of memory to hold source and target data. Sessions that use a large number of sources and targets might require additional memory blocks.

Not having enough buffer memory for the DTM process can slow down reading, transforming, or writing data and cause large fluctuations in performance. Adding extra memory blocks can keep the threads busy and improve session performance. You can do this by adjusting the buffer block size and the DTM Buffer size.

Note : You can identify a DTM buffer bottleneck from the session log file; see the previous article in this series for details.

1. Optimizing the Buffer Block Size

Depending on the source and target data, you might need to increase or decrease the buffer block size.

To identify the optimal buffer block size, sum the precisions of the columns in each source and target. The largest row precision among all sources and targets determines the size of one row. Ideally, a buffer block should accommodate at least 100 rows at a time.
      • Buffer Block Size = Largest Row Precision * 100
You can change the buffer block size in the session configuration, as shown in the image below.
[Image: Informatica session DTM buffer block size]
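The buffer block size calculation above can be sketched as a small script. This is an illustrative example only; the column precisions and source/target names are hypothetical, not taken from a real mapping.

```python
# Hypothetical example: estimating the buffer block size for a session.
# Each entry is the sum of column precisions (in bytes) for one
# source or target row; the values below are illustrative.

def buffer_block_size(row_precisions, rows_per_block=100):
    """Return a buffer block size that fits `rows_per_block` rows
    of the widest source/target row."""
    largest_row = max(row_precisions)
    return largest_row * rows_per_block

row_precisions = [
    540,   # hypothetical source row precision
    820,   # hypothetical target row precision
]
print(buffer_block_size(row_precisions))  # 820 * 100 = 82000
```

The widest row (820 bytes in this sketch) drives the block size, so every block can hold at least 100 rows of any source or target in the session.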

2. Increasing DTM Buffer Size

When you increase the DTM buffer memory, the Integration Service creates more buffer blocks, which improves performance. You can determine the required DTM Buffer Size using the calculation below.
      • Session Buffer Blocks = (total number of sources + total number of targets) * 2 
      • DTM Buffer Size = Session Buffer Blocks * Buffer Block Size / 0.9
You can change the DTM Buffer Size in the session configuration, as shown in the image below.
[Image: Informatica session DTM buffer size]
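The two formulas above can be combined into a short sizing sketch. The source/target counts and block size here are hypothetical, carried over from the illustrative buffer block example.

```python
# Hypothetical example: sizing the DTM buffer from the formulas above.
# Session Buffer Blocks = (sources + targets) * 2
# DTM Buffer Size = Session Buffer Blocks * Buffer Block Size / 0.9

def dtm_buffer_size(num_sources, num_targets, buffer_block_size):
    session_buffer_blocks = (num_sources + num_targets) * 2
    return session_buffer_blocks * buffer_block_size / 0.9

# One source, one target, 82000-byte blocks (illustrative values)
print(round(dtm_buffer_size(1, 1, 82000)))  # 364444
```

Dividing by 0.9 leaves roughly 10% headroom over the raw block requirement, so the session is not sized exactly at its minimum.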

II. Cache Memory Optimization

Transformations such as Aggregator, Rank, and Lookup use cache memory to store transformed data, split between an index cache and a data cache. If the allocated cache memory is not large enough to store the data, the Integration Service stores the data in a temporary cache file. Session performance slows each time the Integration Service reads from the temporary cache file.

Note : You can examine the performance counters to determine which transformations require cache memory tuning; see the previous article in this series for details.

1. Increasing the Cache Sizes 

You can increase the allocated cache sizes so that the transformation is processed entirely in cache memory and the Integration Service does not have to read from the temporary cache file.

You can calculate the memory requirements for a transformation using the Cache Calculator. Shown below is the Cache Calculator for the Lookup transformation.
[Image: Cache Calculator for Informatica Lookup transformation]
You can update the cache size in the session properties of the transformation, as shown below.
[Image: Informatica transformation cache memory calculation]
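As a rough intuition for what the Cache Calculator is doing, total cache grows with the number of cached rows times the per-row size of the condition columns (index cache) and output columns (data cache). The sketch below is a deliberately simplified approximation with illustrative numbers, not the Cache Calculator's actual algorithm, which also accounts for per-row overheads.

```python
# Simplified sketch of a lookup cache estimate (illustrative only):
# index cache holds the lookup-condition columns, data cache holds
# the returned output columns, one entry per cached row.

def estimate_lookup_cache(num_rows, index_row_bytes, data_row_bytes):
    index_cache = num_rows * index_row_bytes
    data_cache = num_rows * data_row_bytes
    return index_cache, data_cache

# 1M cached rows, 16-byte condition columns, 120-byte output columns
idx, dat = estimate_lookup_cache(1_000_000, 16, 120)
print(idx, dat)  # 16000000 120000000
```

Even this crude estimate makes the point of the next section: every extra connected output port adds bytes per row to the data cache, multiplied across every cached row.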

2. Limiting the Number of Connected Ports

For transformations that use a data cache, limit the number of connected input/output and output-only ports. Limiting the number of connected ports reduces the amount of data the transformations store in the data cache.

III. Optimizing the Target

The most common performance bottleneck occurs when the Integration Service writes to a target database. Small database checkpoint intervals, small database network packet sizes, or problems during heavy loading operations can cause target bottlenecks.

Note : A target bottleneck can be identified from the session log file; see the previous article in this series for details.

1. Using Bulk Loads

You can use bulk loading to improve the performance of a session that inserts a large amount of data into a DB2, Sybase ASE, Oracle, or Microsoft SQL Server database. When bulk loading, the Integration Service bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result, you may not be able to perform recovery.

2. Using External Loaders

To increase session performance, configure PowerCenter to use an external loader. External loaders are supported for Oracle, DB2, Sybase, and Teradata target databases.

3. Dropping Indexes and Key Constraints

When you define key constraints or indexes in target tables, you slow the loading of data to those tables. To improve performance, drop indexes and key constraints before you run the session. You can rebuild those indexes and key constraints after the session completes.

4. Minimizing Deadlocks

Encountering deadlocks can slow session performance. You can increase the number of target connection groups in a session to avoid deadlocks. To use a different target connection group for each target in a session, use a different database connection name for each target instance.

5. Increasing Database Checkpoint Intervals

The Integration Service performance slows each time it waits for the database to perform a checkpoint. To decrease the number of checkpoints and increase performance, increase the checkpoint interval in the database. 

6. Increasing Database Network Packet Size

If you write to Oracle, Sybase ASE, or Microsoft SQL Server targets, you can improve the performance by increasing the network packet size. Increase the network packet size to allow larger packets of data to cross the network at one time.

IV. Optimizing the Source

Performance bottlenecks can occur when the Integration Service reads from a source database. An inefficient query or a small database network packet size can cause source bottlenecks.

Note : A source bottleneck can be identified from the session log file; see the previous article in this series for details.

1. Optimizing the Query

If a session joins multiple source tables in one Source Qualifier, you might be able to improve performance by optimizing the query with optimizer hints. Usually, the database optimizer determines the most efficient way to process the source data. However, you might know properties about the source tables that the database optimizer does not. The database administrator can create optimizer hints to tell the database how to execute the query for a particular set of source tables.

2. Increasing Database Network Packet Size

If you read from Oracle, Sybase ASE, or Microsoft SQL Server sources, you can improve the performance by increasing the network packet size. Increase the network packet size to allow larger packets of data to cross the network at one time.

V. Optimizing the Mappings

Mapping-level optimization may take time to implement, but it can significantly boost session performance. Focus on mapping-level optimization after you optimize the targets and sources. 

Generally, you reduce the number of transformations in the mapping and delete unnecessary links between transformations to optimize the mapping. Configure the mapping with the least number of transformations and expressions to do the most amount of work possible. Delete unnecessary links between transformations to minimize the amount of data moved.

Note : A mapping bottleneck can be identified from the session log file; see the previous article in this series for details.

1. Optimizing Datatype Conversions

You can increase performance by eliminating unnecessary datatype conversions. For example, if a mapping moves data from an Integer column to a Decimal column, then back to an Integer column, the unnecessary datatype conversion slows performance. Where possible, eliminate unnecessary datatype conversions from mappings.

2. Optimizing Expressions

You can also optimize the expressions used in the transformations. When possible, isolate slow expressions and simplify them.
  • Factoring Out Common Logic : If the mapping performs the same task in multiple places, reduce the number of times the mapping performs the task by moving it earlier in the mapping.
  • Minimizing Aggregate Function Calls : When writing expressions, factor out as many aggregate function calls as possible. Each time you use an aggregate function call, the Integration Service must search and group the data. For example, SUM(COL_A + COL_B) performs better than SUM(COL_A) + SUM(COL_B).
  • Replacing Common Expressions with Local Variables : If you use the same expression multiple times in one transformation, you can make that expression a local variable.
  • Choosing Numeric Versus String Operations : The Integration Service processes numeric operations faster than string operations. For example, if you look up large amounts of data on two columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup on EMPLOYEE_ID improves performance.
  • Using Operators Instead of Functions : The Integration Service reads expressions written with operators faster than expressions with functions. Where possible, use operators to write expressions.
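The aggregate-factoring bullet above can be illustrated with a small sketch. Since SUM is linear, SUM(COL_A + COL_B) returns the same result as SUM(COL_A) + SUM(COL_B); the advantage is that the single aggregate call lets the engine scan and group the data once instead of twice. The column values below are made up for illustration.

```python
# Illustration of "minimize aggregate function calls": one combined
# aggregation gives the same result as two separate aggregations,
# but requires only a single pass over the grouped data.
col_a = [10, 20, 30]
col_b = [1, 2, 3]

one_pass = sum(a + b for a, b in zip(col_a, col_b))   # SUM(COL_A + COL_B)
two_pass = sum(col_a) + sum(col_b)                    # SUM(COL_A) + SUM(COL_B)

print(one_pass, two_pass)  # 66 66
```

Both expressions yield 66, but in an Informatica Aggregator the one-call form avoids a second search-and-group over the input.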

3. Optimizing Transformations

Each transformation is different, and each requires its own tuning. Generally, though, you reduce the number of transformations in the mapping and delete unnecessary links between transformations to optimize them.

Note : Tuning techniques for individual transformations will be covered in a separate article.

What is Next in the Series

The next article in this series will cover additional features available in Informatica PowerCenter to improve session performance. We hope you enjoyed this article; please leave a comment or feedback if you have any, we are happy to hear from you.





© 2012-2013 Data Intelligence Solution, All Rights Reserved