Informatica PowerCenter on Grid for Greater Performance and Scalability
Informatica has developed a solution that leverages grid computing for greater data integration scalability and performance. The grid option delivers load balancing, dynamic partitioning, parallel processing and high availability to ensure optimal scalability, performance and reliability. In this article, let's discuss how to set up an Informatica workflow to run on a grid.
What is PowerCenter On Grid
Performance Improvement Features
When a PowerCenter domain contains multiple nodes, you can configure workflows and sessions to run on a grid. When you run a workflow on a grid, the Integration Service runs a service process on each available node of the grid to increase performance and scalability. When you run a session on a grid, the Integration Service distributes session threads to multiple DTM processes on nodes in the grid to increase performance and scalability.
Domain : A PowerCenter domain consists of one or more nodes in the grid environment. PowerCenter services run on the nodes. A domain is the foundation for PowerCenter service administration.
Node : A node is a logical representation of a physical machine that runs a PowerCenter service.
Admin Console with Grid Configuration
The Informatica Admin Console shown below has a two-node grid configuration. We can see two nodes, Node_1 and Node_2, and the grid Node_GRID created from those two nodes. The Integration Service Int_service_GRID runs on the grid.
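To make the relationship between domain, nodes, grid and Integration Service concrete, here is a toy data model of the two-node setup described above. It is not an Informatica API: the classes, the `validate` check, the domain name `Domain_Demo` and the host names are hypothetical, and only the names Node_1, Node_2, Node_GRID and Int_service_GRID come from the article.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Logical representation of a physical machine that runs PowerCenter services."""
    name: str
    host: str  # hypothetical host name, for illustration only


@dataclass
class Grid:
    """A named group of domain nodes that an Integration Service can run on."""
    name: str
    nodes: list[Node] = field(default_factory=list)


@dataclass
class IntegrationService:
    name: str
    grid: Optional[Grid] = None  # None: the service is assigned to a single node

    def runs_on_grid(self) -> bool:
        return self.grid is not None and len(self.grid.nodes) > 1


@dataclass
class Domain:
    name: str  # hypothetical domain name
    nodes: list[Node] = field(default_factory=list)
    grids: list[Grid] = field(default_factory=list)
    services: list[IntegrationService] = field(default_factory=list)

    def validate(self) -> None:
        """Check that every grid only references nodes that belong to the domain."""
        for grid in self.grids:
            for node in grid.nodes:
                if node not in self.nodes:
                    raise ValueError(f"{node.name} in {grid.name} is not a node of {self.name}")


# The two-node setup shown in the Admin Console.
node_1 = Node("Node_1", "host1.example.com")
node_2 = Node("Node_2", "host2.example.com")
node_grid = Grid("Node_GRID", [node_1, node_2])
int_service = IntegrationService("Int_service_GRID", grid=node_grid)
domain = Domain("Domain_Demo", nodes=[node_1, node_2], grids=[node_grid], services=[int_service])

domain.validate()  # raises ValueError if a grid node is missing from the domain
print(int_service.runs_on_grid())  # True
```

The point of the model is the containment rule: a grid is just a named subset of the domain's nodes, and an Integration Service gains grid behaviour by being assigned to that grid rather than to one node.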
Setting up Workflow on Grid
When you set up a workflow to run on a grid, the Integration Service distributes workflows across the nodes in the grid. It also distributes the Session, Command, and predefined Event-Wait tasks within workflows across the nodes in the grid.
To run a workflow on a grid, assign it an Integration Service that is configured to run on the grid, as shown in the image below.
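The task-level distribution described above can be sketched as a small planner. This is a simplified illustration, not PowerCenter's dispatcher: it assumes that task types other than Session, Command and predefined Event-Wait stay on the master node, it uses plain round-robin where the real Load Balancer makes its own decisions, and the task names are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Task:
    name: str
    task_type: str                  # e.g. "Session", "Command", "Event-Wait", "Decision"
    predefined_event: bool = False  # only meaningful for Event-Wait tasks


def is_distributable(task: Task) -> bool:
    """Session, Command and predefined Event-Wait tasks can go to any grid node."""
    if task.task_type in ("Session", "Command"):
        return True
    return task.task_type == "Event-Wait" and task.predefined_event


def assign_tasks(tasks, grid_nodes, master_node):
    """Return {task name: node name}.

    Distributable tasks are spread round-robin over the grid nodes; every other
    task stays on the master node (an assumption of this sketch).
    """
    nodes = cycle(grid_nodes)
    return {
        t.name: next(nodes) if is_distributable(t) else master_node
        for t in tasks
    }


# Hypothetical tasks inside a workflow.
tasks = [
    Task("s_Load_CUST_STG", "Session"),
    Task("s_Load_CUST_DIM", "Session"),
    Task("cmd_Archive_Files", "Command"),
    Task("dec_Check_Status", "Decision"),
]
plan = assign_tasks(tasks, ["Node_1", "Node_2"], master_node="Node_1")
for name, node in plan.items():
    print(f"{name:20} -> {node}")
```

Here the two sessions land on different nodes, the Command task wraps back to Node_1, and the Decision task never leaves the master node because it is not one of the distributable task types.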
Setting up Session on Grid
When you run a session on a grid, the Integration Service distributes session threads across nodes in a grid. The Load Balancer distributes session threads to DTM processes running on different nodes. You might want to configure a session to run on a grid when the workflow contains a session that takes a long time to run.
You can set up the session to run on the grid as shown in the image below.
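A session on a grid works on partitions of the data in parallel, with each DTM process handling its share. The sketch below imitates that with one hash partition per node and a thread pool standing in for DTM processes on separate machines. The partitioning key `cust_id`, the `amount` column and the aggregation are hypothetical, and threads in one Python process are only a stand-in for real distributed DTM processes.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, key, n_partitions):
    """Hash-partition rows on `key` so that rows with equal keys share a partition."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's randomized str hash
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % n_partitions
        partitions[idx].append(row)
    return partitions


def run_dtm(node, partition):
    """Stand-in for a DTM process on `node`; here it only aggregates one column."""
    return node, len(partition), sum(row["amount"] for row in partition)


def run_session_on_grid(rows, nodes, key="cust_id"):
    """One partition per node, processed concurrently, one worker per partition."""
    partitions = partition_rows(rows, key, len(nodes))
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return list(pool.map(run_dtm, nodes, partitions))


rows = [{"cust_id": i % 7, "amount": i} for i in range(100)]
results = run_session_on_grid(rows, ["Node_1", "Node_2"])
for node, count, amount in results:
    print(f"{node}: {count} rows, amount={amount}")
```

Hash partitioning is used here so that all rows for one customer end up in the same partition, which keeps per-key aggregation correct without any exchange between the workers.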
Workflow Running on Grid
The Workflow Monitor screenshot below shows a workflow running on a grid. In the 'Task Progress Details' window, you can see that two of the sessions in the workflow wf_Load_CUST_DIM ran on Node_1 and the other one ran on Node_2.
- Load Balancing : During spikes in data processing, load balancing keeps operations running smoothly by distributing the processing across the nodes on the grid. The node is chosen dynamically based on factors such as process size, CPU utilization and memory requirements.
- High Availability : Grid complements the High Availability feature of PowerCenter by switching the master node in case of a node failure. This keeps monitoring running and shortens the time needed for recovery.
- Dynamic Partitioning : Dynamic Partitioning helps make the best use of the nodes currently available on the grid. By adapting to available resources, it also helps increase the performance of the whole ETL process.
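The idea of choosing a node dynamically from CPU and memory figures can be illustrated with a toy selector. PowerCenter's own Load Balancer is configured through its dispatch modes and node resource thresholds in the Administrator tool; the rule below (filter on free memory and a process limit, then prefer the least CPU-busy node) is a hypothetical simplification, and all the numbers are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeStats:
    name: str
    cpu_util: float     # fraction of CPU in use, 0.0 to 1.0
    free_mem_mb: int
    running_procs: int
    max_procs: int      # hypothetical per-node process limit


def pick_node(nodes, required_mem_mb) -> Optional[str]:
    """Pick a node for the next task, or None if no node can take it right now.

    Nodes without enough free memory or already at their process limit are
    skipped; among the rest, the least CPU-busy node wins, with more free
    memory as the tie-breaker.
    """
    eligible = [
        n for n in nodes
        if n.free_mem_mb >= required_mem_mb and n.running_procs < n.max_procs
    ]
    if not eligible:
        return None  # the task would wait until resources free up
    return min(eligible, key=lambda n: (n.cpu_util, -n.free_mem_mb)).name


nodes = [
    NodeStats("Node_1", cpu_util=0.85, free_mem_mb=4096, running_procs=3, max_procs=10),
    NodeStats("Node_2", cpu_util=0.30, free_mem_mb=2048, running_procs=2, max_procs=10),
]
print(pick_node(nodes, required_mem_mb=1024))  # Node_2: both fit, Node_2 is less busy
print(pick_node(nodes, required_mem_mb=3000))  # Node_1: only Node_1 has enough memory
print(pick_node(nodes, required_mem_mb=8000))  # None: no node has enough memory
```

Filtering on hard limits before ranking means a lightly loaded node that cannot actually fit the task is never picked, which is the behaviour the load-balancing bullet describes in spirit.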
Hope you enjoyed this article. Please leave us a comment or feedback if you have any; we are happy to hear from you.