tag:blogger.com,1999:blog-65935827173635789942024-03-13T21:40:07.222-07:00Data Integration SolutionsDISoln.Org provides tutorials on real time ETL and Data Warehousing scenarios using Informatica PowerCenter through videos, training manual with self explanatory and easy to follow steps. Our tutorials and training provides all the documentation, training manual and scripts required for you to get started yourself and complete the training with enterprise ready experience.Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comBlogger98125tag:blogger.com,1999:blog-6593582717363578994.post-78298705191023334162014-10-05T15:23:00.001-07:002017-02-12T07:33:18.332-08:00An ETL Parameter Framework to Deal with all sorts of Parametrization Needs<img align="left" alt="Informatica Cloud Mapping Tutorial for Beginners" border="0" src="http://3.bp.blogspot.com/-DQyPgKhR3cg/VDAbhdHZZ6I/AAAAAAAAJrg/DdEVtmNQKUk/s1600/Variable.png" height="100" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" /> <br />
<div style="text-align: justify;">
We covered different <a href="http://www.disoln.org/2012/09/An-ETL-Framework-for-Operational-Metadata-logging.html">ETL frameworks</a> in our prior articles. In this article, let's talk about an ETL framework for managing the parameters used across different ETL jobs and use cases. Parametrizing ETL code improves <a href="http://www.disoln.org/2012/10/11-ways-to-make-informatica-powercenter-code-reusable.html">code reusability</a> and maintainability, raises code quality and shortens the development cycle.<br />
<a name='more'></a></div>
<div style="text-align: justify;">
</div>
<h2>
Framework Components</h2>
<div>
Our ETL parameter framework has two primary components.<br />
<ol>
<ol>
<li style="text-align: justify;"><span style="color: #cc0000;">A Relational Table</span> :- To store the parameter details and parameter values.</li>
<li style="text-align: justify;"><span style="color: #cc0000;">Reusable Mapplet</span> :- <a href="http://www.disoln.org/2013/05/Reuse-Informatica-PowerCenter-Code-Using-Mapplets.html">Mapplet</a> to log the parameter details and values into the relational table.</li>
</ol>
</ol>
<h3>
1. Relational Table</h3>
<div style="text-align: justify;">
A relational table with the structure below stores the <span style="text-align: left;">parameter details</span>: the parameter name, its value and the information needed to identify the context of the parameter, such as the folder name, workflow name and session name. </div>
</div>
<ul><ul>
<li>ETL_PARM_ID : A unique sequence number.</li>
<li>FOLDER_NAME : Folder name, in which the parameter is used.</li>
<li>WRKFLW_NAME : Workflow name, in which the parameter is used.</li>
<li>SESSN_NAME : Session name, in which the parameter is used.</li>
<li>PARM_NAME : Name of the parameter.</li>
<li>PARM_VAL : Value of the parameter.</li>
<li>ETL_CRT_DATE : Record create timestamp.</li>
<li>ETL_UPD_DATE : Record update timestamp.</li>
</ul>
</ul>
<div style="text-align: justify;">
<b><span style="color: #cc0000;">Note</span></b> : You can add the repository name to the table if you plan to use the framework for workflows running in multiple repositories.<br />
<b><span style="color: #cc0000;">Note</span></b> : Every parameter should be stored in the parameter table with its initial value before it is first used.</div>
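<div style="text-align: justify;">
The table structure above can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module, not the DDL of any particular warehouse database. The column names come from the list above; the table name ETL_PARM, the column types, the unique constraint on the context columns and the helper names are assumptions made for the sketch.</div>

```python
import sqlite3

# Column names follow the list above. The table name ETL_PARM, the SQLite
# types and the UNIQUE constraint are assumptions made for this sketch.
DDL = """
CREATE TABLE ETL_PARM (
    ETL_PARM_ID  INTEGER PRIMARY KEY AUTOINCREMENT,
    FOLDER_NAME  TEXT NOT NULL,
    WRKFLW_NAME  TEXT NOT NULL,
    SESSN_NAME   TEXT NOT NULL,
    PARM_NAME    TEXT NOT NULL,
    PARM_VAL     TEXT,
    ETL_CRT_DATE TEXT DEFAULT CURRENT_TIMESTAMP,
    ETL_UPD_DATE TEXT DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (FOLDER_NAME, WRKFLW_NAME, SESSN_NAME, PARM_NAME)
)
"""


def create_parm_table(conn):
    """Create the parameter table (hypothetical helper)."""
    conn.execute(DDL)


def seed_parameter(conn, folder, workflow, session, name, initial_value):
    """Store a parameter with its initial value, as the note above requires."""
    conn.execute(
        "INSERT INTO ETL_PARM "
        "(FOLDER_NAME, WRKFLW_NAME, SESSN_NAME, PARM_NAME, PARM_VAL) "
        "VALUES (?, ?, ?, ?, ?)",
        (folder, workflow, session, name, initial_value),
    )


conn = sqlite3.connect(":memory:")
create_parm_table(conn)
seed_parameter(conn, "DW_SALES", "wf_LOAD_CUST_DIM", "s_LOAD_CUST_DIM",
               "LST_RUN_TS", "1900-01-01 00:00:00")
row = conn.execute(
    "SELECT ETL_PARM_ID, PARM_NAME, PARM_VAL, ETL_CRT_DATE FROM ETL_PARM"
).fetchone()
print(row[:3])
```

<div style="text-align: justify;">
Seeding the run timestamp with a very old date is one possible choice, so that the first run picks up all source rows.</div>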
<h3>
2. Reusable Mapplet </h3>
<div style="text-align: justify;">
This <a href="http://www.disoln.org/2013/05/Reuse-Informatica-PowerCenter-Code-Using-Mapplets.html" style="text-align: left;">mapplet</a> captures the parameter values and loads them into the database table. It takes two input values and returns all the data elements required by the parameter table described above.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://lh6.ggpht.com/-unqIkuHWEW8/VDC7NXq4LmI/AAAAAAAAJrw/uMQnKPxHA3E/s1600-h/image12.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://lh5.ggpht.com/-QWeko5cZEsk/VDC7OJ7EAmI/AAAAAAAAJr4/MElEdlI5n1w/image_thumb10.png?imgmax=800" /></a></div>
<br />
<div>
<div style="text-align: justify;">
<span style="color: #cc0000;">Mapplet Input</span> : Parameter name, parameter value.</div>
</div>
<div>
<div style="text-align: justify;">
<span style="color: #cc0000;">Mapplet Output</span> : All the data elements required to be stored in the parameter table mentioned above. This output can be connected to the target table to store the information into the relational table.</div>
<h2 style="text-align: justify;">
Framework Implementation in a Workflow</h2>
<div style="text-align: justify;">
This framework can be implemented for dynamically changing parameters as well as for static or rarely changing parameters.</div>
<h3 style="text-align: justify;">
Dynamically Changing Parameters</h3>
<div style="text-align: justify;">
A typical example of a dynamically changing parameter is "ETL Run Timestamp", which is used for <a href="http://www.disoln.org/2012/10/change-data-capture-cdc-made-easy-using-mapping-variables.html">incremental data extraction</a> logic. Let's see how incremental data extraction is implemented using this parameter framework.</div>
<br />
Create a mapping variable with MAX aggregation. This variable will hold the parameter value.<br />
<a href="http://lh4.ggpht.com/-5pXBkLk6OfU/VDDRMVXqIGI/AAAAAAAAJsI/5Xom1BABljU/s1600-h/image%25255B18%25255D.png"><img alt="An ETL Framework for Parameterization" border="0" src="http://lh5.ggpht.com/-eb16-L3ehSA/VDDRNDvIHpI/AAAAAAAAJsQ/kZyNMPNvXR8/image_thumb%25255B19%25255D.png?imgmax=800" height="39" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="450" /></a><br />
<span style="color: #cc0000;"><b>Note</b></span> : Reset the mapping variable in the workflow using the pre-session variable assignment.<br />
<br />
<div style="text-align: justify;">
Set the mapping variable using the SETVARIABLE function in an expression, as shown in the image below. This updates the mapping variable to the greatest ETL_UPD_DATE value, which is finally stored in the parameter table using the mapplet.</div>
<br />
<a href="http://lh5.ggpht.com/-q2WtnVAQAUw/VDDRN61FC0I/AAAAAAAAJsY/FShUfdTPd70/s1600-h/image%25255B37%25255D.png"><img alt="An ETL Framework for Parameterization" border="0" src="http://lh5.ggpht.com/-1NWL6XSNGtY/VDDROiBmj4I/AAAAAAAAJsg/ON-ME_Oyy2E/image_thumb%25255B34%25255D.png?imgmax=800" height="244" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="640" /></a><br />
<div style="text-align: justify;">
Adjust the source filter to pull incremental data. Incremental data is pulled from the source based on ETL_UPD_DATE, as shown in the image below.</div>
<a href="http://lh4.ggpht.com/-fQmUWJRQDeo/VDDRPd224JI/AAAAAAAAJso/u6DTMAmd6pQ/s1600-h/image%25255B36%25255D.png"><img alt="An ETL Framework for Parameterization" border="0" src="http://lh3.ggpht.com/-OvQEIzRkDrE/VDDRQC7aemI/AAAAAAAAJsw/LRJz0gvbwmk/image_thumb%25255B33%25255D.png?imgmax=800" height="318" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="640" /></a><br />
<div style="text-align: justify;">
The mapping configuration above makes sure the correct parameter is used and sets the correct parameter value, which is to be stored in the parameter table.<br />
<br />
Add an additional mapping pipeline, as shown in the image below, to store the parameter value in the parameter table. This pipeline updates the current value in the parameter table to the latest value. The mapplet makes sure the correct parameter and parameter value are updated in the parameter table.</div>
</div>
<a href="http://lh4.ggpht.com/-_tSYeAgbJZs/VDD0jga5PyI/AAAAAAAAJuA/W0RI3tQm3Qo/s1600-h/image%25255B105%25255D.png"><img alt="image" border="0" src="http://lh3.ggpht.com/-3Y4cImP7hyU/VDD0kqhUQrI/AAAAAAAAJuI/g4qMXC7fdL8/image_thumb%25255B88%25255D.png?imgmax=800" height="160" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="675" /></a>
<span style="color: #cc0000;"><b>Note</b></span> : Set the target load order so that the new pipeline runs last in the mapping. The source qualifier of this pipeline generates one record using the SQL "select 'x' from dual".<br />
<br />
The complete mapping design is shown below.<br />
<a href="http://lh6.ggpht.com/-iRe-0-xsUJg/VDDnANjVnCI/AAAAAAAAJtY/wnF6ZuB5QoQ/s1600-h/image%25255B78%25255D.png"><img alt="An ETL Framework for Parameterization" border="0" src="http://lh5.ggpht.com/-FrqPUwt-N5g/VDDnAwvtQ1I/AAAAAAAAJtg/7WyTT2P1BkM/image_thumb%25255B67%25255D.png?imgmax=800" height="300" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="640" /></a><br />
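<div style="text-align: justify;">
Outside of PowerCenter, the same flow can be sketched in a few lines of Python with sqlite3: read the last run timestamp from the parameter table, pull only the newer source rows, keep the greatest ETL_UPD_DATE seen (the role the MAX mapping variable plays above) and write it back as the last step. The source table SRC_CUST, its columns, the strict "greater than" filter and the function name are assumptions for illustration; the actual source filter is the one shown in the image above.</div>

```python
import sqlite3

# WHERE clause that identifies one parameter by its context.
PARM_KEY = ("FOLDER_NAME = ? AND WRKFLW_NAME = ? "
            "AND SESSN_NAME = ? AND PARM_NAME = ?")


def run_incremental_load(conn, folder, workflow, session,
                         parm_name="LST_RUN_TS"):
    """Hypothetical helper: extract changed rows, then store the new
    high-water mark in the parameter table."""
    key = (folder, workflow, session, parm_name)
    # 1. Read the current parameter value.
    (last_run,) = conn.execute(
        "SELECT PARM_VAL FROM ETL_PARM WHERE " + PARM_KEY, key
    ).fetchone()
    # 2. Source filter: only rows changed after the last run.
    rows = conn.execute(
        "SELECT CUST_ID, ETL_UPD_DATE FROM SRC_CUST "
        "WHERE ETL_UPD_DATE > ? ORDER BY ETL_UPD_DATE",
        (last_run,),
    ).fetchall()
    # 3. Greatest ETL_UPD_DATE seen; unchanged if no rows were extracted.
    new_value = max([last_run] + [upd for _, upd in rows])
    # 4. Last step: update the parameter table with the latest value.
    conn.execute(
        "UPDATE ETL_PARM SET PARM_VAL = ? WHERE " + PARM_KEY,
        (new_value,) + key,
    )
    return rows, new_value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ETL_PARM (FOLDER_NAME TEXT, WRKFLW_NAME TEXT, "
             "SESSN_NAME TEXT, PARM_NAME TEXT, PARM_VAL TEXT)")
conn.execute("CREATE TABLE SRC_CUST (CUST_ID INTEGER, ETL_UPD_DATE TEXT)")
conn.execute("INSERT INTO ETL_PARM VALUES ('DW_SALES', 'wf_LOAD_CUST_DIM', "
             "'s_LOAD_CUST_DIM', 'LST_RUN_TS', '2014-10-01 00:00:00')")
conn.executemany("INSERT INTO SRC_CUST VALUES (?, ?)", [
    (1, "2014-09-30 08:00:00"),
    (2, "2014-10-05 09:30:00"),
    (3, "2014-10-10 12:00:00"),
])

ctx = ("DW_SALES", "wf_LOAD_CUST_DIM", "s_LOAD_CUST_DIM")
first_rows, first_value = run_incremental_load(conn, *ctx)
second_rows, second_value = run_incremental_load(conn, *ctx)
print(first_rows, first_value)
print(second_rows, second_value)
```

<div style="text-align: justify;">
The sketch stores timestamps as ISO-formatted text so that string comparison matches chronological order. Running it a second time without new source rows extracts nothing and leaves the parameter unchanged.</div>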
<h3 style="text-align: justify;">
Static or Rarely Changing Parameters</h3>
<div style="text-align: justify;">
Static parameters, or parameters that need only occasional changes, can be stored in the parameter table and retrieved in the Informatica mapping using a Lookup transformation. Any change required to the parameter value should be made as a one-time update outside of the ETL process.<br />
<br />
The lookup transformation shown below can be used to retrieve the parameter value. You just need to pass the input parameters to the lookup to get the parameter value from the parameter table.</div>
<a href="http://lh5.ggpht.com/-bhUcBA_PHyI/VDDnBi9CE8I/AAAAAAAAJto/rMrtfubEF2U/s1600-h/image%25255B96%25255D.png"><img alt="An ETL Framework for Parameterization" border="0" src="http://lh3.ggpht.com/-DAk1tpC1VWA/VDDnCt0wTwI/AAAAAAAAJtw/L9BjYcSFiHA/image_thumb%25255B81%25255D.png?imgmax=800" height="293" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="640" /></a><br />
<div style="text-align: justify;">
<span style="color: #cc0000;"><b>Note</b></span> : The static parameter value should already be saved <span style="text-align: justify;">into the parameter table with its static value, before it can be used in a mapping.</span><br />
<h3>
<span style="text-align: justify;">How Parameter Data is Stored in the Parameter table</span></h3>
<div>
<span style="text-align: justify;">As discussed, the parameter framework supports both static and dynamic parameters. Let's consider some sample data for the explanation.</span></div>
<div>
<span style="text-align: justify;"><br /></span></div>
</div>
<table align="center" border="1" cellpadding="10" cellspacing="0" style="background-color: white; border-collapse: collapse; border-spacing: 0px; color: #444444; font-family: 'Lucida Grande', 'Lucida Sans Unicode', Helvetica, Arial, sans-serif; font-size: 12px; line-height: 18.375px; text-align: center;"><tbody>
<tr bgcolor="#e06666" style="padding: 7px;"><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">ETL_PARM_ID</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">FOLDER_NAME</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">WRKFLW_NAME</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">SESSN_NAME</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">PARM_NAME</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">PARM_VAL</span></b></td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">1</td><td style="margin: 0px; padding: 7px;">ALL</td><td style="margin: 0px; padding: 7px;">ALL</td><td style="margin: 0px; padding: 7px;">ALL</td><td style="margin: 0px; padding: 7px;">YR_BEGIN</td><td style="margin: 0px; padding: 7px;">01-JAN-2014</td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">2</td><td style="margin: 0px; padding: 7px;">DW_SALES</td><td style="margin: 0px; padding: 7px;">ALL</td><td style="margin: 0px; padding: 7px;">ALL</td><td style="margin: 0px; padding: 7px;">REGION_NAME</td><td style="margin: 0px; padding: 7px;">USA</td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">3</td><td style="margin: 0px; padding: 7px;">DW_SALES</td><td style="margin: 0px; padding: 7px;">wf_LOAD_CUST_DIM</td><td style="margin: 0px; padding: 7px;">s_LOAD_CUST_DIM</td><td style="margin: 0px; padding: 7px;">LST_RUN_TS</td><td style="margin: 0px; padding: 7px;">10-OCT-2014</td></tr>
</tbody></table>
<br />
<div style="text-align: justify;">
Parameter IDs 1 and 2 are static parameters. The first parameter is defined to be used across all folders, workflows and sessions. The second parameter is also static, but applies only to the workflows and sessions in the folder DW_SALES. The third parameter is a dynamic parameter specific to the session s_LOAD_CUST_DIM, which runs in the DW_SALES folder.</div>
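<div style="text-align: justify;">
One way to read the sample data is as a lookup in which ALL acts as a wildcard for the folder, workflow or session. The Python sketch below illustrates that reading with the three sample rows. The function name and the rule that the most specific matching row wins are assumptions for illustration, not something the lookup transformation does by itself.</div>

```python
# The three sample rows from the table above.
SAMPLE_ROWS = [
    {"FOLDER_NAME": "ALL", "WRKFLW_NAME": "ALL", "SESSN_NAME": "ALL",
     "PARM_NAME": "YR_BEGIN", "PARM_VAL": "01-JAN-2014"},
    {"FOLDER_NAME": "DW_SALES", "WRKFLW_NAME": "ALL", "SESSN_NAME": "ALL",
     "PARM_NAME": "REGION_NAME", "PARM_VAL": "USA"},
    {"FOLDER_NAME": "DW_SALES", "WRKFLW_NAME": "wf_LOAD_CUST_DIM",
     "SESSN_NAME": "s_LOAD_CUST_DIM", "PARM_NAME": "LST_RUN_TS",
     "PARM_VAL": "10-OCT-2014"},
]


def lookup_parameter(rows, folder, workflow, session, parm_name):
    """Return the value of the best matching row, or None if nothing matches.

    A context column matches when it equals the caller's value or is 'ALL'.
    Assumption: when several rows match, the one with the most exact
    (non-wildcard) matches wins.
    """
    best_value, best_score = None, -1
    context = (("FOLDER_NAME", folder), ("WRKFLW_NAME", workflow),
               ("SESSN_NAME", session))
    for row in rows:
        if row["PARM_NAME"] != parm_name:
            continue
        score = 0
        for column, actual in context:
            if row[column] == actual:
                score += 1          # exact match on this column
            elif row[column] != "ALL":
                score = -1          # neither exact nor wildcard: no match
                break
        if score > best_score:
            best_value, best_score = row["PARM_VAL"], score
    return best_value


yr_begin = lookup_parameter(SAMPLE_ROWS, "DW_HR", "wf_ANY", "s_ANY", "YR_BEGIN")
region = lookup_parameter(SAMPLE_ROWS, "DW_SALES", "wf_ANY", "s_ANY", "REGION_NAME")
region_other = lookup_parameter(SAMPLE_ROWS, "DW_HR", "wf_ANY", "s_ANY", "REGION_NAME")
last_run = lookup_parameter(SAMPLE_ROWS, "DW_SALES", "wf_LOAD_CUST_DIM",
                            "s_LOAD_CUST_DIM", "LST_RUN_TS")
print(yr_begin, region, region_other, last_run)
```

<div style="text-align: justify;">
The folder DW_HR and the names wf_ANY and s_ANY are made up to show that a folder-specific parameter is not visible from another folder, while a parameter defined with ALL is visible everywhere.</div>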
<h2>
<span style="text-align: justify;">Better than Informatica Parameters and Variables</span></h2>
<div style="text-align: justify;">
Since the parameter framework stores the values outside the Informatica environment, you get much more flexibility with it.</div>
<div>
<ul>
<li>Prevents accidental parameter value changes, which can happen to mapping variables during code migration.</li>
<li>Provides centralized storage for all parameter values, rather than storing them in different parameter files or mapping variables.</li>
<li>Parameter values are easy to update or change, unlike mapping variables. When the framework is used with incremental data extraction logic, it is easy to update the parameter value to reprocess the same data set and enable <a href="http://www.disoln.org/2013/02/Restartability-Design-for-Different-Type-ETL-Loads.html">restartability</a>.</li>
<li>Dynamically changing parameters can be handled in the framework. Mapping variables are limited to aggregation types such as MAX or MIN to handle dynamically changing parameters.</li>
<li>Parameter framework can handle both static and dynamic parameters.</li>
<li>More secure than storing the parameters in a parameter file.</li>
</ul>
<div style="text-align: justify;">
Please leave us a comment below, if you have any other thoughts or scenarios to be covered. We will be more than happy to help you.</div>
</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-11508836123547372732014-09-16T23:34:00.000-07:002014-09-16T23:34:05.141-07:00Dynamic Transformation Port Linking Rules in Informatica Cloud Designer<img align="left" alt="Informatica Cloud Mapping Tutorial for Beginners" border="0" src="http://vital.ai/common-files/icons/Map@2x.png" height="100" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" />
<br />
<div>
<div style="text-align: justify;">
One of the coolest features missing from Informatica PowerCenter was the ability to dynamically link ports between transformations. Many other ETL tools already provide this feature. With Informatica Cloud Designer, you can build mappings with dynamic rules to connect ports between transformations.</div>
<a name='more'></a><h2>
What is Dynamic Field Linking</h2>
<div>
<div style="text-align: justify;">
In a normal PowerCenter mapping, you need to explicitly connect the ports from one transformation to the next transformation in the pipeline. In Cloud Designer, however, you can define rules to dynamically link ports between transformations in the data pipeline. Based on the rules defined, ports are connected or dropped between transformations. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This feature provides considerable flexibility and code reusability from the developer and administrator perspective. We will see a business use case in a later section.</div>
</div>
<div>
<h2>
Field Rules and Type of Rules</h2>
</div>
<div style="text-align: justify;">
Field rules define how data enters a transformation from the upstream transformation. By default, a transformation inherits all incoming fields from the upstream transformation. All transformations except Source transformations include field rule configuration. When you configure more than one field rule, the Mapping Configuration application evaluates the field rules in the specified order. Use the Actions menu to change the order of rules and delete rules.</div>
<div style="text-align: justify;">
<br /></div>
The following image shows the field rules configured for the transformation. Based on the rules you choose, you can see the ports included in and excluded from the transformation.<br />
<div style="text-align: center;">
<a href="http://lh3.ggpht.com/-y_YZ1jnnD_Q/VBke1cGQ7sI/AAAAAAAAJpI/8KwuvN9s0BA/s1600-h/image%25255B13%25255D.png"><img alt="image" src="http://lh3.ggpht.com/-M_SHlT4afPo/VBke1wDSpfI/AAAAAAAAJpQ/W1nT1f6-qM0/image_thumb%25255B11%25255D.png?imgmax=800" height="310" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="650" /></a></div>
<div style="text-align: justify;">
<b><span style="color: #cc0000;">All Fields</span></b> :- The All Fields rule includes or excludes all fields from one transformation to the downstream transformation. Using the rename option, you can rename the ports as they pass from one transformation to the other.</div>
</div>
<a href="http://lh3.ggpht.com/-ZYAs9BwOkRs/VBke2dG8_5I/AAAAAAAAJpU/IODGvefW7IU/s1600-h/image%25255B29%25255D.png"><img alt="image" src="http://lh6.ggpht.com/-HTMYMP57f0Y/VBke226xpsI/AAAAAAAAJpc/oN7vy8peB8U/image_thumb%25255B25%25255D.png?imgmax=800" height="99" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="650" /></a><br />
<div>
<div style="text-align: justify;">
<span style="color: #cc0000;"><b>Fields by Data Type</b> </span>:- Includes or excludes ports of the selected data types from one transformation to the downstream transformation. In the Include/Exclude Fields by Data Type dialog box, you can select the data types that you want to include or exclude. If you want to rename the ports, you can do so on the Rename tab.</div>
<a href="http://lh6.ggpht.com/-Yu53gIvpFMs/VBke3ZTvL0I/AAAAAAAAJpo/Za8ZChJoxb4/s1600-h/image%25255B40%25255D.png"><img alt="image" src="http://lh5.ggpht.com/-xIPAD8Nu3Yg/VBke32TsHII/AAAAAAAAJps/QUw7AHPSWtw/image_thumb%25255B34%25255D.png?imgmax=800" height="99" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="650" /></a>Click the Configure button to open the window below and choose the port data types to be passed on to the downstream transformation.<br />
<a href="http://lh6.ggpht.com/-jGFTv1A2ZRY/VBke4f0r7LI/AAAAAAAAJp4/MrwBr_rsJ84/s1600-h/image%25255B52%25255D.png"><img alt="image" src="http://lh6.ggpht.com/-D-O42WSdso4/VBke4_NRDiI/AAAAAAAAJqA/4VQTdWbormk/image_thumb%25255B44%25255D.png?imgmax=800" height="146" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="400" /></a><br />
<div style="text-align: justify;">
<b><span style="color: #cc0000;">Fields by Text or Pattern</span></b> :- Includes or excludes fields by prefix, suffix, or pattern. You can use this option to select fields that you renamed earlier in the data flow. On the Select Fields tab, you can select prefix, suffix, or pattern, and define the rule to use. When you select the prefix option or suffix option, you enter the text to use as the prefix or suffix. When you select pattern, you can enter a regular expression.</div>
<a href="http://lh3.ggpht.com/-zdn6GfmMl3M/VBke5c38yUI/AAAAAAAAJqI/aI9cAhCMXkM/s1600-h/image%25255B63%25255D.png"><img alt="image" src="http://lh4.ggpht.com/-T8L2Dr1DQQc/VBke55Kbv0I/AAAAAAAAJqM/m0wQYihCwC0/image_thumb%25255B53%25255D.png?imgmax=800" height="96" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="650" /></a><br />
Click the Configure button to open the window below and choose the port name pattern to be passed on to the downstream transformation.<br />
<a href="http://lh4.ggpht.com/-JEU2u-99EZs/VBke6UY1FCI/AAAAAAAAJqU/BDmtS-8Arwo/s1600-h/image%25255B73%25255D.png"><img alt="image" src="http://lh4.ggpht.com/-BLjCIb9Ned0/VBke675X3OI/AAAAAAAAJqc/vsmgTwywjzs/image_thumb%25255B61%25255D.png?imgmax=800" height="145" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="400" /></a><br />
<div style="text-align: justify;">
<b><span style="color: #cc0000;">Named Fields</span></b> :- Includes or excludes the selected fields. Opens the Include/Exclude Named Fields dialog box. On the Select Fields tab, you can review all incoming fields for selection. On the Rename Selected tab, you can rename selected fields individually or in bulk.</div>
<br />
<div>
<a href="http://lh6.ggpht.com/-ncDrDc89cDE/VBke8Ue_v3I/AAAAAAAAJqo/_YsBRTQAzag/s1600-h/image%25255B94%25255D.png"><img alt="image" src="http://lh6.ggpht.com/-kahJCj7jEQ0/VBke850JRcI/AAAAAAAAJqw/FDitK61s_f8/image_thumb%25255B78%25255D.png?imgmax=800" height="99" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="650" /></a><br />
Click the Configure button to open the window below and choose the ports to be passed on to the downstream transformation.<br />
<a href="http://lh6.ggpht.com/-mK5oCiGWhnY/VBke9YRfyCI/AAAAAAAAJq4/sDu1hQ8Sjp8/s1600-h/image%25255B105%25255D.png"><img alt="image" src="http://lh5.ggpht.com/-30_IGJn4zy0/VBke-ETf0YI/AAAAAAAAJq8/rbRH8PYo0RQ/image_thumb%25255B87%25255D.png?imgmax=800" height="312" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="400" /></a>
<br />
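<div style="text-align: justify;">
To make the effect of rule order concrete, here is a small Python model of the four rule types described above. It is only a sketch of the idea, not how the Mapping Configuration application is implemented: it assumes each rule either adds matching incoming fields to the selection or removes them, applied in the listed order. All function names, field names and data types are made up for the example.</div>

```python
import re


def apply_field_rules(incoming, rules):
    """Apply (action, matcher) rules in order to (name, data_type) fields.

    Assumed model: 'include' adds every matching incoming field to the
    selection, 'exclude' removes it. The result keeps the incoming order.
    """
    selected = set()
    for action, matches in rules:
        hits = {name for name, dtype in incoming if matches(name, dtype)}
        if action == "include":
            selected |= hits
        else:
            selected -= hits
    return [name for name, _ in incoming if name in selected]


# Matchers for the four rule types; each returns a function of (name, dtype).
def all_fields():
    return lambda name, dtype: True


def by_data_type(*types):
    return lambda name, dtype: dtype in types


def by_prefix(prefix):
    return lambda name, dtype: name.startswith(prefix)


def by_pattern(regex):
    compiled = re.compile(regex)
    return lambda name, dtype: compiled.search(name) is not None


def named(*names):
    return lambda name, dtype: name in names


incoming = [
    ("Account_ID", "string"), ("Account_Name", "string"), ("State", "string"),
    ("Annual_Revenue", "decimal"), ("Created_Date", "datetime"),
    ("Internal_Notes", "string"),
]

# Include everything, then drop datetime fields and 'Internal_' fields.
kept = apply_field_rules(incoming, [
    ("include", all_fields()),
    ("exclude", by_data_type("datetime")),
    ("exclude", by_prefix("Internal_")),
])

# Same kind of rules in a different order: the later include wins.
reordered = apply_field_rules(incoming, [
    ("exclude", by_prefix("Internal_")),
    ("include", all_fields()),
])

# Pattern and named rules combined.
picked = apply_field_rules(incoming, [
    ("include", by_pattern(r"^Account_")),
    ("include", named("State")),
])
print(kept, reordered, picked, sep="\n")
```

<div style="text-align: justify;">
In this model, placing an exclude rule before an include-all rule has no effect, which is why the order of the rules matters.</div>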
<h2>
Pros and Cons</h2>
<div style="text-align: justify;">
Every approach has its own benefits and drawbacks. Here is what we see as the good and bad of dynamic column mapping.</div>
<h3>
Pros</h3>
<ul>
<li style="text-align: justify;">Better code reusability. You build the mapping once and reuse it for multiple data sources.</li>
<li style="text-align: justify;">Better flexibility and scalability for development, by providing parametrization and reusability.</li>
<li style="text-align: justify;">Reduce the number of objects to be maintained in the PowerCenter Repository.</li>
</ul>
<h3 style="text-align: justify;">
Cons</h3>
<ul>
<li style="text-align: justify;">Metadata about column mappings is lost, so data lineage cannot be produced.</li>
<li style="text-align: justify;">Dynamically including all columns might lead to processing unwanted columns in the mapping pipeline.</li>
</ul>
<h2>
Business Use Case</h2>
</div>
</div>
<div style="text-align: justify;">
A typical use case is building mappings that load stage tables. Since a typical stage table mapping does not include any unique or complex transformations, you can create just one mapping and parametrize the source table, target table, connection details and so on. This makes the development effort simple and highly reusable.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Hope you enjoyed this article. Please let us know your feedback and questions in the comment section below.</div>
Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-34491973602800435822014-09-06T17:53:00.000-07:002014-09-07T08:13:08.661-07:00Informatica Cloud Mapping Tutorial for Beginners, Building the First Mapping<img align="left" alt="Informatica Cloud Mapping Tutorial for Beginners" border="0" src="http://3.bp.blogspot.com/-jzHCzSD73Mg/VAuoAs38YgI/AAAAAAAAJoU/xT30AY70ius/s1600/clouddesigner.jpeg" height="100" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" /><br />
<div style="text-align: justify;">
In the last couple of articles we discussed the basics of <a href="http://www.disoln.org/2014/05/informatica-cloud-for-dummies-What-is-Informatica-Cloud.html">Informatica Cloud</a> and <a href="http://www.disoln.org/2014/05/Informatica-Cloud-Designer-for-Advanced-Data-Integration-On-the-Cloud.html">Informatica Cloud Designer</a>. In this tutorial we describe how to create a basic mapping, save and validate the mapping, and create a mapping configuration task. The demo mapping reads from a source and writes to a target, and also demonstrates the parameterization technique.<br />
<a name='more'></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The mapping we create here reads source data, filters out unwanted data, and writes data to the target. The mapping also includes parameters for the source connection and filter value. For this tutorial, you can use a sample Account source file available in the Informatica Cloud Community. You can download the sample source file from the following link <a href="https://community.informatica.com/docs/DOC-3800">Sample Source File for the Mapping Tutorial</a>.</div>
<h2>
Step 1. Mapping Creation and Source Configuration</h2>
<div style="text-align: justify;">
The following procedure describes how to create a new mapping and configure the sample Account flat file as the source.</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<ol>
<li style="text-align: justify;">To create a mapping, click <span style="color: #cc0000;">Design > Mappings > New Mapping</span>. </li>
<a href="http://lh3.ggpht.com/-TsL3OqxHwg0/VAlI_naP7kI/AAAAAAAAJh4/1hBPS0B0yy0/s1600-h/image%25255B11%25255D.png"><img alt="Informatica Cloud Mapping Tutorial for Beginners" src="http://lh3.ggpht.com/-Um3sDaH7IIg/VAlJAAa12UI/AAAAAAAAJh8/xCBlTyjcEQ4/image_thumb%25255B9%25255D.png?imgmax=800" height="127" style="display: block; float: none; margin: 10px auto 5px;" title="" width="450" /></a>
<li style="text-align: justify;">In the <span style="color: #cc0000;">New Mapping dialog</span> box, enter a name for the mapping: Account_by_State. You can use underscores in mapping and transformation names, but do not use other special characters. </li>
<a href="http://lh6.ggpht.com/-QWqNapL-XV4/VAlJAtiqJiI/AAAAAAAAJiI/vyFvUcZ18Zk/s1600-h/image%25255B20%25255D.png"><img alt="Informatica Cloud Mapping Tutorial for Beginners" src="http://lh3.ggpht.com/-S__8pGoBTVo/VAlJBJA1TTI/AAAAAAAAJiM/gF9ZipOTR3A/image_thumb%25255B16%25255D.png?imgmax=800" height="190" style="display: block; float: none; margin: 10px auto 5px;" title="" width="400" /></a>
<li style="text-align: justify;">To add a source to the mapping, on the <span style="color: #cc0000;">Transformation Palette</span>, click <span style="color: #cc0000;">Source</span>. </li>
<a href="http://lh3.ggpht.com/-leCFoK1ftew/VAlJBpqPCQI/AAAAAAAAJiU/grwQ0eg96hU/s1600-h/image%25255B30%25255D.png"><img alt="Informatica Cloud Mapping Tutorial for Beginners" src="http://lh4.ggpht.com/-QZtVoDA6A18/VAlJBxjTz1I/AAAAAAAAJig/uLtC0ieaxDM/image_thumb%25255B28%25255D.png?imgmax=800" height="320" style="display: block; float: none; margin: 10px auto 5px;" title="" width="100" /></a>
<li>In the <span style="color: #cc0000;">Properties Panel</span>, on the <span style="color: #cc0000;">General </span>tab, enter a name for the source: FF_Account. <div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-mOOSWhnPMrg/VAlJCzrFFYI/AAAAAAAAJi0/x3IKfom7QOs/w1400-h318-no/image_thumb%5B41%5D" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-mOOSWhnPMrg/VAlJCzrFFYI/AAAAAAAAJi0/x3IKfom7QOs/w1400-h318-no/image_thumb%5B41%5D" height="90" width="400" /></a></div>
</li>
<li style="text-align: justify;">On the <span style="color: #cc0000;">Source</span> tab, configure the following properties:</li>
<ul>
<li style="text-align: justify;"><i>Connection</i> :- Source connection. Select the flat file connection for the sample Account source file. Or, create a new flat file connection for the sample source file.</li>
<li style="text-align: justify;"><i>Source Type </i>:- Source type. Select Object.</li>
<li style="text-align: justify;"><i>Object </i>:- Source object. Select the sample Account source file. To preview source data, click Preview Data.<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-DVfw7_VTQ1U/VAlLB1r2gwI/AAAAAAAAJjE/-IpdwoYhqaM/w1500-h392-no/image_thumb%5B47%5D"><img border="0" src="http://4.bp.blogspot.com/-DVfw7_VTQ1U/VAlLB1r2gwI/AAAAAAAAJjE/-IpdwoYhqaM/w1500-h392-no/image_thumb%5B47%5D" height="166" width="600" /></a></div>
</li>
</ul>
<li style="text-align: justify;">To view source fields and field metadata, click the Fields tab.</li>
<a href="http://lh6.ggpht.com/-D9lGLjq07t8/VAlLCY4ngKI/AAAAAAAAJjI/CTqyejz9grs/s1600-h/image%25255B65%25255D.png"><img alt="Informatica Cloud Mapping Tutorial for Beginners" src="http://lh6.ggpht.com/-Loxbdv8ABNw/VAlLC3rpH7I/AAAAAAAAJjQ/h4kKBmDQvEs/image_thumb%25255B55%25255D.png?imgmax=800" height="217" style="display: block; float: none; margin-left: auto; margin-right: auto;" title="" width="600" /></a>
<li style="text-align: justify;">To save the mapping and continue, on the toolbar, click <span style="color: #cc0000;">Save > Save and Continue</span>.</li>
</ol>
<h2>
Step 2. Filter Creation and Field Rule Configuration</h2>
<div>
<div style="text-align: justify;">
In the following procedure, you add a Filter transformation to the data flow and define a parameter for the value in the filter condition. When you use a parameter for the value of the filter condition, you can define the filter value that you want to use when you configure the task. And you can create a different task for the data for each state.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The sample Account source file includes a State field. When you use the State field in the filter condition, you can write data to the target by state. For example, when you use State = MD as the condition, you include accounts based in Maryland in the data flow.</div>
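<div style="text-align: justify;">
The idea of a parameterized filter value can be pictured with a few lines of Python: the filter logic is written once, and each task supplies its own state. This is only an analogy to the mapping and task relationship, not Informatica Cloud code. The sample rows, the column names other than State and the function names are made up for the example.</div>

```python
import csv
import io

# Made-up stand-in for the sample Account file; only the State column
# is taken from the tutorial.
SAMPLE_CSV = """Account_ID,Account_Name,State
1,Acme,MD
2,Globex,CA
3,Initech,MD
"""


def filter_by_state(rows, state):
    """The 'mapping': keep rows whose State equals the parameter value."""
    return [row for row in rows if row["State"] == state]


def run_task(csv_text, state):
    """A 'task': the same mapping, run with a task-specific filter value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return filter_by_state(reader, state)


md_accounts = [row["Account_Name"] for row in run_task(SAMPLE_CSV, "MD")]
ca_accounts = [row["Account_Name"] for row in run_task(SAMPLE_CSV, "CA")]
print(md_accounts, ca_accounts)
```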
<div>
<ol>
<li style="text-align: justify;">To add a Filter transformation, on the <span style="color: #cc0000;">Transformation palette</span>, drag a <span style="color: #cc0000;">Filter</span> transformation to the mapping canvas. </li>
<li style="text-align: justify;">To link the Filter transformation to the data flow, draw a link from the <span style="color: #cc0000;">FF_Account source</span> to the <span style="color: #cc0000;">Filter transformation</span>. When you link transformations, the downstream transformation inherits fields from the previous transformation. </li>
<li style="text-align: justify;">To configure the Filter transformation, select the Filter transformation on the mapping canvas. </li>
<li style="text-align: justify;">To name the Filter transformation, in the <span style="color: #cc0000;">Properties panel</span>, click <span style="color: #cc0000;">General</span> and enter the name: Filter_by_State. </li>
<a href="http://lh4.ggpht.com/-_lf_DzEhVS0/VAqHq2BK4iI/AAAAAAAAJj0/P_RWNvrBMUA/s1600-h/image%25255B16%25255D.png"><img alt="image" src="http://lh6.ggpht.com/-kynLhu39W0U/VAqHrkf1sdI/AAAAAAAAJj4/fidtmMil4Fo/image_thumb%25255B13%25255D.png?imgmax=800" height="155" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="450" /></a>
<li style="text-align: justify;">To configure field rules, click <span style="color: #cc0000;">Incoming</span> Fields. Field rules define the fields that enter the transformation and how they are named. By default, all available fields are included in the transformation. Since we want to use all fields, do not configure additional field rules. </li>
<a href="http://lh3.ggpht.com/-i073SZWaT2E/VAqHsPnUTdI/AAAAAAAAJkE/QoLz9PQ4YCw/s1600-h/image%25255B23%25255D.png"><img alt="image" src="http://lh4.ggpht.com/-SFKXMkuwpaM/VAqHsgQLnxI/AAAAAAAAJkM/bli34Ad_ukI/image_thumb%25255B18%25255D.png?imgmax=800" height="174" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="640" /></a>
<li style="text-align: justify;">To configure the filter condition, click <span style="color: #cc0000;">Filter</span>. </li>
<ol>
<li style="text-align: justify;">To create a simple filter with a parameter for the value, for <span style="color: #cc0000;">Filter Condition</span>, select <span style="color: #cc0000;">Simple</span>. </li>
<li style="text-align: justify;">Click <span style="color: #cc0000;">Add New Filter Condition</span>. </li>
<li style="text-align: justify;">For <span style="color: #cc0000;">Field Name</span>, select <span style="color: #cc0000;">State</span>, and use Equals as the operator. </li>
<li style="text-align: justify;">For <span style="color: #cc0000;">Value</span>, select <span style="color: #cc0000;">New Parameter</span>. </li>
<a href="http://lh3.ggpht.com/-i073SZWaT2E/VAqHsPnUTdI/AAAAAAAAJkE/QoLz9PQ4YCw/s1600-h/image%25255B23%25255D.png"><img alt="image" src="http://lh4.ggpht.com/-SFKXMkuwpaM/VAqHsgQLnxI/AAAAAAAAJkM/bli34Ad_ukI/image_thumb%25255B18%25255D.png?imgmax=800" height="174" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="640" /></a>
<li style="text-align: justify;">In the <span style="color: #cc0000;">New Parameter</span> dialog box, configure the following options and click <span style="color: #cc0000;">OK</span>. </li>
<ol>
<li style="text-align: justify;">Name: FConditionValue </li>
<li style="text-align: justify;">Display Label: Filter Value for State </li>
<li style="text-align: justify;">Description: Enter the two-character state name for the data you want to use. </li>
<li style="text-align: justify;">Default Value: MD. Note that you can create only a string parameter in this location. </li>
<a href="http://lh6.ggpht.com/-Q-EdTqUe41A/VAqHuZZqo-I/AAAAAAAAJkg/cn_7r9vrpwE/s1600-h/image%25255B43%25255D.png"><img alt="image" src="http://lh3.ggpht.com/-R99TKMAJIZ8/VAqHvJQgtdI/AAAAAAAAJks/3mTd7Cg0mug/image_thumb%25255B34%25255D.png?imgmax=800" height="294" style="display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="350" /></a>
</ol>
</ol>
<li style="text-align: justify;">To save your changes, click <span style="color: #cc0000;">Save > Save and Continue</span>.</li>
</ol>
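<div style="text-align: justify;">
The idea behind the parametrized filter can be sketched in a few lines of Python. This is only an illustration of the concept, not how Informatica Cloud evaluates the condition: the function name, the argument name and the sample rows are assumptions made for this sketch, and only the State field, the Equals operator and the default value MD come from the steps above.</div>

```python
from typing import Dict, Iterable, List


def filter_by_state(accounts: Iterable[Dict[str, str]],
                    f_condition_value: str = "MD") -> List[Dict[str, str]]:
    """Keep only the rows whose State field equals the parameter value."""
    return [row for row in accounts if row["State"] == f_condition_value]


# Hypothetical sample rows; the real Account file has more fields.
accounts = [
    {"Name": "Acme", "State": "MD"},
    {"Name": "Globex", "State": "TX"},
    {"Name": "Initech", "State": "MD"},
]

maryland = filter_by_state(accounts)        # falls back to the default, MD
texas = filter_by_state(accounts, "TX")     # value supplied by a different task
```

<div style="text-align: justify;">
The same filter logic serves both calls; only the parameter value changes, which is what lets one mapping back several tasks.</div>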
<h2>
Step 3. Target and Source Parameter Configuration</h2>
<div style="text-align: justify;">
In the following procedure, you configure the target, then replace the source connection with a parameter.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Because you plan to <span style="color: #cc0000;">parametrize the source</span>, you also need to use a parameter for the field mapping.</div>
<ol>
<li style="text-align: justify;">To add a Target transformation, on the <span style="color: #cc0000;">Transformation palette</span>, drag a <span style="color: #cc0000;">Target</span> transformation to the mapping canvas. </li>
<li style="text-align: justify;">To link the <span style="color: #cc0000;">Target</span> transformation to the data flow, draw a link from the <span style="color: #cc0000;">Filter</span> transformation to the Target transformation. </li>
<li style="text-align: justify;">Click the <span style="color: #cc0000;">Target tab</span> and configure the following properties:</li>
<ol>
<li>Connection :- Target connection. Select a connection for the target. Or, create a new connection to the target. </li>
<li>Target Type :- Target type. Select Object. </li>
<li>Object :- Target object. Select an appropriate target. </li>
<li>Operation :- Target operation. Select Insert.</li>
<a href="http://lh6.ggpht.com/-kvETNlLk3OA/VAtAwDyiSQI/AAAAAAAAJlI/C9orRdrxpkk/s1600-h/image%25255B50%25255D.png"><img alt="Informatica Cloud Mapping Tutorial for Beginners" src="http://lh5.ggpht.com/--4qhDM36tGQ/VAtAwoy3mLI/AAAAAAAAJlQ/Y-22JlA2i0A/image_thumb%25255B39%25255D.png?imgmax=800" height="285" style="display: block; float: none; margin: 10px auto 5px;" title="" width="600" /></a>
</ol>
<li>To configure the field mapping, click <span style="color: #cc0000;">Field Mapping</span>. </li>
<li style="text-align: justify;">To map some fields and allow the remaining fields to be mapped in the task, configure the Field Map Option for <span style="color: #cc0000;">Partially Parametrized</span>. </li>
<a href="http://lh5.ggpht.com/-UNP54aXtFeg/VAurCqbTsPI/AAAAAAAAJoc/eic63kdf0w4/s1600-h/image%25255B174%25255D.png"><img alt="image" src="http://lh6.ggpht.com/-CeXZe_hYz98/VAurDCvCrMI/AAAAAAAAJog/lAjMUT-76mY/image_thumb%25255B144%25255D.png?imgmax=800" height="179" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="600" /></a>
<li style="text-align: justify;">Create a <span style="color: #cc0000;">New Parameter </span>and configure the following properties.</li>
<ol>
<li style="text-align: justify;">Name: PartialFieldMapping. </li>
<li style="text-align: justify;">Display Label: Partial Field Mapping. </li>
<li style="text-align: justify;">Select Allow partial mapping override. This allows you to view and edit mapped fields in the task. When you want to prevent the task developer from changing field mappings configured in the mapping, clear this option.</li>
<a href="http://lh5.ggpht.com/-CbrTWb37dg0/VAtAyXgs4BI/AAAAAAAAJlk/6YOKGcEoWTk/s1600-h/image%25255B74%25255D.png"><img alt="Informatica Cloud Mapping Tutorial for Beginners" src="http://lh5.ggpht.com/-TvjTfKh19bE/VAtAyz9q65I/AAAAAAAAJlw/_wansx1NbDc/image_thumb%25255B59%25255D.png?imgmax=800" height="495" style="display: block; float: none; margin: 10px auto 5px;" title="" width="350" /></a>
</ol>
<li style="text-align: justify;">Map the fields that you want to show as mapped in the task. </li>
<li style="text-align: justify;">Click <span style="color: #cc0000;">Save > Save and Continue</span>. </li>
<li style="text-align: justify;">To edit the source to add a parameter for the source connection, click the FF_Account Source transformation, and then click the Source tab. </li>
<li style="text-align: justify;">For Connection, click <span style="color: #cc0000;">New Parameter</span>. </li>
<li style="text-align: justify;">In the <span style="color: #cc0000;">New Parameter</span> dialog box, configure the following parameter properties.</li>
<ol>
<li style="text-align: justify;">Name: SourceConnection. </li>
<li style="text-align: justify;">Display Label: Sample Flat File. </li>
<li style="text-align: justify;">Description: Select the connection to the sample file.</li>
<a href="http://lh6.ggpht.com/-ehS1o5A1vec/VAtAzchp_OI/AAAAAAAAJl0/gi9nv73jx7g/s1600-h/image%25255B82%25255D.png"><img alt="Informatica Cloud Mapping Tutorial for Beginners" src="http://lh5.ggpht.com/-yIZbl3scY18/VAtAzgSbwQI/AAAAAAAAJl8/AYW9xn13UWk/image_thumb%25255B65%25255D.png?imgmax=800" height="497" style="display: block; float: none; margin: 10px auto 5px;" title="" width="350" /></a></ol>
</ol>
The completed mapping is shown below.<br />
<a href="http://lh3.ggpht.com/-JwDwPo3_fO8/VAuH8JmL3JI/AAAAAAAAJmM/wUWwBJOQmiI/s1600-h/image%25255B108%25255D.png"><img alt="image" src="http://lh5.ggpht.com/-PaNEE9zk6j8/VAuH8xrKnYI/AAAAAAAAJmU/s4NzTDfih7U/image_thumb%25255B85%25255D.png?imgmax=800" height="46" style="display: block; float: none; margin: 10px auto -20px;" title="image" width="400" /></a>
<br />
<ol>
</ol>
<h2>
Step 4. Mapping Validation and Task Creation</h2>
</div>
<div>
<div style="text-align: justify;">
In the following procedure, you save and validate the mapping, and then create a mapping configuration task based on it.</div>
<ol>
<li style="text-align: justify;">To validate the mapping, click <span style="color: #cc0000;">Save > Save and Continue</span>. </li>
<ul>
<li style="text-align: justify;">When you save the mapping, the Mapping Designer <span style="color: #cc0000;">validates the mapping</span>. The mapping is valid when the Status in the status area shows Valid. </li>
<li style="text-align: justify;">If the status is Invalid, in the toolbar, click the Validation icon. In the Validation panel, click Validate. </li>
<a href="http://lh3.ggpht.com/-4EbOWTMEYy8/VAuH9eZiULI/AAAAAAAAJmc/BxyFD01Ll1w/s1600-h/image%25255B99%25255D.png"><img alt="Informatica Cloud Mapping Tutorial for Beginners" src="http://lh6.ggpht.com/-6m-yFZV6uaI/VAuH9weTs4I/AAAAAAAAJmg/Y4okMO_aCVw/image_thumb%25255B78%25255D.png?imgmax=800" height="87" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="200" /></a>
<li style="text-align: justify;">The Validation panel lists the transformations in the mapping and the mapping status. The mapping should be valid. If errors display, correct the errors. Click Validate to verify that errors are corrected. </li>
<a href="http://lh3.ggpht.com/-GMFNNOrEDaw/VAuH-dd-AFI/AAAAAAAAJms/D91jBzRR36Q/s1600-h/image%25255B98%25255D.png"><img alt="Informatica Cloud Mapping Tutorial for Beginners" src="http://lh6.ggpht.com/-h6iWnw2oPpE/VAuH-7SPyTI/AAAAAAAAJm0/T_5V79HdVK4/image_thumb%25255B77%25255D.png?imgmax=800" height="343" style="display: block; float: none; margin: 10px auto 5px;" title="" width="300" /></a>
</ul>
<li style="text-align: justify;">To create a task based on the mapping, click <span style="color: #cc0000;">Save > Save and New Mapping Configuration Task</span>. The Mapping Configuration Task wizard launches as shown below. </li>
<a href="http://lh5.ggpht.com/-Bwg3ViBqCJ8/VAuh4GAx_xI/AAAAAAAAJnA/WBKC6dahW8U/s1600-h/image%25255B117%25255D.png"><img alt="Informatica Cloud Mapping Tutorial for Beginners" src="http://lh4.ggpht.com/-hutLEn5-5DA/VAuh4mkK8iI/AAAAAAAAJnI/rGj48klwKVU/image_thumb%25255B92%25255D.png?imgmax=800" height="93" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="450" /></a>
<ol>
</ol>
<li style="text-align: justify;">On the <span style="color: #cc0000;">Definition page</span>, enter a name for the task, Mapping Tutorial, and select your <span style="color: #cc0000;">Secure Agent</span>. Notice that the task uses the mapping that you just completed. </li>
<a href="http://lh4.ggpht.com/-PPC04TEIBts/VAuh5FlcSiI/AAAAAAAAJnQ/Q1BPGeVejFA/s1600-h/image%25255B134%25255D.png"><img alt="Informatica Cloud Mapping Tutorial for Beginners" src="http://lh6.ggpht.com/-RMsCuvsJZN4/VAuh5osw7rI/AAAAAAAAJnU/znf81EYBW4M/image_thumb%25255B105%25255D.png?imgmax=800" height="409" style="display: block; float: none; margin: 10px auto 5px;" title="" width="450" /></a>
<ol>
</ol>
<li style="text-align: justify;"><span style="color: #cc0000;">Click Next</span>. On the Sources page, the source parameter displays. Notice, the tool tip for the connection displays the parameter description. For Sample Flat File, select the source connection to the sample file, and <span style="color: #cc0000;">click Next</span>. </li>
<a href="http://lh4.ggpht.com/-dmZx_Khz9as/VAuh6NhcpYI/AAAAAAAAJnc/uLabUGPhVb4/s1600-h/image%25255B142%25255D.png"><img alt="Informatica Cloud Mapping Tutorial for Beginners" src="http://lh5.ggpht.com/-anLHiR4S9aQ/VAuh6ehIdAI/AAAAAAAAJnk/348NMksqnqM/image_thumb%25255B111%25255D.png?imgmax=800" height="209" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="500" /></a>
<ul>
<li style="text-align: justify;">Notice that the Targets page does not display because the target connection and object are defined in the mapping. </li>
<li style="text-align: justify;">The Other Parameters page displays the remaining parameters for the mapping. </li>
</ul>
<ol>
</ol>
<li style="text-align: justify;">In the Partial Field Mapping parameter, map the target fields that you want to use. </li>
<a href="http://lh3.ggpht.com/-UGupwm9YtjQ/VAuh6zdcMoI/AAAAAAAAJnw/ESODGggHk4A/s1600-h/image%25255B163%25255D.png"><img alt="Informatica Cloud Mapping Tutorial for Beginners" src="http://lh4.ggpht.com/-u39c5PWPrrc/VAuh7UFvnAI/AAAAAAAAJn0/bcliv8JC4Ck/image_thumb%25255B135%25255D.png?imgmax=800" height="657" style="display: block; float: none; margin: 10px auto 5px;" title="" width="500" /></a>
<ul>
<li style="text-align: justify;">Note that because you allowed partial mapping override, the Target Fields list displays all fields. You can keep or remove the existing links. </li>
</ul>
<ol>
</ol>
<li style="text-align: justify;">For the Filter Value for State parameter, delete the default value, MD, and enter TX. </li>
<a href="http://lh6.ggpht.com/-_DmlPh2PHuc/VAuh7h3AHUI/AAAAAAAAJn8/KDIEE2qMIGo/s1600-h/image%25255B162%25255D.png"><img alt="image" src="http://lh6.ggpht.com/-WNP_DyKNbcg/VAuh8f5lcwI/AAAAAAAAJoI/Qkx508041LU/image_thumb%25255B134%25255D.png?imgmax=800" height="109" style="display: block; float: none; margin: 10px auto 5px;" title="image" width="500" /></a>
<li style="text-align: justify;">To save and close the task, click <span style="color: #cc0000;">Save > Save and Close</span>.</li>
</ol>
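<div style="text-align: justify;">
The relationship between the mapping and the task can be pictured as defaults plus overrides. The Python sketch below is an assumption-laden illustration and not the Informatica Cloud API: the dictionary, the function name and the connection name SampleFlatFileConn are hypothetical, while the parameter names FConditionValue and SourceConnection and the values MD and TX come from the steps above. The field mapping parameter is left out to keep the sketch short.</div>

```python
# Parameters declared in the mapping, with their default values.
MAPPING_PARAMETERS = {
    "FConditionValue": "MD",     # default entered in the New Parameter dialog
    "SourceConnection": None,    # no default; it has to be chosen in the task
}


def resolve_task_parameters(defaults, task_values):
    """Merge task-level values over the mapping defaults."""
    unknown = set(task_values) - set(defaults)
    if unknown:
        raise KeyError("Unknown parameter(s): %s" % sorted(unknown))
    resolved = {**defaults, **task_values}
    missing = [name for name, value in resolved.items() if value is None]
    if missing:
        raise ValueError("No value supplied for: %s" % missing)
    return resolved


# The task overrides the filter value and picks a (hypothetical) connection.
task = resolve_task_parameters(
    MAPPING_PARAMETERS,
    {"FConditionValue": "TX", "SourceConnection": "SampleFlatFileConn"},
)
```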
<div style="text-align: justify;">
As a next step, you can run the task on a predefined schedule. We hope you enjoyed this article and look forward to your feedback.</div>
</div>
</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-13034008147897224572014-07-16T22:48:00.000-07:002014-09-06T18:16:54.913-07:00Informatica Incremental Aggregation Implementation and Business Use Cases<img align="left" alt="Informatica PowerCenter Incrimental Aggregation" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbVrLi0puy9kTozo-BqMkn8Jor2JmnSfKBCFESSawKkOGUgu8sukbXZklfFCqX-SUbl7UeDa70yorvgkTPdq_kr5C58hyphenhyphenS515FyIOF825u5MI_DcqxL6gRUo2LWF6TvfVnwSs6Sto4tDc/s1600/zigma.png" height="100" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" /><br />
<div style="text-align: justify;">
Incremental aggregation is an effective <a href="http://www.disoln.org/search/label/Performance%20Tips">performance improvement</a> technique when you have to do aggregate calculations on incrementally changing source data. Rather than forcing the session to process the entire source data and recalculate the same data each time you run the session, incremental aggregation persists the aggregated values and adds the <a href="http://www.disoln.org/2012/10/An-ETL-Framework-for-Change-Data-Capture-CDC.html">incremental changes</a> to them. Let's see more details in this article.<br />
<a name='more'></a></div>
<h2 style="text-align: justify;">
What is Incremental Aggregation</h2>
<div>
<div class="p1" style="text-align: justify;">
Using incremental aggregation, you can apply changes captured from the source to aggregate calculations such as Sum, Min, Max, and Average. If the source <a href="http://www.disoln.org/2012/10/change-data-capture-cdc-made-easy-using-mapping-variables.html">changes incrementally</a> and you can capture changes, you can configure the session to process those changes. This allows the Integration Service to update the target incrementally, rather than forcing it to process the entire source and recalculate the same data each time you run the session.</div>
<h2 style="text-align: justify;">
When to Use Incremental Aggregation</h2>
<div>
<div class="p1" style="text-align: justify;">
<span style="color: #cc0000;"><b>You can capture new source data</b> </span>: Use incremental aggregation when you can capture new source data each time you run the session. Use a <a href="http://www.disoln.org/2012/10/An-ETL-Framework-for-Change-Data-Capture-CDC.html">change data capture</a> mechanism to do this.</div>
<div class="p2" style="text-align: justify;">
<br /></div>
<div class="p1" style="text-align: justify;">
<span style="color: #cc0000;"><b>Incremental changes do not significantly change the target</b> </span>: Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation. In this case, drop the table and recreate the target with complete source data.</div>
</div>
<h2 style="text-align: justify;">
How Incremental Aggregation Works</h2>
<div>
<div class="p1" style="text-align: justify;">
When the session runs with incremental aggregation enabled for the first time, it uses the entire source data. At the end of the session, the Integration Service stores aggregate data from that session run in two files, the index file and the data file, in the cache directory specified in the Aggregator transformation properties.</div>
<div class="p1" style="text-align: justify;">
<br /></div>
<div class="p1" style="text-align: justify;">
</div>
<div class="p1">
</div>
<div class="p1" style="text-align: justify;">
Each subsequent time you run the session with incremental aggregation, the session processes only the incremental source changes. For each input record, the Integration Service checks historical information in the index file for a corresponding aggregate group. If it finds a corresponding group, the Integration Service performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change. If it does not find a corresponding group, the Integration Service creates a new group and saves the record data.</div>
<div class="p1" style="text-align: justify;">
<br /></div>
<div class="p1" style="text-align: justify;">
<b><span style="color: #cc0000;">Note</span></b> : Before enabling incremental aggregation, make sure the session reads only the <a href="http://www.disoln.org/2012/10/change-data-capture-cdc-made-easy-using-mapping-variables.html">incremental changes</a> from the source; otherwise the same records are aggregated twice.</div>
</div>
<h2 style="text-align: justify;">
Business Use Case</h2>
</div>
<div style="text-align: justify;">
Let's consider an ETL job that loads the Sales Summary table. The summary table holds the yearly sales summary by product line and includes the columns 'Sales Year', 'Product Line Name', 'Sales Quantity', and 'Sales Amount'.</div>
<div>
<h2 style="text-align: justify;">
Incremental Aggregation Implementation</h2>
</div>
<div style="text-align: justify;">
<div style="text-align: justify;">
Let's create a mapping that identifies the new sales data in the source and uses incremental aggregation. New sales records are identified using the CREATE_DT column in the source table. The Source Qualifier of the mapping looks as shown in the image below. The Source Qualifier is set to read the <a href="http://www.disoln.org/2012/10/change-data-capture-cdc-made-easy-using-mapping-variables.html">changed data</a> using mapping variables.</div>
</div>
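<div style="text-align: justify;">
The change capture on CREATE_DT can be sketched as a high-water mark that is carried from one run to the next. This is a simplified Python illustration, not the Source Qualifier logic shown in the image: the function name, the PRODUCT_LINE key and the way the mark is passed around are assumptions made for this sketch, and only the CREATE_DT column and the sample dates come from this article.</div>

```python
from datetime import date


def read_changed_rows(source_rows, last_create_dt):
    """Return the rows created after last_create_dt and the new high-water mark."""
    new_rows = [row for row in source_rows if row["CREATE_DT"] > last_create_dt]
    high_water_mark = max(
        (row["CREATE_DT"] for row in new_rows), default=last_create_dt
    )
    return new_rows, high_water_mark


source = [
    {"PRODUCT_LINE": "Tablet", "CREATE_DT": date(2014, 1, 4)},
    {"PRODUCT_LINE": "Tablet", "CREATE_DT": date(2014, 2, 3)},
    {"PRODUCT_LINE": "Computers", "CREATE_DT": date(2014, 2, 3)},
    {"PRODUCT_LINE": "Cell Phone", "CREATE_DT": date(2014, 3, 13)},
]

# First run: nothing has been read yet, so every row qualifies.
first_run, mark_after_first = read_changed_rows(source, date.min)

# A new sale arrives; the second run picks up only that row.
source.append({"PRODUCT_LINE": "Video Game", "CREATE_DT": date(2014, 3, 14)})
second_run, mark_after_second = read_changed_rows(source, mark_after_first)
```

<div style="text-align: justify;">
Because the comparison is strict, a row that arrives later with the same CREATE_DT as the stored mark would be skipped in this sketch; a real implementation has to decide how to handle that boundary.</div>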
<div style="text-align: justify;">
<div style="text-align: justify;">
<a href="http://lh3.ggpht.com/--nJlBuFKGDY/U1Rv8eL5chI/AAAAAAAAJUo/bvN3PP07AOw/s1600-h/image%25255B18%25255D.png"><img alt="Informatica Incremental Aggregation Implementation" border="0" src="http://lh4.ggpht.com/-Bc8wionx4Fc/U1Rv9R161FI/AAAAAAAAJUw/B16OSEK_u9o/image_thumb%25255B14%25255D.png?imgmax=800" height="428" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="563" /></a></div>
</div>
<div style="text-align: justify;">
<div style="text-align: justify;">
Now do the aggregation calculation using the Aggregator transformation, as shown in the image below.</div>
</div>
<div style="text-align: justify;">
<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<a href="http://lh5.ggpht.com/-_B0AZIsv4iA/U1Rv9_HJVqI/AAAAAAAAJU4/DRyl-iAr6YE/s1600-h/image%25255B21%25255D.png"><img alt="Informatica Incremental Aggregation Implementation" border="0" src="http://lh3.ggpht.com/-nhCuI-wZqzQ/U1Rv-5aLeWI/AAAAAAAAJVA/eYxP7eBq8Tw/image_thumb%25255B17%25255D.png?imgmax=800" height="160" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="371" /></a></div>
</div>
</div>
<div style="text-align: justify;">
<div style="text-align: justify;">
Complete the mapping as shown in the image below.</div>
<div style="text-align: justify;">
<a href="http://lh6.ggpht.com/-uqe3M2kNbHc/U1Sw9Nt1icI/AAAAAAAAJWA/UR03SEajCG4/s1600-h/image%25255B94%25255D.png"><img alt="image" border="0" src="http://lh5.ggpht.com/-js3CkMsHTMY/U1Sw9_ZxtSI/AAAAAAAAJWI/Ios_bkDa5V0/image_thumb%25255B80%25255D.png?imgmax=800" height="98" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin: 10px auto 5px;" title="image" width="675" /></a></div>
</div>
<div style="text-align: justify;">
Create the workflow and enable incremental aggregation in the session properties, as shown in the image.</div>
<div>
<a href="http://lh3.ggpht.com/-6EjInMkxlgk/U1RwBTc98OI/AAAAAAAAJVY/69Eo31fhrb4/s1600-h/image%25255B51%25255D.png"><img alt="Informatica Incremental Aggregation Implementation" border="0" src="http://lh6.ggpht.com/-TwwGxfICiaQ/U1RwCTBw3aI/AAAAAAAAJVg/SUMFGeaO5wc/image_thumb%25255B43%25255D.png?imgmax=800" height="489" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="600" /></a><br />
<div style="text-align: justify;">
<b><span style="color: #cc0000;">Note</span></b> : You do not need an Update Strategy transformation to implement <i><span style="color: #cc0000;">Insert</span></i> else <i><span style="color: #cc0000;">Update</span></i> logic. You can set the session properties just as you would for an '<i><span style="color: #cc0000;">Insert</span></i>'-only mapping. When you use incremental aggregation, the Integration Service does the <i><span style="color: #cc0000;">Insert</span></i> or <i><span style="color: #cc0000;">Update</span></i> based on the primary key set in the target table.<br />
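<div style="text-align: justify;">
The insert-or-update decision by primary key can be sketched as follows. This is a conceptual Python illustration only, not the Integration Service's write logic: the target is modelled as a dictionary keyed by an assumed primary key of sales year and product line, and the function and column names are hypothetical. The second call assumes that only the two rows shown reach the target.</div>

```python
def write_to_target(target, rows, key_columns=("sales_year", "product_line")):
    """Insert rows whose key is new and update rows whose key already exists."""
    inserted = updated = 0
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key in target:
            updated += 1
        else:
            inserted += 1
        target[key] = dict(row)   # same statement covers insert and update
    return inserted, updated


target_table = {}

first_load = write_to_target(target_table, [
    {"sales_year": 2014, "product_line": "Tablet", "quantity": 2, "amount": 950},
    {"sales_year": 2014, "product_line": "Computers", "quantity": 1, "amount": 1300},
])

second_load = write_to_target(target_table, [
    {"sales_year": 2014, "product_line": "Tablet", "quantity": 4, "amount": 1900},
    {"sales_year": 2014, "product_line": "Video Game", "quantity": 1, "amount": 300},
])
```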
<h2>
Incremental Aggregation Behind the Scenes</h2>
<div>
Let's understand how incremental aggregation works behind the scenes. For a better understanding, let's use the data set from the use case explained above. </div>
</div>
<h3>
Source data from Day 1</h3>
<div>
On Day 1, all data from the source is read and processed in the mapping.</div>
<div>
<br /></div>
<table align="left" border="1" cellpadding="10" cellspacing="0" style="background-color: white; border-collapse: collapse; border-spacing: 0px; color: #444444; font-family: 'Lucida Grande', 'Lucida Sans Unicode', Helvetica, Arial, sans-serif; font-size: 12px; line-height: 18.375px; text-align: center;"><tbody>
<tr bgcolor="#e06666" style="padding: 7px;"><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Sales Date</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Product Line</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Sales Quantity</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Sales Amount</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Create Date</span></b></td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">04-Jan-2014</td><td style="margin: 0px; padding: 7px;">Tablet</td><td style="margin: 0px; padding: 7px;">1</td><td style="margin: 0px; padding: 7px;">$450</td><td style="margin: 0px; padding: 7px;">04-Jan-2014</td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">03-Feb-2014</td><td style="margin: 0px; padding: 7px;">Tablet</td><td style="margin: 0px; padding: 7px;">1</td><td style="margin: 0px; padding: 7px;">$500</td><td style="margin: 0px; padding: 7px;">03-Feb-2014</td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">03-Feb-2014</td><td style="margin: 0px; padding: 7px;">Computers</td><td style="margin: 0px; padding: 7px;">1</td><td style="margin: 0px; padding: 7px;">$1,300</td><td style="margin: 0px; padding: 7px;">03-Feb-2014</td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">13-Mar-2014</td><td style="margin: 0px; padding: 7px;">Cell Phone</td><td style="margin: 0px; padding: 7px;">2</td><td style="margin: 0px; padding: 7px;">$350</td><td style="margin: 0px; padding: 7px;">13-Mar-2014</td></tr>
</tbody></table>
<span style="text-align: justify;"><br />
</span> <span style="text-align: justify;"><br />
</span> <span style="text-align: justify;"><br />
</span> <span style="text-align: justify;"><br />
</span> <span style="text-align: justify;"><br />
</span> <span style="text-align: justify;"><br />
</span><br />
<span style="text-align: justify;">Data from the source is read, summarized and persisted in Aggregator Cache. One row per aggregator group is persisted in the cache.</span><br />
<div style="align: right;">
<table align="right" border="1" cellpadding="10" cellspacing="0" style="align: right; background-color: white; border-collapse: collapse; border-spacing: 0px; color: #444444; font-family: 'Lucida Grande', 'Lucida Sans Unicode', Helvetica, Arial, sans-serif; font-size: 12px; line-height: 18.375px; text-align: center;"><tbody>
<tr bgcolor="#ccccff" style="padding: 7px;"><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Sales Year</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Product Line</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Sales Quantity</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Sales Amount</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Note</span></b></td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">2014</td><td style="margin: 0px; padding: 7px;">Tablet</td><td style="margin: 0px; padding: 7px;">2</td><td style="margin: 0px; padding: 7px;">$950</td><td style="margin: 0px; padding: 7px;">New In Cache</td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">2014</td><td style="margin: 0px; padding: 7px;">Computers</td><td style="margin: 0px; padding: 7px;">1</td><td style="margin: 0px; padding: 7px;">$1,300</td><td style="margin: 0px; padding: 7px;">New In Cache</td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">2014</td><td style="margin: 0px; padding: 7px;">Cell Phone</td><td style="margin: 0px; padding: 7px;">2</td><td style="margin: 0px; padding: 7px;">$350</td><td style="margin: 0px; padding: 7px;">New In Cache</td></tr>
</tbody></table>
</div>
<div style="align: right;">
</div>
<div style="align: right;">
</div>
<div style="align: right;">
</div>
<div style="align: right;">
</div>
<div style="align: right;">
</div>
<div style="align: right;">
</div>
<div style="align: right;">
</div>
<div style="align: right;">
</div>
<div style="align: right;">
<h4>
<span style="text-align: justify;"><br />
</span></h4>
<h4>
<span style="text-align: justify;"><br />
</span></h4>
<h4>
</h4>
<h3>
<span style="text-align: justify;"><br />
</span></h3>
<h3>
<span style="text-align: justify;">Source data from Day 2</span></h3>
<div>
<span style="text-align: justify;"><span style="text-align: start;">On Day 2, only new data is read from the source and processed in the mapping.</span></span><br />
<br /></div>
</div>
<div style="align: right;">
<div>
<table align="left" border="1" cellpadding="10" cellspacing="0" style="background-color: white; border-collapse: collapse; border-spacing: 0px; color: #444444; font-family: 'Lucida Grande', 'Lucida Sans Unicode', Helvetica, Arial, sans-serif; font-size: 12px; line-height: 18.375px; text-align: center;"><tbody>
<tr bgcolor="#e06666" style="padding: 7px;"><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Sales Date</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Product Line</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Sales Quantity</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Sales Amount</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Create Date</span></b></td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">14-Mar-2014</td><td style="margin: 0px; padding: 7px;">Tablet</td><td style="margin: 0px; padding: 7px;">1</td><td style="margin: 0px; padding: 7px;">$450</td><td style="margin: 0px; padding: 7px;">14-Mar-2014</td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">14-Mar-2014</td><td style="margin: 0px; padding: 7px;">Tablet</td><td style="margin: 0px; padding: 7px;">1</td><td style="margin: 0px; padding: 7px;">$500</td><td style="margin: 0px; padding: 7px;">14-Mar-2014</td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">14-Mar-2014</td><td style="margin: 0px; padding: 7px;">Video Game</td><td style="margin: 0px; padding: 7px;">1</td><td style="margin: 0px; padding: 7px;">$300</td><td style="margin: 0px; padding: 7px;">14-Mar-2014</td></tr>
</tbody></table>
</div>
<div>
<br />
<br />
<br />
<br />
<br />
<span style="text-align: justify;"><br />
</span> <span style="text-align: justify;">Aggregator Cache is updated with the new values and new aggregator groups are inserted.</span><br />
<span style="text-align: justify;"><br />
</span></div>
<div>
<table align="right" border="1" cellpadding="10" cellspacing="0" style="background-color: white; border-collapse: collapse; border-spacing: 0px; color: #444444; font-family: 'Lucida Grande', 'Lucida Sans Unicode', Helvetica, Arial, sans-serif; font-size: 12px; line-height: 18.375px; text-align: center;"><tbody>
<tr bgcolor="#ccccff" style="padding: 7px;"><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Sales Year</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Product Line</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Sales Quantity</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Sales Amount</span></b></td><td style="margin: 0px; padding: 7px;"><b><span style="color: black;">Note</span></b></td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">2014</td><td style="margin: 0px; padding: 7px;">Tablet</td><td style="margin: 0px; padding: 7px;">4</td><td style="margin: 0px; padding: 7px;">$1,900</td><td style="margin: 0px; padding: 7px;">Update In Cache</td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">2014</td><td style="margin: 0px; padding: 7px;">Computers</td><td style="margin: 0px; padding: 7px;">1</td><td style="margin: 0px; padding: 7px;">$1,300</td><td style="margin: 0px; padding: 7px;">No Change In Cache</td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">2014</td><td style="margin: 0px; padding: 7px;">Cell Phone</td><td style="margin: 0px; padding: 7px;">2</td><td style="margin: 0px; padding: 7px;">$350</td><td style="margin: 0px; padding: 7px;">No Change In Cache</td></tr>
<tr style="padding: 7px;"><td style="margin: 0px; padding: 7px;">2014</td><td style="margin: 0px; padding: 7px;">Video Game</td><td style="margin: 0px; padding: 7px;">1</td><td style="margin: 0px; padding: 7px;">$300</td><td style="margin: 0px; padding: 7px;">New In Cache</td></tr>
</tbody></table>
</div>
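<div style="text-align: justify;">
The cache update described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Integration Service is implemented: the <code>Totals</code> type, the <code>apply_increment</code> function, the starting cache and the incoming rows are all made-up assumptions, with values chosen so that the result matches the table.</div>

```python
from collections import namedtuple

# Hypothetical running totals kept per (sales year, product line) group.
Totals = namedtuple("Totals", ["quantity", "amount"])


def apply_increment(cache, new_rows):
    """Merge new source rows into the cache and note what happened to each group."""
    existing = set(cache)
    notes = {key: "No Change In Cache" for key in existing}
    for year, product_line, quantity, amount in new_rows:
        key = (year, product_line)
        old = cache.get(key, Totals(0, 0))
        cache[key] = Totals(old.quantity + quantity, old.amount + amount)
        notes[key] = "Update In Cache" if key in existing else "New In Cache"
    return notes


# Assumed cache state before the run (illustrative values only).
cache = {
    (2014, "Tablet"): Totals(2, 900),
    (2014, "Computers"): Totals(1, 1300),
    (2014, "Cell Phone"): Totals(2, 350),
}
# Assumed rows picked up by the incremental run (illustrative values only).
new_rows = [
    (2014, "Tablet", 2, 1000),
    (2014, "Video Game", 1, 300),
]

notes = apply_increment(cache, new_rows)
for key, totals in cache.items():
    print(key, totals.quantity, totals.amount, notes[key])
```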
</div>
<br />
<h2>
Reinitializing the Aggregate Cache Files</h2>
<div style="text-align: justify;">
Based on the use case discussed here, the aggregate cache file needs to be reset at the start of every new year. You can reset the cache file using the setting shown in the image below. You will get a warning message about clearing the persisted aggregate values, but it can be ignored.</div>
<a href="http://lh6.ggpht.com/-XUZsGtiEmXI/U1RwDb5Y1XI/AAAAAAAAJVo/jWqgTPPQCLs/s1600-h/image%25255B64%25255D.png"><img alt="Informatica Incremental Aggregation Implementation" border="0" src="http://lh4.ggpht.com/-4qTeekvr190/U1RwHLRfxcI/AAAAAAAAJVw/JrtmwL3fd4U/image_thumb%25255B54%25255D.png?imgmax=800" height="489" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="600" /></a><br />
<div class="p1" style="text-align: justify;">
After you run a session that reinitializes the aggregate cache, edit the session properties to disable the Reinitialize Aggregate Cache option. If you do not clear Reinitialize Aggregate Cache, the Integration Service overwrites the aggregate cache each time you run the session. </div>
<div class="p1">
<br /></div>
<div style="text-align: justify;">
Hope this article is useful for you guys. Please feel free to share your comments and any questions you may have.</div>
</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-6670187751987897112014-06-05T22:52:00.000-07:002014-09-04T19:58:20.472-07:00Informatica Cloud Designer for Advanced Data Integration On the Cloud<img align="left" alt="Informatica Cloud Designer for Advanced Data Integration On the Cloud" border="0" height="100" src="https://lh5.googleusercontent.com/-ZjLvPiTdD9o/U4I_jGBWUOI/AAAAAAAAJXQ/-lgPqoBBpjo/h120/visual-design.png" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="110" /><br />
<div>
<div style="text-align: justify;">
Informatica Cloud is an on-demand subscription service that provides cloud applications. It uses functionality from <a href="http://www.disoln.org/2012/08/Understand-Informatica-PowerCenter-Mapping-Designer.html">Informatica PowerCenter</a> to provide easy-to-use, web-based applications. Cloud Designer is one of the applications provided by Informatica Cloud. Let's look at the features of Informatica Cloud Designer in this article.<br />
<a name='more'></a></div>
<h2>
What is Informatica Cloud Designer</h2>
<div style="text-align: justify;">
Informatica Cloud Designer is the counterpart of <a href="http://www.disoln.org/2012/08/Understand-Informatica-PowerCenter-Mapping-Designer.html">PowerCenter Designer</a> on the cloud. <span style="text-align: justify;">Use Cloud Mapping Designer to configure mappings similar to PowerCenter mappings. When you configure a mapping, you describe the flow of data from source to target. </span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
As in PowerCenter Designer, you can add transformations to transform data, such as an <a href="http://www.disoln.org/2012/06/retain-values-from-previously-processed.html">Expression transformation</a> for row-level calculations, or a <a href="http://www.disoln.org/2012/11/Working-with-Flat-File-Source-LookUp-Filter-Transformation.html">Filter transformation</a> to remove data from the data flow. It additionally supports the <a href="http://www.disoln.org/2012/11/Unlock-the-JOINER-Transformation-limitations-Using-ACTIVE-LookUP-Transformation.html">Joiner transformation</a> and the <a href="http://www.disoln.org/2013/06/Pipeline-Lookups-Beyond-Relational-and-Flat-FIle-Data-Sources.html">LookUp transformation</a>. A transformation includes field rules to define incoming fields. Links visually represent how data moves through the data flow.</div>
<div>
</div>
<h2>
Cloud Designer Interface</h2>
</div>
<div style="text-align: justify;">
Cloud Designer provides a web-based user interface similar to that of PowerCenter Designer. This interface can be accessed from your <a href="https://app2.informaticacloud.com/saas/app/quickSetup.do">Informatica Cloud Portal</a>.<br />
<br /></div>
<div style="text-align: justify;">
</div>
<div style="text-align: justify;">
Below is a screenshot of Cloud Designer with different mapping designer areas.</div>
<a href="http://lh4.ggpht.com/-p1KobBfHEjM/U4JuD4LyVSI/AAAAAAAAJYQ/0yKiLb44v9Q/s1600-h/CloudDesignerPNG%25255B6%25255D.png"><img alt="Informatica Cloud Designer for Advanced Data Integration On the Cloud" border="0" src="http://lh4.ggpht.com/-H980J-_vG4I/U4JuEXrQlrI/AAAAAAAAJYY/SnPfL8uk_iw/CloudDesignerPNG_thumb%25255B4%25255D.png?imgmax=800" height="525" style="border: 0px; display: block; float: none; margin: 10px 0px -20px 0px;" title="" width="1051" /></a><br />
<ol>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>Mapping Canvas</b></span><span style="font-family: Arial, sans-serif;"><span style="font-size: 15px; line-height: 20.53333282470703px;"> :- </span></span>The canvas for configuring a mapping, similar to the workspace in PowerCenter Designer.</li>
<li style="text-align: justify;"><b><span style="color: #cc0000;">Transformation Palette</span></b><span style="font-family: Arial, sans-serif;"><span style="font-size: 15px; line-height: 20.53333282470703px;"> :- </span></span>Lists the transformations that you can use in the mapping. You can add a transformation by clicking the transformation name. Or, drag the transformation to the mapping canvas.</li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>Properties Panel </b></span><span style="font-family: Arial, sans-serif;"><span style="font-size: 15px; line-height: 20.53333282470703px;">:- </span></span>Displays configuration options for the mapping or selected transformation. Different options display based on the transformation type. This is similar to different tabs available in PowerCenter Transformations.</li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>Toolbar</b></span><span style="font-family: Arial, sans-serif;"><span style="font-size: 15px; line-height: 20.53333282470703px;"> :- </span></span>Provides options such as Save, Cancel, Validate, Arrange All, and Zoom In/Out.</li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>Status Area </b></span><span style="font-family: Arial, sans-serif;"><span style="font-size: 15px; line-height: 20.53333282470703px;">:- </span></span>Displays the status of the mapping and related tasks. It indicates whether the mapping includes unsaved changes. When all changes are saved, it indicates whether the mapping is valid or invalid.</li>
</ol>
<div>
<h2 style="text-align: start;">
Transformations On Cloud Designer</h2>
<div style="text-align: justify;">
Transformations are a part of a mapping that represent the operations that you want to perform on data. Transformations also define how data enters each transformation.</div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The Mapping Designer provides a set of active and passive transformations. '<span style="color: #cc0000;">Joiner</span>' and '<span style="color: #cc0000;">Filter</span>' are the two active transformations available. '<span style="color: #cc0000;">Expression</span>' is a passive transformation, and the '<span style="color: #cc0000;">LookUp</span>' transformation acts as passive when returning one row and as active when returning more than one row. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Additionally, the designer supports '<span style="color: #cc0000;">Source</span>' and '<span style="color: #cc0000;">Target</span>' transformations to read data from sources and write data to targets.</div>
<div class="ww_skin_page_overflow" style="font-family: Arial, sans-serif; font-size: 15px; overflow: visible; padding: 8px 0px 0px;">
<table border="1" cellpadding="5" cellspacing="0" class="Default" style="border-collapse: collapse; border-color: black; border-style: solid; border-width: 1pt 1pt 1px; empty-cells: hide; margin: 12pt 0px; padding: 5pt;" summary=""><tbody>
<tr><th style="background-color: #90bfed; border-color: black; border-style: solid; border-width: 1px 1px 1pt; vertical-align: top;"><div class="Table_Cell_Head" id="ww3_28_12_11_4_2_2_2_4_2_2" style="line-height: 1.4em; margin-bottom: 1em; margin-top: 0.5em; text-align: left;">
Transformation</div>
</th><th style="background-color: #90bfed; border-color: black; border-style: solid; border-width: 1px 1px 1pt; vertical-align: top;"><div class="Table_Cell_Head" id="ww3_28_12_11_4_2_2_2_4_2_4" style="line-height: 1.4em; margin-bottom: 1em; margin-top: 0.5em; text-align: left;">
Type</div>
</th><th style="background-color: #90bfed; border-color: black; border-style: solid; border-width: 1px 1px 1pt; vertical-align: top;"><div class="Table_Cell_Head" id="ww3_28_12_11_4_2_2_2_4_2_6" style="line-height: 1.4em; margin-bottom: 1em; margin-top: 0.5em; text-align: left;">
Description</div>
</th></tr>
<tr><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_2_2" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Source</div>
</td><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_2_4" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
N/A</div>
</td><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_2_6" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Reads data from a source.</div>
</td></tr>
<tr><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_4_2" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Target</div>
</td><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_4_4" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
N/A</div>
</td><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_4_6" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Writes data to a target.</div>
</td></tr>
<tr><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_6_2" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Joiner</div>
</td><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_6_4" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Active</div>
</td><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_6_6" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Joins two sources.</div>
</td></tr>
<tr><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_8_2" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Filter</div>
</td><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_8_4" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Active</div>
</td><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_8_6" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Filters data from the data flow.</div>
</td></tr>
<tr><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_10_2" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Expression</div>
</td><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_10_4" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Passive</div>
</td><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_10_6" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Modifies data based on expressions.</div>
</td></tr>
<tr><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_12_2" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Lookup</div>
</td><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_12_4" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Passive when returning one row. Active when returning more than one row.</div>
</td><td style="border: 1px solid black; padding: 5px; vertical-align: top;"><div class="Table_Cell" id="ww3_28_12_11_4_2_2_2_6_12_6" style="line-height: 1.4em; margin-bottom: 0.1em; margin-top: 0.1em;">
Looks up data from a lookup object. Defines the lookup object and connection, as well as the lookup condition and return values.</div>
</td></tr>
</tbody></table>
</div>
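<div style="text-align: justify;">
To make the active versus passive distinction concrete, here is a minimal Python sketch. It is not Informatica code: the function names, the sample rows and the <code>multiple</code> flag are assumptions made up for illustration. A passive step returns exactly one row per input row, while an active step may change the row count. The Joiner is left out to keep the sketch short.</div>

```python
def filter_rows(rows, condition):
    """Active: may change the number of rows that pass through."""
    return [row for row in rows if condition(row)]


def expression(rows, field, compute):
    """Passive: adds a derived field, one output row per input row."""
    return [{**row, field: compute(row)} for row in rows]


def lookup(rows, lookup_rows, key, return_field, multiple=False):
    """Passive when one match is returned per row, active when all matches are returned."""
    out = []
    for row in rows:
        matches = [m for m in lookup_rows if m[key] == row[key]]
        if not multiple:
            matches = matches[:1]
        if not matches:
            # No match: keep the row and return an empty value.
            out.append({**row, return_field: None})
        for match in matches:
            out.append({**row, return_field: match[return_field]})
    return out


# Illustrative sample data only.
orders = [
    {"order_id": 1, "product_id": "P1", "qty": 2},
    {"order_id": 2, "product_id": "P2", "qty": 0},
    {"order_id": 3, "product_id": "P1", "qty": 5},
]
prices = [
    {"product_id": "P1", "price": 10.0},
    {"product_id": "P1", "price": 12.0},
    {"product_id": "P2", "price": 4.0},
]

kept = filter_rows(orders, lambda r: r["qty"] != 0)                # row count may shrink
single = lookup(kept, prices, "product_id", "price")                # row count unchanged
multi = lookup(kept, prices, "product_id", "price", multiple=True)  # row count may grow
priced = expression(single, "total", lambda r: r["qty"] * r["price"])
print(len(orders), len(kept), len(single), len(multi), len(priced))
```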
<h2>
Mapping Configuration Task</h2>
<div style="text-align: justify;">
Mapping Configuration Task is similar to a session task in PowerCenter. The Mapping Configuration Task allows you to process data based on the data flow logic defined in a mapping.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
When you create a mapping configuration task, you select the mapping for the task to use, just like you choose a mapping while you create a session task in PowerCenter. You also define the parameter value associated with the mapping. </div>
</div>
<div>
<br /></div>
<div>
The screenshot below shows the options you need to set for a Mapping Configuration Task.</div>
<div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiO0DUh6feN07OTobw46Li9UDRufk5K4IB_GY09hFqUZQ2QrzhSwFA4Uu8LZfPhZnzGizOdDecuUBWz0N5ZL00WqnD_iQ9QdDsERly34QeqgdnBc9oN5F5CKK2LT9txK-W_3etQb1UBWGA/s1600/Screen+Shot+2014-05-26+at+5.35.54+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiO0DUh6feN07OTobw46Li9UDRufk5K4IB_GY09hFqUZQ2QrzhSwFA4Uu8LZfPhZnzGizOdDecuUBWz0N5ZL00WqnD_iQ9QdDsERly34QeqgdnBc9oN5F5CKK2LT9txK-W_3etQb1UBWGA/s1600/Screen+Shot+2014-05-26+at+5.35.54+PM.png" /></a></div>
</div>
<h2>
Task Flows</h2>
<div style="text-align: justify;">
Task Flows are similar to workflows in PowerCenter. You can create a task flow to group multiple tasks and run them in a specific order. You can run the task flow immediately or on a schedule. The task flow runs tasks serially, in the specified order.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="text-align: start;">The screenshot below shows the options you need to set for a task flow.</span></div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4z-rMYy49DivcWV0d8AeaW6O0aQYrRo4Y-S82oCnddQ6JKX9Gq2A0dfxkdWYyB9wqCQUS2SGDWNaGr3reHFfqGAjBV0F_6HQKEeaMfRKKXGEEkNZe0-SFATG1jcPxEyt-iStqPFAa3HM/s1600/Screen+Shot+2014-05-26+at+9.50.21+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4z-rMYy49DivcWV0d8AeaW6O0aQYrRo4Y-S82oCnddQ6JKX9Gq2A0dfxkdWYyB9wqCQUS2SGDWNaGr3reHFfqGAjBV0F_6HQKEeaMfRKKXGEEkNZe0-SFATG1jcPxEyt-iStqPFAa3HM/s1600/Screen+Shot+2014-05-26+at+9.50.21+PM.png" height="640" width="612" /></a></div>
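<div style="text-align: justify;">
The serial execution of a task flow can be pictured with the small Python sketch below. It is only a conceptual model: <code>run_task_flow</code>, the task names and the <code>stop_on_error</code> flag are hypothetical and are not taken from the Informatica Cloud product. The behavior on failure in particular is an assumption.</div>

```python
def run_task_flow(tasks, stop_on_error=True):
    """Run (name, callable) pairs one after another, in the order given.

    stop_on_error is an assumption of this sketch: when True, tasks that
    come after a failed task are not run.
    """
    results = []
    for name, task in tasks:
        try:
            task()
            results.append((name, "success"))
        except Exception as exc:
            results.append((name, f"failed: {exc}"))
            if stop_on_error:
                break
    return results


def failing_task():
    raise RuntimeError("source unavailable")


log = []
results = run_task_flow([
    ("extract", lambda: log.append("extract")),
    ("load", lambda: log.append("load")),
])
failed_results = run_task_flow([
    ("broken", failing_task),
    ("never_runs", lambda: log.append("never_runs")),
])
print(results)
print(failed_results)
```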
<h2>
How Cloud Designer is Different</h2>
<div style="text-align: justify;">
Cloud Designer is not a replacement for PowerCenter Designer; it is meant to provide more advanced data integration capability on the cloud. There are a few interesting features available in Cloud Designer that are not available in PowerCenter Designer.</div>
<div>
<h3>
1. Dynamic Field Propagation</h3>
<div>
<div style="text-align: justify;">
Unlike in PowerCenter Designer, you do not have to connect all the ports manually between transformations. Cloud Designer uses logical rules to propagate fields or ports from one transformation to the next.</div>
</div>
<div>
</div>
<div>
<br />
The possible options for logical field mapping are:</div>
<div>
<ul>
<li>Include All Fields.</li>
<li>Include/Exclude Fields by specific names.</li>
<li>Include/Exclude Fields by Data Types.</li>
<li>Include/Exclude Fields by name patterns.</li>
</ul>
The screenshot below shows the available options for logical field mapping. These options are available in the Properties Panel.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://lh5.ggpht.com/-TbbfwhWoyyc/U4PKBLzJibI/AAAAAAAAJYs/z6Hg0gy9bmA/s1600-h/Screen%252520Shot%2525202014-05-26%252520at%2525203.56.06%252520PM%25255B7%25255D.png" style="margin-left: 1em; margin-right: 1em;"><img alt="" border="0" src="http://lh5.ggpht.com/-0kQSUwM-xsw/U4PKBTmsiRI/AAAAAAAAJZM/bJsTI62arCs/Screen%252520Shot%2525202014-05-26%252520at%2525203.56.06%252520PM_thumb%25255B17%25255D.png?imgmax=800" height="210" style="border: 0px; display: inline; margin: 10px 0px 0px;" title="" width="650" /></a></div>
<div>
This helps the mapping adapt itself to source or target structure changes. For example, the “All Fields” rule brings newly added fields into the mapping dynamically.</div>
</div>
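<div style="text-align: justify;">
The effect of field rules can be sketched in Python as follows. This is an illustration under assumptions, not the product's implementation: <code>apply_field_rules</code>, the rule tuples and the shell-style name pattern are all hypothetical. It shows how the same rule set picks up a newly added source field without any change to the mapping.</div>

```python
import fnmatch


def apply_field_rules(fields, rules):
    """Select field names from (name, data_type) pairs.

    rules is a list of (action, kind, value) tuples applied in order, where
    action is "include" or "exclude" and kind is "all", "name", "type" or
    "pattern". This rule format is made up for the sketch.
    """
    selected = []
    for action, kind, value in rules:
        if kind == "all":
            matched = list(fields)
        elif kind == "name":
            matched = [f for f in fields if f[0] in value]
        elif kind == "type":
            matched = [f for f in fields if f[1] in value]
        elif kind == "pattern":
            matched = [f for f in fields if fnmatch.fnmatchcase(f[0], value)]
        else:
            raise ValueError(f"unknown rule kind: {kind}")
        if action == "include":
            selected += [f for f in matched if f not in selected]
        elif action == "exclude":
            selected = [f for f in selected if f not in matched]
        else:
            raise ValueError(f"unknown rule action: {action}")
    return [name for name, _ in selected]


# Illustrative source structure and rules.
source_v1 = [("ID", "integer"), ("NAME", "string"),
             ("AUDIT_USER", "string"), ("AMOUNT", "decimal")]
rules = [("include", "all", None), ("exclude", "pattern", "AUDIT_*")]

fields_v1 = apply_field_rules(source_v1, rules)

# A new column is added to the source; the rules stay the same.
source_v2 = source_v1 + [("REGION", "string")]
fields_v2 = apply_field_rules(source_v2, rules)
print(fields_v1)
print(fields_v2)
```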
<div>
<h3>
2. Parameterized Templates</h3>
</div>
<div style="text-align: justify;">
A parameter is a placeholder for a value or values in a mapping. The Cloud Designer can be used to build reusable mappings that include parameterized values. Such a mapping can be configured to create an integration workflow with specific business parameters entered at runtime.</div>
<div style="text-align: justify;">
<br />
You define the value of each parameter when you configure the mapping configuration task, as described in the Mapping Configuration Task section above. Parameterization, along with dynamic field propagation, makes mappings built on the cloud highly reusable templates.<br />
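<div style="text-align: justify;">
The relationship between a parameterized mapping and the tasks that use it can be pictured with the Python sketch below. The <code>MappingTemplate</code> class, its <code>configure</code> method and the parameter names are hypothetical and only model the idea: one template, many tasks, each supplying its own parameter values.</div>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingTemplate:
    """A reusable mapping with named placeholders (hypothetical model)."""
    name: str
    parameters: tuple  # parameter names that every task must supply

    def configure(self, **values):
        """Return a task definition with every parameter resolved."""
        missing = [p for p in self.parameters if p not in values]
        if missing:
            raise ValueError(f"missing parameter values: {missing}")
        resolved = {p: values[p] for p in self.parameters}
        return {"mapping": self.name, "parameters": resolved}


# Illustrative template and parameter values.
template = MappingTemplate(
    name="m_load_table",
    parameters=("source_object", "target_object", "filter_condition"),
)
account_task = template.configure(
    source_object="ACCOUNT",
    target_object="STG_ACCOUNT",
    filter_condition="IsDeleted = false",
)
contact_task = template.configure(
    source_object="CONTACT",
    target_object="STG_CONTACT",
    filter_condition="Email IS NOT NULL",
)
print(account_task)
print(contact_task)
```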
<h2>
Video Demo</h2>
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" frameborder="0" height="360" mozallowfullscreen="" src="http://videos.informaticacloud.com/player/49yAm/language_en/" title="Informatica Cloud Video Player" webkitallowfullscreen="" width="640"></iframe><br /></div>
You can get a free 30-day trial from <a href="https://app.informaticaondemand.com/ma/register?offerCode=30day-Website">here</a>. Leave us your thoughts on Informatica Cloud Designer and other Cloud Apps and how you are using them in your enterprise.</div>
</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-82403100462313349612014-05-12T23:13:00.000-07:002014-06-15T11:09:00.750-07:00Informatica Cloud for Dummies - Informatica Cloud, Components & Applications<img align="left" alt="Informatica Cloud Designer for Advanced Data Integration On the Cloud" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIR9FnTQbP5K6_nidThAdcm9jSc15EAxBJdMH7oIwZOdMdFYmFL5F16TwrHLLzvNF0nUpzDq2RP0hU2AE73rRkgIuQo3bRque9DFkzkR4Hbt6pBnCmL3rP7bCno2A3EyHguKllTAigfpY/s1600/InformaticaCloud.png" height="100" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" />
<br />
<div>
<div style="text-align: justify;">
Informatica Cloud is an on-demand subscription service that provides cloud applications. When you subscribe to Informatica Cloud, you use a web browser to connect to Informatica Cloud. Informatica Cloud runs at a hosting facility.
<br />
<a name='more'></a></div>
<h2>
Informatica Cloud Components</h2>
</div>
Informatica Cloud includes the following components.
<br />
<br />
<div style="text-align: justify;">
<span style="color: #cc0000;"><b>1. Informatica Cloud</b></span> :- A browser-based application that runs at the Informatica Cloud hosting facility. It allows you to configure connections, create users, and create, run, schedule, and monitor tasks.
<br />
<a href="http://lh3.ggpht.com/-wD9ETqSDFJE/U5P5wLCzh7I/AAAAAAAAJbA/AFcvuLWYkx0/s1600-h/Screen%252520Shot%2525202014-06-07%252520at%2525208.35.49%252520PM%25255B15%25255D.png"><img alt="Informatica Cloud Browser Logon" border="0" src="http://lh3.ggpht.com/-F7lEXWCniLo/U5P5wuLJb2I/AAAAAAAAJbI/9GUAEiv5p14/Screen%252520Shot%2525202014-06-07%252520at%2525208.35.49%252520PM_thumb%25255B15%25255D.png?imgmax=800" height="125" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 10px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="500" /></a>
You can log on to the Informatica Cloud application using your user ID and password.
<br />
<br />
<div>
<span style="color: #cc0000;"><b>2. Informatica Cloud hosting facility</b></span> :- A facility where the Informatica Cloud application runs. The Informatica Cloud hosting facility stores all task and organization information, much as the PowerCenter repository does. Informatica Cloud does not store or stage source or target data.</div>
</div>
<div>
<div style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;">
<div style="text-align: justify;">
</div>
<div style="text-align: justify;">
<b><span style="color: #cc0000;">3. Informatica Cloud applications</span></b> :- Applications that you can use to perform tasks, such as data synchronization, contact validation, and data replication.</div>
<a href="http://lh5.ggpht.com/-kWTQltxQyvk/U5P5tw0qHbI/AAAAAAAAJak/qYGu4WzYukA/s1600-h/Screen%252520Shot%2525202014-06-07%252520at%2525208.25.26%252520PM%25255B32%25255D.png"><img alt="Informatica Cloud Applications" border="0" src="http://lh6.ggpht.com/-lANMyi2HD60/U5P5uqA4muI/AAAAAAAAJas/pcIi8l8HMBM/Screen%252520Shot%2525202014-06-07%252520at%2525208.25.26%252520PM_thumb%25255B24%25255D.png?imgmax=800" height="306" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="400" /></a>
<br />
<div style="text-align: justify;">
<b><span style="color: #cc0000;">4. Informatica Cloud Secure Agent</span></b> :- A component of Informatica Cloud installed on a local machine that runs all tasks and provides firewall access between the hosting facility and your organization. When the Secure Agent runs a task, it connects to the Informatica Cloud hosting facility to access task information, connects directly and securely to sources and targets, transfers data between sources and targets, and performs any additional task requirements.</div>
<div>
<a href="http://lh3.ggpht.com/-vAUaCbDWrWI/U5P5vJSHWAI/AAAAAAAAJaw/g1IpwDTTEVg/s1600-h/Screen%252520Shot%2525202014-06-07%252520at%2525208.25.54%252520PM%25255B18%25255D.png"><img alt="Informatica Cloud for Secure Agent" border="0" src="http://lh5.ggpht.com/-ujzfuJTL3ZA/U5P5vijXv0I/AAAAAAAAJa8/Xh1YeRh86uQ/Screen%252520Shot%2525202014-06-07%252520at%2525208.25.54%252520PM_thumb%25255B19%25255D.png?imgmax=800" height="40" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto -15px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="600" /></a>
<br />
<h2>
Informatica Cloud Applications</h2>
<div style="text-align: justify;">
Informatica Cloud provides the following applications to help with different types of data integration tasks. These applications can be used to perform tasks such as data synchronization, contact validation, data replication, and more.</div>
<ul>
<li><b><span style="color: #cc0000;">PowerCenter</span></b></li>
<li><b><span style="color: #cc0000;">Mapping Configuration</span></b></li>
<li><b><span style="color: #cc0000;">Data Synchronization</span></b></li>
<li><b><span style="color: #cc0000;">Data Replication</span></b></li>
<li><b><span style="color: #cc0000;">Contact Validation</span></b></li>
<li><b><span style="color: #cc0000;">Data Assessment</span></b></li>
<li><b><span style="color: #cc0000;">Data Masking</span></b></li>
</ul>
</div>
<h3>
PowerCenter</h3>
<div style="text-align: justify;">
The PowerCenter application allows you to <span style="color: #cc0000;"><i>import PowerCenter workflows into Informatica Cloud</i></span> and run them as Informatica Cloud tasks. When you create a task, you can associate it with a schedule to run it at specified times or at regular intervals. Or, you can run it manually. You can monitor tasks that are currently running in the activity monitor and view logs about completed tasks in the activity log.<br />
<br />
The screenshot below shows the options available to import a PowerCenter workflow.</div>
<a href="http://lh3.ggpht.com/-KJNEtfyduPQ/U5SDA4LH_AI/AAAAAAAAJeA/NaojeIlGK-o/s1600-h/Screen%252520Shot%2525202014-06-08%252520at%2525207.28.51%252520AM%25255B8%25255D.png"><img alt="Informatica Cloud for PowerCenter Task" border="0" src="http://lh3.ggpht.com/-npAoaTN6CnY/U5SDBsaMKwI/AAAAAAAAJeI/cTBgHIfs6s8/Screen%252520Shot%2525202014-06-08%252520at%2525207.28.51%252520AM_thumb%25255B6%25255D.png?imgmax=800" height="433" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="500" /></a>
<br />
<h3>
Mapping Configuration</h3>
<div style="text-align: justify;">
Mapping Configuration Task is <i><span style="color: #cc0000;">similar to a session task in PowerCenter</span></i>. The Mapping Configuration Task allows you to process data based on the data flow logic defined in a mapping.<br />
<br />
The screenshot below shows the options available to build a mapping configuration.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiO0DUh6feN07OTobw46Li9UDRufk5K4IB_GY09hFqUZQ2QrzhSwFA4Uu8LZfPhZnzGizOdDecuUBWz0N5ZL00WqnD_iQ9QdDsERly34QeqgdnBc9oN5F5CKK2LT9txK-W_3etQb1UBWGA/s1600/Screen+Shot+2014-05-26+at+5.35.54+PM.png" style="margin-left: 1em; margin-right: 1em;"><img alt="Informatica Cloud Mapping Configuration" border="0" height="121" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiO0DUh6feN07OTobw46Li9UDRufk5K4IB_GY09hFqUZQ2QrzhSwFA4Uu8LZfPhZnzGizOdDecuUBWz0N5ZL00WqnD_iQ9QdDsERly34QeqgdnBc9oN5F5CKK2LT9txK-W_3etQb1UBWGA/s1600/Screen+Shot+2014-05-26+at+5.35.54+PM.png" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="500" /></a></div>
<div style="text-align: justify;">
When you create a mapping configuration task, you select the mapping for the task to use, just like you choose a mapping while you create a session task in PowerCenter. You also define the parameter value associated with the mapping.</div>
</div>
<h3>
Data Synchronization</h3>
<div style="text-align: justify;">
Use it to <span style="color: #cc0000;"><i>load data and integrate applications, databases</i></span>, and files. It includes add-on functionality such as saved queries and <a href="http://www.disoln.org/2012/10/11-ways-to-make-informatica-powercenter-code-reusable.html">mapplets</a>. The Data Synchronization application allows you to synchronize data between a source and target. It <span style="color: #cc0000;"><i>performs insert, update, delete, and upsert operations</i></span>.<br />
<br />
The screenshot below shows these task operations in a data synchronization task.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://lh4.ggpht.com/-YnO60A4toFk/U5SC_dpKBaI/AAAAAAAAJdw/waa-8KSWpkI/s1600-h/Screen%252520Shot%2525202014-06-08%252520at%2525207.25.26%252520AM%25255B8%25255D.png"><img alt="Informatica Cloud for Data Synchronization" border="0" src="http://lh6.ggpht.com/-pDrbSi6XinE/U5SDAfrYi7I/AAAAAAAAJd4/v-1lJZSxZKU/Screen%252520Shot%2525202014-06-08%252520at%2525207.25.26%252520AM_thumb%25255B6%25255D.png?imgmax=800" height="240" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="500" /></a></div>
<div style="text-align: justify;">
For example, you can read sales leads from your sales database and write them into Salesforce. You can also use expressions to transform the data according to your business logic or use data filters to filter data before writing it to targets.</div>
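<div style="text-align: justify;">
The four task operations can be compared with a small Python sketch. This is a simplified model, not the application's actual behavior: the <code>synchronize</code> function, the in-memory target keyed by <code>id</code> and the rule that non-applicable rows are skipped are assumptions made for illustration.</div>

```python
def synchronize(target, source_rows, operation, key="id"):
    """Apply source rows to an in-memory target dict; return (applied, skipped).

    Assumed semantics for this sketch: insert skips rows that already exist,
    update and delete skip rows that do not exist, and upsert updates
    existing rows and inserts the rest.
    """
    applied = 0
    skipped = 0
    for row in source_rows:
        row_key = row[key]
        exists = row_key in target
        if operation == "insert" and not exists:
            target[row_key] = dict(row)
        elif operation == "update" and exists:
            target[row_key].update(row)
        elif operation == "upsert":
            target.setdefault(row_key, {}).update(row)
        elif operation == "delete" and exists:
            del target[row_key]
        elif operation in ("insert", "update", "delete"):
            skipped += 1
            continue
        else:
            raise ValueError(f"unknown operation: {operation}")
        applied += 1
    return applied, skipped


# Illustrative data only.
target = {1: {"id": 1, "name": "Acme", "phone": "111"}}
source_rows = [{"id": 1, "name": "Acme Corp"}, {"id": 2, "name": "Globex"}]

update_only = {1: dict(target[1])}
update_counts = synchronize(update_only, source_rows, "update")
upsert_counts = synchronize(target, source_rows, "upsert")
print(update_counts, update_only)
print(upsert_counts, target)
```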
<div>
<div>
<h3>
Data Replication</h3>
<div style="text-align: justify;">
Use it to <i><span style="color: #cc0000;">replicate data from Salesforce or database sources to database or file</span></i> targets. You might replicate data to archive the data, perform offline reporting, or consolidate and manage data.<br />
<br />
The screenshot below shows the options available to set up a data replication task.</div>
</div>
<div>
<a href="http://lh4.ggpht.com/-XtnM9ZXLzFs/U5SC9fAqWZI/AAAAAAAAJdQ/5pXuuqVJkhQ/s1600-h/Screen%252520Shot%2525202014-06-08%252520at%2525207.21.50%252520AM%25255B8%25255D.png"><img alt="Informatica Cloud for Data Replication" border="0" src="http://lh3.ggpht.com/-k_MiqddcY5s/U5SC95w72sI/AAAAAAAAJdY/RnfYcTxuQAc/Screen%252520Shot%2525202014-06-08%252520at%2525207.21.50%252520AM_thumb%25255B6%25255D.png?imgmax=800" height="121" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="475" /></a>
<br />
<h3>
Contact Validation</h3>
<div style="text-align: justify;">
The Contact Validation application is used to <span style="color: #cc0000;"><i>validate and correct postal address data</i></span>, and add geocode information to postal address data. You can also validate email addresses and check phone numbers against the Do Not Call Registry.</div>
<div style="text-align: center;">
<a href="http://lh5.ggpht.com/-F0klkLh5ri8/U5SC7_uVnII/AAAAAAAAJdA/xvBFmewHjpY/s1600-h/Screen%252520Shot%2525202014-06-08%252520at%2525207.21.15%252520AM%25255B12%25255D.png"><img alt="Informatica Cloud for Contact Validation" border="0" src="http://lh3.ggpht.com/-hNvXqnj4zS0/U5SC8pEoESI/AAAAAAAAJdI/64BrmDFl4k0/Screen%252520Shot%2525202014-06-08%252520at%2525207.21.15%252520AM_thumb%25255B10%25255D.png?imgmax=800" height="110" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="525" /></a></div>
</div>
<div>
<div style="text-align: justify;">
The Contact Validation application reads data from sources, validates and corrects the selected validation fields, and writes data to output files. In addition to validation fields, the Contact Validation application can include up to 30 additional source fields in the output files for a task.</div>
<h3>
Data Assessment</h3>
<div style="text-align: justify;">
The Data Assessment application <span style="color: #cc0000;"><i>allows you to evaluate the quality of your Salesforce data</i></span>. Use it to measure and <span style="color: #cc0000;"><i>monitor the quality of data</i></span> in the Accounts, Contacts, Leads, and Opportunities Salesforce CRM objects. It generates graphical dashboards that measure field completeness, field conformance, record duplication, and address validity for each Salesforce object. You can run data assessment tasks on an ongoing basis to show trends in data quality. </div>
</div>
<a href="http://lh4.ggpht.com/-GZMdJ6VPW4Q/U5SDCRo2s5I/AAAAAAAAJeQ/sDMOf3E7ILc/s1600-h/Screen%252520Shot%2525202014-06-08%252520at%2525207.20.32%252520AM%25255B9%25255D.png"><img alt="Informatica Cloud for Data Assessment" border="0" src="http://lh4.ggpht.com/-PwczOyMbl7M/U5SDC3UMOVI/AAAAAAAAJeY/F0J7KPXbCd0/Screen%252520Shot%2525202014-06-08%252520at%2525207.20.32%252520AM_thumb%25255B7%25255D.png?imgmax=800" height="109" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="475" /></a>
<br />
<div>
<h3>
Data Masking</h3>
<div style="text-align: justify;">
Use data masking to <i><span style="color: #cc0000;">replace source data in sensitive columns with realistic test data</span></i> for non-production environments. Data masking rules define the logic to replace the sensitive data. Assign data masking rules to the columns you need to mask.</div>
</div>
<div>
<ul></ul>
</div>
</div>
</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-81941138992316306072014-04-07T00:58:00.002-07:002014-04-19T08:40:08.610-07:00How to Use Error Handling Options and Techniques in Informatica PowerCenter<img align="left" alt="Error Handling Options and Techniques in Informatica PowerCenter" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh82FDnEBdwaj0ZHa7FlPhn8BBufFkqtxPrlMP_E-JiAvYqZYNjZrjMBbK5c-drwlSSaI571cnecEMl2OZHmHqbabDj-geP63B8v1JI812W_BPNgVFV72SFHe49-5-Qgl9scjmtDldREFY/s1600/error.png" height="100" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" /><br />
<div style="text-align: justify;">
<a href="http://www.disoln.org/2012/10/User-Defined-Error-Handling-in-Informatica-PowerCenter.html">Data quality</a> is critical to the success of every data warehouse project, so ETL Architects and Data Architects spend a lot of time defining the <a href="http://www.disoln.org/2012/07/error-handling-made-easy-using.html">error handling</a> approach. Informatica PowerCenter provides a set of options to take care of error handling in your ETL jobs. In this article, let's see how to leverage the PowerCenter options to handle exceptions.<br />
<a name='more'></a></div>
<h2>
Error Classification</h2>
<div>
<div style="text-align: justify;">
You have to deal with different types of errors in an ETL job. When you run a session, the PowerCenter Integration Service can encounter fatal or non-fatal <span class="s1">errors</span>. Typical error handling covers:</div>
</div>
<div>
<ol><ul>
<li><b style="text-align: justify;"><span style="color: #cc0000;">User Defined Exceptions </span></b><span style="text-align: justify;">: Data issues critical to data quality, which might get loaded to the database unless explicitly checked. For example, a credit card transaction with a future transaction date can get loaded into the database unless the transaction date of every record is checked. </span></li>
<li><b style="text-align: justify;"><span style="color: #cc0000;">Non-Fatal Exceptions </span></b><span style="text-align: justify;">: Errors that Informatica PowerCenter ignores and that cause records to drop out of the target table unless they are handled in the ETL logic. For example, a data conversion transformation may error out and prevent the record from loading to the target table. </span></li>
<li><b style="text-align: justify;"><span style="color: #cc0000;">Fatal Exceptions </span></b><span style="text-align: justify;">: Errors such as database connection errors, which force Informatica PowerCenter to stop running the workflow.</span></li>
</ul>
</ol>
</div>
<div class="p1">
<h2>
I. User Defined Exceptions</h2>
<img align="left" alt="Informatica user defined error handling" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCc49VOSb0bSHu4Ge_a-2ocL7XxF25p_frJIrGcAPj2nLvnt0J_kAQqx51VqYJog9onBfGn9Xl63-ErVF1VBTy0XNDq53J3nTZCpD1oRwWmnSIld7OY_E-sSv3HjGTw0LboC-dDAsZasA/s200/informatica-user-defined-error-handling.png?imgmax=800" height="50" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="50" />
<br />
<div>
Business users define the user defined exceptions, which are critical to data quality. We can set up user defined error handling using:</div>
<ol><ol>
<ol><ol>
<li><a href="http://www.disoln.org/2012/10/User-Defined-Error-Handling-in-Informatica-PowerCenter.html">Error Handling Functions</a>.</li>
<li>User Defined Error Tables.</li>
</ol>
</ol>
</ol>
</ol>
<h3>
1. Error Handling Functions</h3>
<div style="text-align: justify;">
<div style="text-align: justify;">
We can use two functions provided by Informatica PowerCenter to define our user defined error capture logic.</div>
</div>
<div style="text-align: justify;">
<div style="text-align: justify;">
<br /></div>
</div>
<div style="text-align: justify;">
<div style="text-align: justify;">
<span style="color: #cc0000;"><b>ERROR()</b> </span>: This function causes the PowerCenter Integration Service to skip a row and issue an error message that you define. The error message is displayed in the session log or written to the error log tables, depending on the error logging type configured in the session.</div>
</div>
</div>
<div class="p1">
<div style="text-align: justify;">
<div style="text-align: justify;">
<br /></div>
</div>
<div class="p1">
<div>
<div style="text-align: justify;">
You can use <span class="s1">ERROR</span> in Expression transformations to validate data. Generally, you use <span class="s1">ERROR</span> within an IIF or DECODE function to set rules for skipping rows.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Eg : <span style="color: #cc0000;">IIF(TRANS_DATA > SYSDATE,<span style="background-color: #f9cb9c;">ERROR('Invalid Transaction Date')</span>)</span>
</div>
<div>
</div>
<div style="text-align: justify;">
The above expression raises an error and drops from the ETL process and the target table any record whose transaction date is later than the current date.</div>
</div>
</div>
<div style="text-align: justify;">
<br /></div>
<div>
<div style="text-align: justify;">
<b><span style="color: #cc0000;">ABORT()</span> </b>: Stops the session and issues a specified error message, which is written to the session log file or to the error log tables, depending on the error logging type configured in the session. When the PowerCenter Integration Service encounters an ABORT function, it stops transforming data at that row. It processes any rows read before the session aborts.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: left;">
<span style="text-align: justify;">You can use </span><span class="s1" style="text-align: justify;">ABORT</span><span style="text-align: justify;"> in Expression transformations to validate data.</span></div>
<div style="text-align: justify;">
Eg : <span style="color: #cc0000;">IIF(ISNULL(LTRIM(RTRIM(CREDIT_CARD_NB))),<span style="background-color: #f9cb9c;">ABORT('Empty Credit Card Number')</span>)</span></div>
<span style="color: #cc0000;"></span>
<br />
<div style="text-align: justify;">
<div style="text-align: justify;">
The above expression aborts the session if any transaction record arrives without a credit card number.</div>
</div>
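<div style="text-align: justify;">
The difference between the two functions can be sketched outside PowerCenter. The Python below is only an illustration of the behaviour described above, not Informatica code: the class names <i>RowError</i> and <i>SessionAbort</i> and the functions <i>validate</i> and <i>run_session</i> are hypothetical, and the row layout is assumed from the two example expressions.</div>

```python
from datetime import date


class RowError(Exception):
    """Stands in for ERROR(): the current row is skipped and a message is logged."""


class SessionAbort(Exception):
    """Stands in for ABORT(): the session stops at the current row."""


def validate(row, today):
    # Mirrors IIF(ISNULL(...CREDIT_CARD_NB...), ABORT('Empty Credit Card Number'))
    if row.get("CREDIT_CARD_NB") is None:
        raise SessionAbort("Empty Credit Card Number")
    # Mirrors IIF(TRANS_DATA > SYSDATE, ERROR('Invalid Transaction Date'))
    if row["TRANS_DATA"] > today:
        raise RowError("Invalid Transaction Date")
    return row


def run_session(rows, today):
    loaded, error_log = [], []
    for row in rows:
        try:
            loaded.append(validate(row, today))
        except RowError as exc:
            # The row is dropped, the session carries on with the next row.
            error_log.append((row, str(exc)))
        except SessionAbort as exc:
            # Rows handled so far stay loaded, nothing after this row is processed.
            error_log.append((row, str(exc)))
            break
    return loaded, error_log
```

<div style="text-align: justify;">
In this sketch a record with a future date only costs that one record, while a record without a credit card number stops everything that follows it.</div>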
<h3>
Error Handling Function Use Case </h3>
<div style="text-align: justify;">
Shown below is the configuration required in the Expression transformation to use the ABORT() and ERROR() functions. This transformation uses the expressions shown in the examples above.</div>
<div style="text-align: center;">
<a href="http://lh4.ggpht.com/-Ps9JKWcCkZ8/U0JGFXzYFyI/AAAAAAAAJS4/y5P3ShnPgoU/s1600-h/image%25255B36%25255D.png"><img alt="image" border="0" src="http://lh3.ggpht.com/-png5qKtl6-Q/U0JGF1XFyJI/AAAAAAAAJTA/TE1rRcGcmgE/image_thumb%25255B30%25255D.png?imgmax=800" height="316" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin: 15px auto 10px;" title="image" width="640" /></a>
</div>
<div style="text-align: justify;">
<b><span style="color: #cc0000;">Note </span></b>:- You need to use these two functions in a mapping together with a session configured for <a href="http://www.disoln.org/2012/07/error-handling-made-easy-using.html">row error logging</a> to capture the error data from the source system. Depending on the session configuration, the source data is collected into the Informatica predefined <span style="color: #cc0000;">PMERR error tables</span> or into files.</div>
</div>
</div>
<br />
<div style="text-align: justify;">
<span style="color: #cc0000;">Please refer to the article</span> "<a href="http://www.disoln.org/2012/10/User-Defined-Error-Handling-in-Informatica-PowerCenter.html">User Defined Error Handling in Informatica PowerCenter</a><b>" </b>for more detailed implementation information on user defined error handling.</div>
<h3>
2. User Defined Error Tables</h3>
<div style="text-align: justify;">
Error handling functions are easy to implement with very little coding effort, but they have disadvantages, such as the poor readability of the error records in the PMERR tables and the <a href="http://www.disoln.org/search/label/Performance%20Tips?max-results=8">performance</a> impact. To avoid these disadvantages, you can create your own error log tables and capture the error records in them.</div>
<div style="text-align: justify;">
<br />
The typical approach is to create an error table that is similar in structure to the source table. The error table includes additional columns to tag records as "error fixed" or "processed". Below is a sample error table. It includes all the columns from the source table plus additional columns that identify the status of the error record.</div>
<a href="http://lh3.ggpht.com/-SXg7jFaTW9I/U0Ixcb-DGxI/AAAAAAAAJR8/LOamayJqQ0E/s1600-h/image%25255B12%25255D.png"><img alt="How to Use Error Handling Options and Techniques in Informatica PowerCenter" border="0" src="http://lh6.ggpht.com/-B7jJQjadoNw/U0Ixco3ViaI/AAAAAAAAJSE/YHWmMGpold8/image_thumb%25255B10%25255D.png?imgmax=800" height="186" style="border: 0px; display: block; float: none; margin: 15px auto 10px;" title="" width="511" /></a>Below is the high level design.
<a href="http://lh5.ggpht.com/-eVoD2A6KJec/U0JCLWmtkAI/AAAAAAAAJSU/XqiOP7reDZM/s1600-h/Error%252520Processing%25255B12%25255D.png"><img alt="Error Processing" border="0" src="http://lh4.ggpht.com/-wb1NINN1OK8/U0JCL3fS8UI/AAAAAAAAJSc/l1Fa1FzWSB4/Error%252520Processing_thumb%25255B10%25255D.png?imgmax=800" height="657" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="Error Processing" width="600" /></a>
<br />
<div style="text-align: justify;">
A typical <a href="http://www.disoln.org/search/label/ETL%20Design?&max-results=15">ETL Design</a> reads error data from the error table along with the source data. During data transformation, data quality is checked and any record violating a quality check is moved to the error table. Record flags identify which records have been reprocessed and which have been fixed and are ready for reprocessing.</div>
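<div style="text-align: justify;">
As a rough sketch of this design, the Python below uses an in-memory SQLite database in place of the real source, target and error tables. All table and column names (<i>SRC_TRANS</i>, <i>TGT_TRANS</i>, <i>ERR_TRANS</i>, <i>ERROR_FIXED_FLAG</i>, <i>PROCESSED_FLAG</i>) and the <i>load</i> function are assumptions made for illustration, and the only quality check is the future transaction date rule used earlier.</div>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SRC_TRANS (TRANS_ID INTEGER, CREDIT_CARD_NB TEXT, TRANS_DATE TEXT);
    CREATE TABLE TGT_TRANS (TRANS_ID INTEGER, CREDIT_CARD_NB TEXT, TRANS_DATE TEXT);
    -- Same columns as the source, plus columns that track the error status.
    CREATE TABLE ERR_TRANS (
        TRANS_ID INTEGER, CREDIT_CARD_NB TEXT, TRANS_DATE TEXT,
        ERROR_MESSAGE TEXT,
        ERROR_FIXED_FLAG TEXT DEFAULT 'N',
        PROCESSED_FLAG TEXT DEFAULT 'N'
    );
""")


def load(conn, run_date):
    # Read new source rows together with error rows that were fixed
    # but have not been reprocessed yet.
    rows = conn.execute("""
        SELECT TRANS_ID, CREDIT_CARD_NB, TRANS_DATE, 0 AS FROM_ERR FROM SRC_TRANS
        UNION ALL
        SELECT TRANS_ID, CREDIT_CARD_NB, TRANS_DATE, 1 FROM ERR_TRANS
        WHERE ERROR_FIXED_FLAG = 'Y' AND PROCESSED_FLAG = 'N'
    """).fetchall()
    for trans_id, card, trans_date, from_err in rows:
        if trans_date <= run_date:  # ISO date strings compare in date order
            conn.execute("INSERT INTO TGT_TRANS VALUES (?, ?, ?)",
                         (trans_id, card, trans_date))
            if from_err:
                conn.execute("UPDATE ERR_TRANS SET PROCESSED_FLAG = 'Y' "
                             "WHERE TRANS_ID = ?", (trans_id,))
        elif from_err:
            # The fix did not pass the check, so send it back for correction.
            conn.execute("UPDATE ERR_TRANS SET ERROR_FIXED_FLAG = 'N' "
                         "WHERE TRANS_ID = ?", (trans_id,))
        else:
            conn.execute(
                "INSERT INTO ERR_TRANS (TRANS_ID, CREDIT_CARD_NB, TRANS_DATE, ERROR_MESSAGE) "
                "VALUES (?, ?, ?, ?)",
                (trans_id, card, trans_date, "Invalid Transaction Date"))
    conn.commit()
```

<div style="text-align: justify;">
A record rejected in one run lands in the error table, and once someone corrects it and sets the "error fixed" flag, the next run picks it up, loads it and marks it as processed.</div>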
<h2>
II. Non-Fatal Exceptions</h2>
<img align="left" alt="Error Handling made easy in Informatica powercenter workflow" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhb5b9pvQPO1N8dkwWD_7kIDaJIeGuO2Ij020DD8YiywVJeTPh_y2tuVy-gd56vrKiuXqxfjnRR19R8zjWraxVMQwSbF5p0YtM0H995NE7_2KBP57wqPFA3Ra3etLuOpbBk8_MHpe-kUSw/s1600/informatica-error-handling.png?imgmax=800" height="50" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="50" />
<br />
<div>
Non-fatal exceptions cause records to be dropped during the ETL process, which is critical to data quality. You can handle non-fatal exceptions using:</div>
<ol><ol>
<ol><ol>
<li>Default Port Value Setting.</li>
<li>Row Error Logging.</li>
<li>Error Handling Settings.</li>
</ol>
</ol>
</ol>
</ol>
<h3>
1. Default Port Value Setting </h3>
<div>
<div style="text-align: justify;">
Using the default value property is a good way to handle exceptions due to NULL values and unexpected transformation errors. The Designer assigns default values to handle null values and output transformation errors. <a href="http://www.disoln.org/2012/08/Understand-Informatica-PowerCenter-Mapping-Designer.html">PowerCenter Designer</a> lets you override the default value in input, output and input/output ports.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The default value property behaves differently for different port types:</div>
<div style="text-align: justify;">
<ul>
<li><b><span style="color: #cc0000;">Input ports</span></b> : Use default values if you do not want the Integration Service to treat null values as NULL.</li>
<li><span style="color: #cc0000;"><b>Output ports</b></span> : Use default values if you do not want to skip the row due to transformation error or if you want to write a specific message with the skipped row to the session log.</li>
<li><span style="color: #cc0000;"><b>Input/output ports</b></span> : Use default values if you do not want the Integration Service to treat null values as NULL. You cannot enter user-defined default values for output transformation errors in an input/output port.</li>
</ul>
<h3 style="text-align: start;">
Default Value Use Case</h3>
<div>
<b><span style="color: #cc0000;">Use Case 1</span></b></div>
<div>
<span style="text-align: justify;">Shown below is the setting required to handle NULL values. This setting converts any NULL value returned by the <a href="http://www.disoln.org/2012/08/slowly-changing-dimension-type-2-implementation-using-informatica.html">dimension</a> lookup to the default value -1. This technique can be used to handle <a href="http://www.disoln.org/2013/12/Design-Approach-to-Handle-Late-Arriving-Dimensions-and-Late-Arriving-Facts.html">late arriving dimensions</a>.</span><br />
<span style="text-align: justify;"></span></div>
</div>
</div>
<div>
<a href="http://lh3.ggpht.com/-iAbkIOrccS8/U0JCMQsTnNI/AAAAAAAAJSg/t8MWICQXrKs/s1600-h/image%25255B25%25255D.png"><img alt="image" border="0" src="http://lh3.ggpht.com/-Na1ocJuQg20/U0JCM34lBsI/AAAAAAAAJSo/czKLwkCvGj4/image_thumb%25255B21%25255D.png?imgmax=800" height="392" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="image" width="538" /></a>
<br />
<div style="text-align: justify;">
<b><span style="color: #cc0000;">Use Case 2</span></b></div>
<div style="text-align: justify;">
The setting below uses the default value expression to supply a date when the incoming value is not in a valid date format.</div>
<a href="http://lh4.ggpht.com/-Py56aSkSqLg/U0JSu_emPiI/AAAAAAAAJTQ/L-y8fPez8Vo/s1600-h/image%25255B47%25255D.png"><img alt="image" border="0" src="http://lh6.ggpht.com/-0C1Xr5luvGA/U0JSveLePSI/AAAAAAAAJTY/oYF1bCIyFdk/image_thumb%25255B39%25255D.png?imgmax=800" height="421" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin: 10px auto 5px;" title="image" width="538" /></a>
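<div style="text-align: justify;">
The two use cases can be approximated in a few lines of Python. This is a sketch of the idea only: the helper names <i>input_port</i> and <i>output_port</i>, the date format and the fallback date <i>1900-01-01</i> are assumptions for illustration and are not taken from the screenshots.</div>

```python
from datetime import datetime

DEFAULT_DATE = datetime(1900, 1, 1)  # hypothetical fallback value


def input_port(value, default):
    """Input port default: replaces a NULL (None) before the row is transformed."""
    return default if value is None else value


def output_port(transform, value, default):
    """Output port default: used when the transformation itself raises an error."""
    try:
        return transform(value)
    except (ValueError, TypeError):
        return default


def parse_date(text):
    return datetime.strptime(text, "%Y-%m-%d")


# Use case 1: a dimension lookup that found nothing returns -1 instead of NULL.
customer_key = input_port(None, -1)

# Use case 2: a value in an unexpected date format falls back to the default date.
trans_date = output_port(parse_date, "07/04/2014", DEFAULT_DATE)
```

<div style="text-align: justify;">
In both cases the row keeps moving through the pipeline with a known placeholder value instead of being dropped.</div>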
<br />
<h3>
2. Row Error Logging</h3>
</div>
<div style="text-align: justify;">
Row error logging helps capture any exception that was not considered during design and coded for in the mapping. It is a good way to capture unexpected errors.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The session error handling setting shown below captures any unhandled error into the PMERR tables.</div>
<a href="http://lh5.ggpht.com/-faZ4fR4Q-X8/U0JVzfnxdtI/AAAAAAAAJTk/txvtOHH0LQY/s1600-h/image%25255B62%25255D.png"><img alt="image" border="0" src="http://lh3.ggpht.com/-hALXmKOojz4/U0JVz0ytMSI/AAAAAAAAJTs/Pv_X2ubi7KA/image_thumb%25255B50%25255D.png?imgmax=800" height="480" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin: 10px auto 5px;" title="image" width="583" /></a>
<br />
<div>
<div style="text-align: justify;">
<span style="color: #cc0000;">Please refer to the article</span> <a href="http://www.disoln.org/2012/07/error-handling-made-easy-using.html">Error Handling Made Easy Using Informatica Row Error Logging</a> for more details.</div>
<h3>
3. Error Handling Settings</h3>
</div>
<div style="text-align: justify;">
The error handling properties at the session level provide options such as Stop On Errors, Stored Procedure Error, Pre-Session Command Task Error and Pre-Post SQL Error. You can use these properties to ignore such errors or to make the session fail when they occur.</div>
<div>
<div class="p1">
</div>
<ul>
<li style="text-align: justify;"><b><span style="color: #cc0000;">Stop On Errors</span></b> : Indicates how many non-fatal errors the Integration Service can encounter before it stops the session.</li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>On Stored Procedure Error</b></span> : If you select Stop Session, the Integration Service stops the session on errors executing a pre-session or post-session stored procedure.</li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>On Pre-Session Command Task Error</b></span> : If you select Stop Session, the Integration Service stops the session on errors executing pre-session shell commands.</li>
<li style="text-align: justify;"><b><span style="color: #cc0000;">Pre-Post SQL Error</span></b> : If you select Stop Session, the Integration Service stops the session on errors executing pre-session or post-session SQL.</li>
</ul>
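<div style="text-align: justify;">
The Stop On Errors threshold can be pictured as a simple error counter. The Python below is a hypothetical sketch, not PowerCenter behaviour verified against a session: the function name <i>run_with_threshold</i> is made up, and treating a threshold of 0 as "never stop on non-fatal errors" is an assumption.</div>

```python
def run_with_threshold(rows, transform, stop_on_errors):
    """Process rows, counting non-fatal errors against a threshold.

    A threshold of 0 is treated here as "never stop" (an assumption).
    """
    loaded, errors = [], 0
    for row in rows:
        try:
            loaded.append(transform(row))
        except ValueError:
            errors += 1  # non-fatal: the row is dropped
            if stop_on_errors and errors >= stop_on_errors:
                return loaded, errors, "failed"
    return loaded, errors, "succeeded"
```

<div style="text-align: justify;">
With a threshold of 2, for example, the second bad row ends the run and the rows after it are never processed, while a threshold of 0 lets the run finish and simply reports how many rows were dropped.</div>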
<div class="p1">
</div>
<div class="p1">
</div>
<div class="p1">
</div>
</div>
<div>
<ol><ol>
</ol>
</ol>
<a href="http://lh3.ggpht.com/-qwTYuhVWnmA/U0JV0fNHNaI/AAAAAAAAJT0/YENBCiMO7dc/s1600-h/image%25255B73%25255D.png"><img alt="image" border="0" src="http://lh5.ggpht.com/-AThT9IVidEU/U0JV06e_jYI/AAAAAAAAJT8/YATmqaOgcxc/image_thumb%25255B59%25255D.png?imgmax=800" height="450" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin: 10px auto 5px;" title="image" width="583" /></a></div>
<div>
<h2>
III. Fatal Exceptions</h2>
<img align="left" alt="Error Handling Options and Techniques in Informatica PowerCenter" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh82FDnEBdwaj0ZHa7FlPhn8BBufFkqtxPrlMP_E-JiAvYqZYNjZrjMBbK5c-drwlSSaI571cnecEMl2OZHmHqbabDj-geP63B8v1JI812W_BPNgVFV72SFHe49-5-Qgl9scjmtDldREFY/s1600/error.png" height="50" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="50" />
</div>
<div>
<div style="text-align: justify;">
A fatal error occurs when the Integration Service cannot access the source, target, or repository. When the session encounters a fatal error, the PowerCenter Integration Service terminates the session. To handle fatal errors, you can either use a <a href="http://www.disoln.org/2013/02/Restartability-Design-for-Different-Type-ETL-Loads.html">restartable ETL design</a> for your workflow or use the <a href="http://www.disoln.org/2013/07/Workflow-Recovery-Configuration-for-Informatica-PowerCenter-Workflows.html">workflow recovery features</a> of Informatica PowerCenter.</div>
<ol><ol>
<ol><ol>
<li><a href="http://www.disoln.org/2013/02/Restartability-Design-for-Different-Type-ETL-Loads.html">Restartable ETL Design</a></li>
<li><a href="http://www.disoln.org/2013/07/Workflow-Recovery-Configuration-for-Informatica-PowerCenter-Workflows.html">Workflow Recovery</a></li>
</ol>
</ol>
</ol>
</ol>
</div>
<h3>
1. Restartable ETL Design</h3>
<div style="text-align: justify;">
Restartability is the ability to restart an ETL job if a processing step fails to execute properly. This avoids the need for any manual cleanup before a failed job can restart. You want the ability to restart processing at the step where it failed as well as the ability to restart the entire ETL session.</div>
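<div style="text-align: justify;">
One minimal way to picture restarting at the failed step is a list of completed steps that survives the failure. The Python below is a sketch under that assumption: <i>run_job</i> and the <i>state</i> dictionary are hypothetical, and a real job would keep this state in a control table or file rather than in memory.</div>

```python
def run_job(steps, state):
    """Run steps in order, skipping any already recorded as complete in state."""
    for name, action in steps:
        if name in state["completed"]:
            continue  # finished in an earlier run, do not repeat it
        action()  # an exception here leaves state pointing at the failed step
        state["completed"].append(name)
    # Full success: clear the list so the next run starts from the first step.
    state["completed"].clear()
```

<div style="text-align: justify;">
If the second of three steps fails, the state holds only the first step, so rerunning the job resumes at the second step. Clearing the state before a rerun gives the other option, a restart of the entire job.</div>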
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #cc0000;">Please refer to the article</span> "<a href="http://www.disoln.org/2013/02/Restartability-Design-for-Different-Type-ETL-Loads.html">Restartability Design Pattern for Different Type ETL Loads</a><b>" </b>for more details on restartable ETL design.</div>
<h3>
2. Workflow Recovery</h3>
<div>
<div style="text-align: justify;">
Workflow recovery allows you to continue processing the workflow and workflow tasks from the point of interruption. During the workflow recovery process, the Integration Service accesses the workflow state, which is stored in memory or on disk depending on the recovery configuration. The workflow state of operation includes the status of tasks in the workflow and workflow variable values.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #cc0000;">Please refer to the article</span> "<a href="http://www.disoln.org/2013/07/Workflow-Recovery-Configuration-for-Informatica-PowerCenter-Workflows.html">Informatica Workflow Recovery with High Availability for Auto Restartable Jobs</a><b>" </b>for more details on workflow recovery.<br />
<br />
Hope this article is useful for you guys. Please feel free to share your comments and any questions you may have.</div>
</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-2957089785749969512014-03-16T23:28:00.005-07:002014-03-21T20:02:30.885-07:00How to Avoid The Usage of SQL Overrides in Informatica PowerCenter Mappings<img align="left" alt="SQL Overrides in Informatica PowerCenter Mappings" border="0" height="100" src="https://lh5.googleusercontent.com/-CNvhzHVQsxs/Uxy8dJR0_GI/AAAAAAAAJN8/IclVI92YaqQ/h120/sql+override.png" style="border-width: 0px; display: inline; margin: 0px 0px 0px -10px;" title="" width="100" /><br />
<div style="text-align: justify;">
Many Informatica PowerCenter developers tend to use <a href="http://www.disoln.org/2013/09/Informatica-SQL-Transformation-Beyond-Pre-Post-Session-SQL-Commands.html">SQL</a> Override during mapping development. Developers find SQL Override easy and more productive to use. At the same time, <a href="http://www.disoln.org/search/label/ETL%20Design?max-results=8">ETL Architects</a> do not like SQL Overrides because they hide the ETL logic from the metadata manager. In this article, let's see the options available to avoid <a href="http://www.disoln.org/2013/09/Informatica-SQL-Transformation-Beyond-Pre-Post-Session-SQL-Commands.html">SQL</a> Override in different transformations.<br />
<a name='more'></a></div>
<h2>
What is SQL Override</h2>
<div style="text-align: justify;">
Transformations such as <a href="http://www.disoln.org/2012/08/Understand-Informatica-PowerCenter-Mapping-Designer.html">Source Qualifier</a> and <a href="http://www.disoln.org/2013/06/Pipeline-Lookups-Beyond-Relational-and-Flat-FIle-Data-Sources.html">LookUp</a> provide an option to override the default query generated by PowerCenter. You can enter any valid SQL statement supported by the underlying database. You can enter your own SELECT statement, as long as the list of columns in its SELECT clause matches the transformation ports. The <a href="http://www.disoln.org/2013/09/Informatica-SQL-Transformation-Beyond-Pre-Post-Session-SQL-Commands.html">SQL</a> can perform aggregate calculations, or call a stored procedure or stored function to read the data.<br />
<h2>
Source Qualifier Options to Avoid SQL Override</h2>
<div>
There are a few options available in the Source Qualifier that can be used effectively to avoid <a href="http://www.disoln.org/2013/09/Informatica-SQL-Transformation-Beyond-Pre-Post-Session-SQL-Commands.html">SQL</a> Override.</div>
</div>
<h3>
1. User Defined Join</h3>
<div>
<div class="p1">
<div style="text-align: justify;">
The user defined join option is the most flexible way to avoid SQL Override. You need to enter only the contents of the WHERE clause of your SQL in the user defined join option, not the entire query.<br />
<br />
If the <i><span style="color: #cc0000;">JOIN Syntax</span></i> of your query is entirely within the <i><span style="color: #cc0000;">WHERE clause</span></i>, you can enter the WHERE clause of your query directly into the user defined join option, without any modification. Oracle still supports the old join notation using <b><i><span style="color: #cc0000;">(+)</span></i></b>, which sits within the WHERE clause, whereas most other databases use the newer JOIN syntax in the FROM clause.<br />
<br />
The image below shows the left outer join between the CUSTOMER table and the PURCHASES table. This join uses the Oracle join syntax <b><i><span style="color: #cc0000;">(+)</span></i></b>.</div>
<a href="http://lh4.ggpht.com/-akfLWgdzF-4/UyfuPrVSSfI/AAAAAAAAJPY/Yh3cj3q1qGg/s1600-h/image%25255B12%25255D.png"><img alt="How to Avoid The Usage of SQL Overrides in Informatica PowerCenter Mappings" border="0" src="http://lh5.ggpht.com/-5UwWve1FBi8/UyfuQvlPcUI/AAAAAAAAJPg/lXSJnFctEEo/image_thumb%25255B9%25255D.png?imgmax=800" height="494" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="750" /></a><br />
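<div style="text-align: justify;">
To make the effect of this option concrete, the Python below assembles an approximation of the query that results when only the join condition is supplied. It is an illustration, not the actual PowerCenter query generator: the function <i>build_source_query</i> and the column names <i>CUST_ID</i>, <i>CUST_NAME</i> and <i>AMOUNT</i> are assumptions.</div>

```python
def build_source_query(columns, tables, user_defined_join=None):
    """Approximate the default Source Qualifier query for a WHERE-clause join."""
    query = "SELECT " + ", ".join(columns) + " FROM " + ", ".join(tables)
    if user_defined_join:
        # Only the join condition is typed in; SELECT, FROM and WHERE are generated.
        query += " WHERE " + user_defined_join
    return query


query = build_source_query(
    columns=["CUSTOMER.CUST_ID", "CUSTOMER.CUST_NAME", "PURCHASES.AMOUNT"],
    tables=["CUSTOMER", "PURCHASES"],
    user_defined_join="CUSTOMER.CUST_ID = PURCHASES.CUST_ID (+)",
)
print(query)
```

<div style="text-align: justify;">
The point of the sketch is that the column list and table list still come from the mapping itself, so the join logic stays visible in the metadata instead of being buried in a hand-written query.</div>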
<div>
<div style="text-align: justify;">
<b><span style="color: #cc0000;">Note</span></b> :- You cannot use the above option if the JOIN syntax of your query is within the FROM clause.</div>
<h3 style="text-align: justify;">
Informatica Join Syntax</h3>
<div style="text-align: justify;">
If<span style="text-align: justify;"> the JOIN syntax of your query is written within the <span style="color: #cc0000;"><i>FROM clause</i></span>, you should use the <span style="color: #cc0000;"><i>Informatica Join Syntax</i></span></span> in the user defined join option. When you use the Informatica join syntax, the Integration Service inserts the join syntax in the WHERE clause or the FROM clause of the query, depending on the underlying database syntax.</div>
</div>
<div style="text-align: justify;">
<br /></div>
<div>
Informatica join syntax supports <b><i><span style="color: #cc0000;">Normal, Left Outer </span></i></b>and<b><i><span style="color: #cc0000;"> Right Outer</span></i></b> joins. Here is the join syntax:</div>
<div style="text-align: justify;">
<div class="p1">
</div>
<ul>
<li>Normal Join :- { <i>source1</i> <b><span style="color: #cc0000;">INNER JOIN</span></b> <i>source2</i> <b><span style="color: #cc0000;">on</span></b> <i>join_condition</i> }</li>
<li>Left Outer Join :- { <i>source1</i> <b><span style="color: #cc0000;">LEFT OUTER JOIN</span></b> <i>source2</i> <b><span style="color: #cc0000;">on</span></b> <i>join_condition</i> }</li>
<li>Right Outer Join :- { <i>source1</i> <b><span style="color: #cc0000;">RIGHT OUTER JOIN</span></b> <i>source2</i> <b><span style="color: #cc0000;">on</span></b> <i>join_condition</i> }</li>
</ul>
<div class="p1">
</div>
<div class="p1">
</div>
</div>
<div style="text-align: justify;">
<div class="p1">
<b><span style="color: #cc0000;">Note</span></b> :- Enclose Informatica join syntax in braces <span style="color: #cc0000;">{ }</span>. </div>
</div>
<a href="http://lh5.ggpht.com/-Z-ytH8gGiSM/Uyf1Iw9WjiI/AAAAAAAAJPw/olX-Z05haiY/s1600-h/image%25255B24%25255D.png"><img alt="How to Avoid The Usage of SQL Overrides in Informatica PowerCenter Mappings" border="0" src="http://lh3.ggpht.com/-nSI6nkaNkjc/Uyf1J6sMZwI/AAAAAAAAJP4/GOkRPCTObIs/image_thumb%25255B19%25255D.png?imgmax=800" height="494" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="750" /></a>
<br />
<div style="text-align: justify;">
The image above shows the Informatica join syntax. Using the user defined join option, the CUSTOMER table is left outer joined with the PURCHASES table.</div>
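<div style="text-align: justify;">
For a database that joins in the FROM clause, the braces can be thought of as a wrapper around text that ends up in the FROM clause. The Python below is a sketch of that idea using SQLite as a stand-in database. The sample rows and the column names <i>CUST_ID</i>, <i>CUST_NAME</i> and <i>AMOUNT</i> are assumptions, and the real Integration Service does more than strip braces.</div>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER (CUST_ID INTEGER, CUST_NAME TEXT);
    CREATE TABLE PURCHASES (CUST_ID INTEGER, AMOUNT REAL);
    INSERT INTO CUSTOMER VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO PURCHASES VALUES (1, 25.0);
""")

# Text as it would be typed into the user defined join option, braces included.
user_defined_join = (
    "{ CUSTOMER LEFT OUTER JOIN PURCHASES "
    "on CUSTOMER.CUST_ID = PURCHASES.CUST_ID }"
)

# Drop the braces and place the join text in the FROM clause.
from_clause = user_defined_join.strip().strip("{}").strip()
query = (
    "SELECT CUSTOMER.CUST_ID, PURCHASES.AMOUNT "
    f"FROM {from_clause} ORDER BY CUSTOMER.CUST_ID"
)
rows = conn.execute(query).fetchall()
```

<div style="text-align: justify;">
Because this is a left outer join, the customer without any purchase is still returned, with an empty purchase amount.</div>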
<h3>
2. Source Filter</h3>
<div>
<div class="p1" style="text-align: justify;">
The source filter option can be used to adjust the ‘WHERE’ clause of the <a href="http://www.disoln.org/2013/09/Informatica-SQL-Transformation-Beyond-Pre-Post-Session-SQL-Commands.html">SQL</a> created by the Integration Service, without using the <a href="http://www.disoln.org/2013/09/Informatica-SQL-Transformation-Beyond-Pre-Post-Session-SQL-Commands.html">SQL</a> Override option. You can enter a source filter to reduce the number of rows the Integration Service queries. Enter the source filter condition without the keyword ‘WHERE’. </div>
</div>
</div>
</div>
<a href="http://lh3.ggpht.com/-mhC6TzDe94w/Uyf5Z0H0u9I/AAAAAAAAJQE/PD7CNRBQSgg/s1600-h/image%25255B57%25255D.png"><img alt="How to Avoid The Usage of SQL Overrides in Informatica PowerCenter Mappings" border="0" src="http://lh6.ggpht.com/-ujrNdDvjVkE/Uyf5a3YTfeI/AAAAAAAAJQM/U4MPayp4gmg/image_thumb%25255B46%25255D.png?imgmax=800" height="512" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="750" /></a>
<br />
In the image above, the source filter option is used to filter source data based on the Customer ID.<br />
<h3>
3. Sorted Ports</h3>
<div class="p3" style="text-align: justify;">
Using the sorted ports option, you can sort the source data. When you use the sorted ports option, the Integration Service adds the ports to the ORDER BY clause of the default query. It adds the configured number of ports, counting from the top of the Source Qualifier transformation. Only connected ports are counted, so unconnected ports at the top of the Source Qualifier are skipped.</div>
<a href="http://lh6.ggpht.com/-RA20I3ppKMQ/Uyf5btRwnEI/AAAAAAAAJQU/G4ev9xsTj98/s1600-h/image%25255B52%25255D.png"><img alt="How to Avoid The Usage of SQL Overrides in Informatica PowerCenter Mappings" border="0" src="http://lh3.ggpht.com/-2S3LZ8cFC34/Uyf5cTYK-nI/AAAAAAAAJQc/U_v_oGFsyDk/image_thumb%25255B41%25255D.png?imgmax=800" height="399" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="538" /></a>
<br />
<div style="text-align: justify;">
Based on the setting above, the source data is sorted on the first two ports connected from the Source Qualifier to the downstream transformations. The data is sorted in ascending order.</div>
<h3>
4. Select Distinct</h3>
<div class="p3">
</div>
<div class="p3" style="text-align: justify;">
If you want the Integration Service to select unique values from a source, use the Select Distinct option. Using Select Distinct filters out unnecessary data earlier in the data flow, which might improve performance. </div>
<a href="http://lh4.ggpht.com/-poQ1kG4SKHs/Uyf5dPXMELI/AAAAAAAAJQk/Stg-Tu7nPpE/s1600-h/image%25255B48%25255D.png"><img alt="How to Avoid The Usage of SQL Overrides in Informatica PowerCenter Mappings" border="0" src="http://lh4.ggpht.com/-d3bq9HUyRB8/Uyf5d7opYhI/AAAAAAAAJQs/_nOE8ofxDdI/image_thumb%25255B37%25255D.png?imgmax=800" height="399" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="538" /></a>
<br />
The 'Select Distinct' option can be set in the Source Qualifier as shown in the image above.<br />
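<div style="text-align: justify;">
To see how the Source Filter, Sorted Ports and Select Distinct options shape the generated query, here is a minimal Python sketch. It is not how the Integration Service is implemented; the function <i>build_default_query</i> and the column names CUST_ID, CUST_NAME and CITY are hypothetical and used only for illustration.</div>

```python
from typing import Optional, Sequence


def build_default_query(
    table: str,
    connected_ports: Sequence[str],
    source_filter: Optional[str] = None,
    sorted_ports: int = 0,
    select_distinct: bool = False,
) -> str:
    """Assemble a SELECT statement from Source Qualifier style options.

    Illustrative only: the names and the exact SQL layout are assumptions.
    """
    select = "SELECT DISTINCT" if select_distinct else "SELECT"
    columns = ", ".join(f"{table}.{port}" for port in connected_ports)
    query = f"{select} {columns} FROM {table}"
    if source_filter:
        # The filter is entered without the keyword WHERE; it is added here.
        query += f" WHERE {source_filter}"
    if sorted_ports > 0:
        # Only the first N connected ports go into the ORDER BY clause.
        order_by = ", ".join(
            f"{table}.{port}" for port in connected_ports[:sorted_ports]
        )
        query += f" ORDER BY {order_by}"
    return query


query = build_default_query(
    table="CUSTOMER",
    connected_ports=["CUST_ID", "CUST_NAME", "CITY"],  # hypothetical columns
    source_filter="CUSTOMER.CUST_ID > 1000",
    sorted_ports=2,
    select_distinct=True,
)
print(query)
```

<div style="text-align: justify;">
With these three options the query is still generated from the connected ports, so nothing is hidden inside a hand written SQL override.</div>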
<h2>
Advantages and Limitations of SQL Override</h2>
<div>
<h3>
Pros</h3>
<ul>
<li>Utilizes database optimizer techniques such as indexes and hints. </li>
<li>Can accommodate complex queries.</li>
</ul>
<h3>
Cons </h3>
<ul>
<li>Transformation logic inside the override is lost to metadata searches. </li>
<li>Unable to utilize <a href="http://www.disoln.org/2013/07/Informatica-PowerCenter-Partitioning-for-Parallel-Processing.html">Partitioning</a> or <a href="http://www.disoln.org/2013/07/Informatica-PowerCenter-Pushdown-Optimization-an-ELT-Approach.html">Pushdown Optimization</a> options.</li>
<li>Processing impacts database resources. </li>
</ul>
<div style="text-align: justify;">
Hope you enjoyed this article. Feel free to ask any further questions or request clarification in the comment section below. We are happy to help.</div>
</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-41117092872849024862014-02-28T00:02:00.000-08:002014-03-20T14:34:58.618-07:00Data Security Using Informatica PowerCenter Data Masking Transformation<img align="left" alt="Informatica Data masking Transactions" border="0" src="http://4.bp.blogspot.com/-OZXLRNactos/UwAlrY9GXPI/AAAAAAAAJLE/9JW9nCbu-LU/s1600/Data-encryption-icon.png" height="100" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" /><br />
<div>
<div style="text-align: justify;">
You might have come across a scenario where you do not have enough good data in your Development and QA regions for testing, and you are not allowed to copy data over from the production environment for data security reasons. Using the <a href="http://www.disoln.org/2012/07/informatica-powercenter-client-tools.html">Informatica PowerCenter</a> Data Masking transformation, you can overcome such scenarios. In this article, let's see how to use the Data Masking <a href="http://www.disoln.org/search/label/Transformations?max-results=8">transformation</a>.
<br />
<a name='more'></a><h2>
What is Data Masking Transformation</h2>
<div>
<div class="p1">
Using Data Masking <a href="http://www.disoln.org/search/label/Transformations?max-results=8">transformation</a>, you change sensitive production data to realistic test data for non-production environments. The Data Masking <a href="http://www.disoln.org/search/label/Transformations?max-results=8">transformation</a> modifies source data based on masking rules that you configure for each column.</div>
<div class="p1">
<br /></div>
<div class="p1">
You can apply the following types of masking with the Data Masking <a href="http://www.disoln.org/search/label/Transformations?max-results=8">transformation</a>.</div>
<ul>
<li><b><span style="color: #cc0000;">Key masking</span></b> :- Produces deterministic results for the same source data and seed value. </li>
<li><span style="color: #cc0000;"><b>Random masking </b></span>:- Produces random, non-repeatable results for the same source data. </li>
<li><span style="color: #cc0000;"><b>Expression masking</b></span> :- Applies an expression to a port to change the data or create data. </li>
<li><span style="color: #cc0000;"><b>Substitution</b></span> :- Replaces a column of data with similar but unrelated data from a dictionary. </li>
<li><span style="color: #cc0000;"><b>Special mask formats</b></span> :- Applies special mask formats to change SSN, credit card number, phone number, URL, email address, or IP addresses.</li>
</ul>
<div>
Let's look at each masking rule in detail.</div>
<h2>
Key Masking</h2>
A column configured for key masking returns deterministic masked data each time the source value and seed value are the same, so the masked output stays the same for the same input value. Use the same seed value to generate the same masked value for the same input across transformations.<br />
<h3>
Key Masking Properties</h3>
You can configure the following masking rules and properties for key masking string values:<br />
<ul>
<li><b><span style="color: #cc0000;">Seed </span></b>:- Apply a seed value to generate the same masked data for the same input in a column across sessions. Select one of the following options:</li>
<ul>
<li><span style="color: #cc0000;">Value</span> :- Accept the default seed value or enter a number between 1 and 1,000. </li>
<li><span style="color: #cc0000;">Mapping Parameter </span>:- Use a mapping parameter to define the seed value.</li>
</ul>
<li><span style="color: #cc0000;"><b>Mask Format</b></span> :- Define the type of character to substitute for each character in the input data. Use this property to keep the input and masked data in the same format.</li>
<li><b><span style="color: #cc0000;">Source String Characters </span></b>:- Source string characters are source characters that you choose to mask or not mask. </li>
<li><b><span style="color: #cc0000;">Result String Characters</span></b> :- Substitute the characters in the target string with the characters you define in Result String Characters.</li>
</ul>
<div>
<b><span style="color: #cc0000;">Hint</span></b> :- Use the same seed value to mask a primary key in a table and the foreign key value in another table.</div>
<div>
<br /></div>
<b><span style="color: #cc0000;">Example</span></b> :- The image below shows the masking properties for Key Masking. This transformation masks the DEPT_ID column using key masking. The masked DEPT_ID will have the format DDD+AAAAAA, that is, three digits, one unmasked character and six alphabetic characters.<br />
<a href="http://lh4.ggpht.com/-SWEkZo-sZ8I/UxSdy4Oc9pI/AAAAAAAAJLw/aBwfBicjG2A/s1600-h/image%25255B10%25255D.png"><img alt="Data Security Using Informatica PowerCenter Data Masking Transformation - Key Masking" border="0" src="http://lh6.ggpht.com/-jpkhExvZfGg/Uwt-mj3kGqI/AAAAAAAAJL4/-PV5do_TXwA/image_thumb%25255B9%25255D.png?imgmax=800" height="542" style="border: 0px; display: block; float: none; margin: 10px auto -20px;" title="" width="550" /></a><br />
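<div style="text-align: justify;">
The deterministic behaviour of key masking can be sketched in a few lines of Python. This is only an illustration of the idea, not the algorithm PowerCenter uses: the function <i>key_mask</i> is hypothetical, a SHA-256 hash stands in for the real masking logic, and only the mask format characters D, A and + are handled.</div>

```python
import hashlib
import string


def key_mask(value: str, seed: int, mask_format: str) -> str:
    """Mask a string deterministically from the value and the seed.

    Hypothetical stand-in for key masking: a hash of seed and value
    drives the choice of every replacement character.
    """
    digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
    masked = []
    for position, char in enumerate(value):
        # Positions beyond the mask format are left unmasked in this sketch.
        rule = mask_format[position] if position < len(mask_format) else "+"
        pick = digest[position % len(digest)]
        if rule == "D":
            masked.append(string.digits[pick % 10])
        elif rule == "A":
            masked.append(string.ascii_uppercase[pick % 26])
        else:
            # "+" (and anything this sketch does not handle) means no masking.
            masked.append(char)
    return "".join(masked)


first_run = key_mask("DPT-128923", seed=190, mask_format="DDD+AAAAAA")
second_run = key_mask("DPT-128923", seed=190, mask_format="DDD+AAAAAA")
print(first_run, first_run == second_run)
```

<div style="text-align: justify;">
Because the output depends only on the input value and the seed, masking a primary key and its foreign key with the same seed keeps the two columns joinable.</div>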
<h2>
Substitution Masking</h2>
Substitution masking replaces a column of data with similar but unrelated data. When you configure substitution masking, define the relational or flat file dictionary that contains the substitute values. The Data Masking transformation performs a lookup on the dictionary that you configure and replaces source data with data from the dictionary. It is an effective way to replace production data with realistic test data.<br />
<h3>
Substitution Source Dictionaries</h3>
To use substitution masking, you need a flat file or relational table that contains the substitute data and a serial number for each row in the file or table. The serial numbers should start from one and cannot have any gaps.<br />
<br />
Below is the structure of the substitution file, which has a serial number column, the department ID and the corresponding masked department ID.<br />
<br />
<span style="color: #cc0000;">SNO,DEPT_ID,MASKED_DEPT_ID<br />1,DPT-128923,ABC-999999<br />2,DPT-234265,LMN-888888</span><br />
<h3>
Substitution Masking Properties</h3>
</div>
<div>
You can configure the following masking rules for substitution masking.<br />
<ul>
<li><b><span style="color: #cc0000;">Repeatable Output</span></b> :- Returns same results between sessions for the same input.</li>
<li><b><span style="color: #cc0000;">Seed</span></b> :- Apply a seed value to generate the same masked data for the same input in a column across sessions. Select one of the following options: </li>
<ul>
<li><span style="color: #cc0000;">Value</span> :- Accept the default seed value or enter a number between 1 and 1,000. </li>
<li><span style="color: #cc0000;">Mapping Parameter </span>:- Use a mapping parameter to define the seed value.</li>
<li><span style="color: #cc0000;">Unique Output </span>:- Force the PowerCenter Integration Service to create unique Data Masking output values for unique input values. No two input values are masked to the same output value.</li>
</ul>
<li><b><span style="color: #cc0000;">Dictionary Information</span></b> :- Configure the flat file or relational table that contains the substitute data values. </li>
<ul>
<li><span style="color: #cc0000;">Relational Table</span> :- Select Relational Table if the dictionary is in a database table. </li>
<li><span style="color: #cc0000;">Flat File </span>:- Select Flat File if the dictionary is a comma-delimited flat file. </li>
<li><span style="color: #cc0000;">Dictionary Name</span> :- Displays the flat file or relational table name that you selected. </li>
<li><span style="color: #cc0000;">Serial Number Column</span> :- Select the column in the dictionary that contains the serial number. </li>
<li><span style="color: #cc0000;">Output Column</span> :- Choose the column to return to the Data Masking transformation. </li>
</ul>
<li><span style="color: #cc0000;"><b>Lookup condition</b></span> :- When you configure a lookup condition you compare the value of a column in the source with a column in the dictionary to pick the masked value.</li>
<ul>
<li><span style="color: #cc0000;">Input port</span> :- Source data column to use in the lookup. </li>
<li><span style="color: #cc0000;">Dictionary column</span> :- Dictionary column to compare the input port to.</li>
</ul>
</ul>
<div>
<b><span style="color: #cc0000;">Example</span></b> :- The image below shows the masking properties for Substitution Masking. In this example, SNO is the serial number column and MASKED_DEPT_ID is the substitution value from the file for each DEPT_ID. The lookup condition to search the flat file is defined on DEPT_ID.</div>
</div>
<div>
<a href="http://lh3.ggpht.com/-xdTz2GWE8aI/UxSd8m-ASVI/AAAAAAAAJMA/jdfiI8Hz8gg/s1600-h/image%25255B26%25255D.png"><img alt="Data Security Using Informatica PowerCenter Data Masking Transformation - Substitution Masking" border="0" src="http://lh4.ggpht.com/-BDtNV3Trhvs/UxSd9fCy-zI/AAAAAAAAJMI/9yo7CSiADYE/image_thumb%25255B23%25255D.png?imgmax=800" height="542" style="border: 0px; display: block; float: none; margin: 10px auto -20px;" title="" width="550" /></a>
<br />
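<div style="text-align: justify;">
The dictionary lookup described above can be sketched as follows. This is a simplified illustration using the sample dictionary rows from this article; the helper names <i>load_dictionary</i> and <i>substitute</i> are hypothetical, and raising an error for a missing DEPT_ID is a choice made for this sketch, not documented product behaviour.</div>

```python
import csv
import io

# The sample dictionary from this article, one row per line.
DICTIONARY_CSV = """SNO,DEPT_ID,MASKED_DEPT_ID
1,DPT-128923,ABC-999999
2,DPT-234265,LMN-888888
"""


def load_dictionary(text: str, lookup_column: str, output_column: str) -> dict:
    """Index the dictionary rows by the column used in the lookup condition."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[lookup_column]: row[output_column] for row in reader}


def substitute(dept_id: str, dictionary: dict) -> str:
    """Return the substitute value for a source DEPT_ID.

    Raises KeyError when the value is not in the dictionary (sketch choice).
    """
    return dictionary[dept_id]


dictionary = load_dictionary(DICTIONARY_CSV, "DEPT_ID", "MASKED_DEPT_ID")
masked_dept_id = substitute("DPT-128923", dictionary)
print(masked_dept_id)
```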
<h2>
Random Masking</h2>
<div class="p2">
Random masking generates random masked data. The Data Masking transformation returns different values when the same source value occurs in different rows. You can mask numeric, string or date values with random masking.</div>
<h3>
Random Masking Properties</h3>
<div>
You can configure the following masking rules for random masking.</div>
<ul>
<li><span style="color: #cc0000;"><b>Range</b></span> :- Configure the minimum and maximum string length. The Data Masking transformation returns a string of random characters between the minimum and maximum string length.</li>
<li><span style="color: #cc0000;"><b>Mask Format</b></span> :- Define the type of character to substitute for each character in the input data. Use this property to keep the input and masked data in the same format.</li>
<li><b><span style="color: #cc0000;">Source String Characters </span></b>:- Source string characters are source characters that you choose to mask or not mask. </li>
<li><b><span style="color: #cc0000;">Result String Characters</span></b> :- Substitute the characters in the target string with the characters you define in Result String Characters.</li>
</ul>
<div>
<b><span style="color: #cc0000;">Example</span></b> :- The image below shows the masking properties for Random Masking. In this example, the masked DEPT_ID will have the format DDD+AAAAAA and the character '-' will not be masked.</div>
<div>
<a href="http://lh5.ggpht.com/-Aq-eXxwYg1A/UxVnCHuncgI/AAAAAAAAJMs/A22OzxZ1HSY/s1600-h/image%25255B12%25255D.png"><img alt="Data Security Using Informatica PowerCenter Data Masking Transformation - Random Masking" border="0" src="http://lh4.ggpht.com/-31ME7KNWcEU/UxVnDivY8PI/AAAAAAAAJM0/YoQ2pFhLpgU/image_thumb%25255B9%25255D.png?imgmax=800" height="543" style="border: 0px; display: block; float: none; margin: 10px auto -20px;" title="" width="550" /></a>
<br />
<h2>
Expression Masking</h2>
<div class="p2">
Expression masking applies an expression to a port to change the data or create new data. When you configure expression masking, create an expression in the Expression Editor. You can select input and output ports, functions, variables, and operators to build expressions.<br />
<br />
<b><span style="color: #cc0000;">Example</span></b> :- The image below shows the masking properties for Expression Masking.</div>
</div>
</div>
</div>
</div>
<a href="http://lh4.ggpht.com/-zzt_0VpWaQ8/UxV05B9WOSI/AAAAAAAAJNE/ddzlXUgqmPQ/s1600-h/image%25255B22%25255D.png"><img alt="Data Security Using Informatica PowerCenter Data Masking Transformation - Expression Masking" border="0" src="http://lh5.ggpht.com/-EirbrY9XROA/UxV06-3wOaI/AAAAAAAAJNM/y3Lh1Quf49g/image_thumb%25255B17%25255D.png?imgmax=800" height="543" style="border: 0px; display: block; float: none; margin: 10px auto -20px;" title="" width="550" /></a><br />
<h2>
Special Masking Formats</h2>
<div>
<div class="p1" style="text-align: justify;">
Special mask formats change <i><span style="color: #cc0000;">SSNs, credit card numbers, phone numbers, URLs, email addresses, or IP addresses</span></i>. The Data Masking transformation returns a masked value that has a realistic format but is not a valid value. For example, when you mask an SSN, the Data Masking transformation returns an SSN that is in the correct format but is not valid. You can configure repeatable masking for Social Security numbers.<br />
<br />
<b><span style="color: #cc0000;">Example</span></b> :- The image below shows the masking properties for Special Masking.</div>
</div>
<a href="http://lh6.ggpht.com/-3VZWf0elWTM/UxWDkR0vXWI/AAAAAAAAJNc/gCupoodXUJ8/s1600-h/image%25255B33%25255D.png"><img alt="Data Security Using Informatica PowerCenter Data Masking Transformation - Special formats" border="0" src="http://lh5.ggpht.com/-E9C09JAz1-U/UxWDlzL8BZI/AAAAAAAAJNk/gCbSlQ8-sQg/image_thumb%25255B26%25255D.png?imgmax=800" height="543" style="border: 0px; display: block; float: none; margin: 10px auto -20px;" title="" width="550" /></a><br />
<h2>
Masking Properties in Detail</h2>
<div>
Let's look at a few masking properties in detail.</div>
<h3>
1. Mask Format</h3>
<div>
<div class="p1" style="text-align: justify;">
Configure a mask format to limit each character in the output column to an alphabetic, numeric, or alphanumeric character. This property is used by random and key masking. Use the following characters to define a mask format: </div>
<div class="p1">
</div>
<ol>
<li><b><span style="color: #cc0000;">A</span></b> :- Alphabetical characters. For example, ASCII characters a to z and A to Z.</li>
<li><span style="color: #cc0000;"><b>D</b></span> :- Digits. 0 to 9.</li>
<li><span style="color: #cc0000;"><b>N</b></span> :- Alphanumeric characters. For example, ASCII characters a to z, A to Z, and 0 to 9.</li>
<li><span style="color: #cc0000;"><b>X</b></span> :- Any character. For example, alphanumeric or symbol.</li>
<li><b><span style="color: #cc0000;">+</span></b> :- No masking.</li>
<li><span style="color: #cc0000;"><b>R</b></span> :- Specifies that the remaining characters in the string can be any character type.</li>
</ol>
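<div style="text-align: justify;">
To make the mask format characters concrete, here is a small Python sketch that applies a mask format with random masking. The function <i>random_mask</i> and the character sets in <i>ALPHABETS</i> are assumptions made for illustration, and leaving characters beyond the end of the mask format unmasked is also an assumption of this sketch.</div>

```python
import random
import string

# Character pools for each mask format character (illustrative choice).
ALPHABETS = {
    "A": string.ascii_letters,
    "D": string.digits,
    "N": string.ascii_letters + string.digits,
    "X": string.ascii_letters + string.digits + string.punctuation,
}


def random_mask(value: str, mask_format: str, rng: random.Random) -> str:
    """Replace each character according to the mask format."""
    masked = []
    remaining_any = False  # becomes True once "R" is reached
    for position, char in enumerate(value):
        if remaining_any:
            rule = "X"
        elif position < len(mask_format):
            rule = mask_format[position]
            if rule == "R":
                # R: this and all remaining characters can be any type.
                remaining_any = True
                rule = "X"
        else:
            # Assumption: characters past the mask format are not masked.
            rule = "+"
        masked.append(char if rule == "+" else rng.choice(ALPHABETS[rule]))
    return "".join(masked)


rng = random.Random()
print(random_mask("DPT-128923", "DDD+AAAAAA", rng))
print(random_mask("DPT-128923", "DDD+AAAAAA", rng))
```

<div style="text-align: justify;">
The two calls usually print different values for the same input, which is the difference between random masking and key masking.</div>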
<div>
<h3>
2. Source String Characters</h3>
<div class="p2" style="text-align: justify;">
Source string characters are source characters that you choose to mask or not mask. The position of the characters in the source string does not matter, but the characters are case sensitive. <span style="text-align: justify;">This property is used by random and key masking.</span></div>
<div class="p2" style="text-align: justify;">
<br /></div>
<div class="p1" style="text-align: justify;">
<b><span style="color: #cc0000;">Mask Only</span> :-</b> The Data Masking transformation masks characters in the source that you configure as source string characters. For example, if you enter the characters A, B, and c, the Data Masking transformation replaces A, B, or c with a different character when the character occurs in source data. A source character that is not an A, B, or c does not change. The mask is case sensitive.</div>
<div class="p2" style="text-align: justify;">
<br /></div>
<div class="p2">
</div>
<div class="p1" style="text-align: justify;">
<b><span style="color: #cc0000;">Mask All Except</span> :-</b> Masks all characters except the source string characters that occur in the source string.</div>
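<div style="text-align: justify;">
The two source string options can be sketched in Python as below. The function name <i>mask_source_characters</i> and the mode names are hypothetical, and replacing with a random ASCII letter is a simplification chosen for this illustration.</div>

```python
import random
import string


def mask_source_characters(
    value: str, source_chars: str, mode: str, rng: random.Random
) -> str:
    """Mask characters based on the configured source string characters.

    mode is "mask_only" or "mask_all_except" (names chosen for this sketch).
    """
    if mode not in ("mask_only", "mask_all_except"):
        raise ValueError(f"unknown mode: {mode}")
    masked = []
    for char in value:
        listed = char in source_chars  # the comparison is case sensitive
        should_mask = listed if mode == "mask_only" else not listed
        if should_mask:
            # Pick a replacement letter that differs from the original.
            choices = [c for c in string.ascii_letters if c != char]
            masked.append(rng.choice(choices))
        else:
            masked.append(char)
    return "".join(masked)


rng = random.Random()
# Only A, B and c are masked; lowercase a and b stay as they are.
print(mask_source_characters("ABcab", "ABc", "mask_only", rng))
# Everything except the hyphen is masked.
print(mask_source_characters("DPT-128923", "-", "mask_all_except", rng))
```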
<h3>
3. Result String Replacement Characters</h3>
<div class="p1">
</div>
<div class="p2" style="text-align: justify;">
Result string replacement characters are characters you choose as substitute characters in the masked data. When you configure result string replacement characters, the Data Masking transformation replaces characters in the source string with the result string replacement characters. This property is used by random and key masking.</div>
<div class="p2" style="text-align: justify;">
<br /></div>
<div class="p1" style="text-align: justify;">
<b><span style="color: #cc0000;">Use Only</span> :-</b> Mask the source with only the characters you define as result string replacement characters. For example, if you enter the characters A, B, and c, the Data Masking transformation replaces every character in the source column with an A, B, or c. The word “horse” might be replaced with “BAcBA.” </div>
<div class="p2" style="text-align: justify;">
<br /></div>
<div class="p2" style="text-align: justify;">
</div>
<div class="p1" style="text-align: justify;">
<b><span style="color: #cc0000;">Use All Except</span> :-</b> Mask the source with any characters except the characters you define as result string replacement characters. For example, if you enter A, B, and c result string replacement characters, the masked data never has the characters A, B, or c.<br />
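<div style="text-align: justify;">
The result string options can be sketched the same way. The function <i>pick_replacements</i> and the mode names are hypothetical, and limiting the "Use All Except" pool to ASCII letters is an assumption made to keep the illustration short.</div>

```python
import random
import string


def pick_replacements(
    value: str, result_chars: str, mode: str, rng: random.Random
) -> str:
    """Replace every character using the configured result characters.

    mode is "use_only" or "use_all_except" (names chosen for this sketch).
    """
    if mode == "use_only":
        alphabet = list(result_chars)
    elif mode == "use_all_except":
        alphabet = [c for c in string.ascii_letters if c not in result_chars]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return "".join(rng.choice(alphabet) for _ in value)


rng = random.Random()
# Every character of "horse" becomes A, B or c.
print(pick_replacements("horse", "ABc", "use_only", rng))
# The masked value never contains A, B or c.
print(pick_replacements("horse", "ABc", "use_all_except", rng))
```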
<br />
Hope you enjoyed this article. Feel free to ask any further questions or request clarification in the comment section below. We are happy to help.</div>
</div>
</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-53714997923489591262014-01-29T22:37:00.001-08:002014-03-03T19:46:25.937-08:00Transaction Control Transformation to Control Commit and Rollback in Your ETL<img align="left" alt="Transaction Control Transformation to Control Commit and Rollback Transactions" border="0" src="http://lh4.ggpht.com/-oHvrBojfebY/UuX_pVQJnAI/AAAAAAAAJI8/OUuwc4DBGbg/image_thumb%25255B2%25255D.png" height="100" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" />
<br />
<div>
<div style="text-align: justify;">
In a typical Informatica <a href="http://www.disoln.org/2012/09/understand-informatica-powercenter-Workflow-Designer.html">PowerCenter workflow</a>, data is committed to the target table after a predefined number of rows, specified in the session properties, has been processed into the target. But there are scenarios in which you need more control over commits and rollbacks. In this article, let's see how to achieve this using the Transaction Control Transformation.<br />
<a name='more'></a></div>
<h2>
What is Transaction Control Transformation</h2>
<div style="text-align: justify;">
A transaction is the set of rows bound by commit or roll back rows. The Transaction Control Transformation lets you control the commit and rollback transactions based on an expression or logic defined in the <a href="http://www.disoln.org/2012/08/Understand-Informatica-PowerCenter-Mapping-Designer.html">mapping</a>. For example, you might want to define transactions based on a group of rows ordered on a common key, such as employee ID or order entry date.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
</div>
<div class="p1" style="text-align: justify;">
When you run the session, the Integration Service evaluates the expression defined in the transformation for each row that enters the transformation. When it evaluates a commit row, it commits all rows in the transaction to the target. When the Integration Service evaluates a roll back row, it rolls back all rows in the transaction from the target. </div>
<div>
<h2>
Configuring Transaction Control Transformation</h2>
</div>
<div style="text-align: justify;">
The Transaction Control Transformation can be created and used like any other active <a href="http://www.disoln.org/search/label/Transformations?max-results=8">transformation</a>. All the properties required to configure this transformation can be provided in the Properties tab, as shown in the image below.</div>
<a href="http://lh5.ggpht.com/-T_WU5l3_y3k/UuX_qrRxH1I/AAAAAAAAJJE/MXmNwHQb8f8/s1600-h/image%25255B31%25255D.png"><img alt="Transaction Control Transformation to Control Commit and Rollback Transactions" border="0" src="http://lh3.ggpht.com/-c90o1yL6U7w/UuX_sNutg4I/AAAAAAAAJJM/FEQKVVgVjj0/image_thumb%25255B25%25255D.png?imgmax=800" height="478" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="550" /></a>
<br />
<div style="text-align: justify;">
You can enter the transaction control expression in the Transaction Control Condition field. The transaction control expression uses the IIF function to test each row against the condition. The Integration Service evaluates the condition on a row-by-row basis. The return value determines whether the Integration Service commits, rolls back, or makes no transaction changes to the row. </div>
<div class="p1" style="text-align: justify;">
<br />
You can use the following built-in variables in the Expression Editor when you create a transaction control expression.</div>
<ul>
<li style="text-align: justify;"><b><span style="color: #cc0000;">TC_CONTINUE_TRANSACTION</span></b>. The Integration Service does not perform any transaction change for this row. This is the default value of the expression. </li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>TC_COMMIT_BEFORE</b></span>. The Integration Service commits the transaction and begins a new transaction. The current row is in the new transaction. </li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>TC_COMMIT_AFTER</b></span>. The Integration Service writes the current row to the target, commits the transaction, and begins a new transaction. The current row is in the committed transaction. </li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>TC_ROLLBACK_BEFORE</b></span>. The Integration Service rolls back the current transaction and begins a new transaction. The current row is in the new transaction. </li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>TC_ROLLBACK_AFTER</b></span>. The Integration Service writes the current row to the target, rolls back the transaction, and begins a new transaction. The current row is in the rolled back transaction.</li>
</ul>
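<div style="text-align: justify;">
The effect of the five built-in variables can be illustrated with a small Python simulation. This is only a sketch of the semantics described above: the function <i>run_transactions</i> is hypothetical, and committing the rows that are still open at the end of the data is an assumption of this sketch.</div>

```python
from typing import Iterable, List, Tuple

TC_CONTINUE_TRANSACTION = "TC_CONTINUE_TRANSACTION"
TC_COMMIT_BEFORE = "TC_COMMIT_BEFORE"
TC_COMMIT_AFTER = "TC_COMMIT_AFTER"
TC_ROLLBACK_BEFORE = "TC_ROLLBACK_BEFORE"
TC_ROLLBACK_AFTER = "TC_ROLLBACK_AFTER"


def run_transactions(rows: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Return the committed transactions for (row, action) pairs."""
    committed: List[List[str]] = []
    open_rows: List[str] = []
    for row, action in rows:
        if action == TC_COMMIT_BEFORE:
            # Commit what is open; the current row starts a new transaction.
            if open_rows:
                committed.append(open_rows)
            open_rows = [row]
        elif action == TC_COMMIT_AFTER:
            # The current row is part of the committed transaction.
            committed.append(open_rows + [row])
            open_rows = []
        elif action == TC_ROLLBACK_BEFORE:
            # Discard what is open; the current row starts a new transaction.
            open_rows = [row]
        elif action == TC_ROLLBACK_AFTER:
            # The current row is rolled back together with the open rows.
            open_rows = []
        else:
            open_rows.append(row)
    if open_rows:
        # Assumption: rows still open at the end of the data are committed.
        committed.append(open_rows)
    return committed


result = run_transactions([
    ("r1", TC_CONTINUE_TRANSACTION),
    ("r2", TC_COMMIT_AFTER),
    ("r3", TC_CONTINUE_TRANSACTION),
    ("r4", TC_ROLLBACK_BEFORE),
    ("r5", TC_COMMIT_BEFORE),
    ("r6", TC_CONTINUE_TRANSACTION),
])
print(result)
```

<div style="text-align: justify;">
In this run, r3 never reaches the target because the rollback before r4 discards it.</div>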
<h2>
Transaction Control Transformation Use Case</h2>
</div>
<div style="text-align: justify;">
Let's consider an ETL job loading data into an OLTP application. The application data is accessed by the system in real time, so the data loaded into the target table must remain consistent and maintain its integrity. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
To be more specific about the use case, sales order data loaded into the OLTP application target table needs to be committed only after all the order items in a sales order are loaded into the target table.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b style="font-weight: bold;"><span style="color: #cc0000;">Solution</span></b><b> : </b>Here, let's create a Transaction Control Transformation and connect it in the mapping pipeline after all the ETL logic is complete. The logic that defines the commit points goes in the Transaction Control Transformation.</div>
<div style="font-weight: bold; text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b>Step 1 :- </b>Once the required <a href="http://www.disoln.org/search/label/Transformations?max-results=8">transformation</a> logic is built in the mapping, create a <a href="http://www.disoln.org/2013/02/Working-with-Aggregator-and-Sorter-Transformation.html">sorter transformation</a> to group all the order items within a sales order together based on ORDER_ID, as shown below.</div>
<div style="text-align: justify;">
<a href="http://lh5.ggpht.com/-WytTo7Jmucc/UunsXz1IrXI/AAAAAAAAJJc/rEvmTSJD3Xo/s1600-h/image%25255B12%25255D.png"><img alt="Transaction Control Transformation to Control Commit and Rollback in Your ETL" border="0" src="http://lh5.ggpht.com/-_oFen7wRq38/UunsZat-OSI/AAAAAAAAJJk/SZKV-es_hHQ/image_thumb%25255B10%25255D.png?imgmax=800" height="104" style="border: 0px; display: block; float: none; margin: 5px auto 10px;" title="" width="400" /></a>
</div>
<div>
<b>Step 2 :- </b>Create an expression transformation and add new ports with the expressions below. This step lets you identify when all the records in an order have been processed.</div>
<ul>
<li><i>V_NEXT_ORDER_FLAG (Variable) :- IIF(ORDER_ID = V_PRIOR_ORDER_ID, 'N', 'Y')</i></li>
<li><i>V_PRIOR_ORDER_ID (Variable) :- ORDER_ID</i></li>
<li><i>NEXT_ORDER_FLAG (Output) :- V_NEXT_ORDER_FLAG</i></li>
</ul>
<div>
<a href="http://lh3.ggpht.com/-JFj1m95fhWg/UunsalIXHYI/AAAAAAAAJJs/2V9NXoY4qEI/s1600-h/image%25255B31%25255D.png"><img alt="Transaction Control Transformation to Control Commit and Rollback in Your ETL" border="0" src="http://lh3.ggpht.com/-2Bs7g3pwcNs/Uunsb32-vkI/AAAAAAAAJJ0/fz0o8u5MT58/image_thumb%25255B25%25255D.png?imgmax=800" height="172" style="border: 0px; display: block; float: none; margin: 5px auto 10px;" title="" width="585" /></a><b><span style="color: #cc0000;">Hint</span></b> :- This variable port technique can be used to <a href="http://www.disoln.org/2012/06/retain-values-from-previously-processed.html">preserve the value from a prior record</a>.</div>
<div>
<br />
<div style="text-align: justify;">
<b>Step 3 :- </b>Now create the Transaction Control Transformation like any other active <a href="http://www.disoln.org/search/label/Transformations?max-results=8">transformation</a> and connect it to the upstream transformation as shown below. Provide the expression that defines the commit logic; the expression for our use case is given below.</div>
<div class="p1">
</div>
<ul>
<li style="text-align: left;"><i><b>IIF</b><span class="s1">(NEXT_ORDER_FLAG = </span><span class="s2">'N'</span><span class="s1">,</span><b>TC_CONTINUE_TRANSACTION</b><span class="s1">,</span><b>TC_COMMIT_BEFORE</b><span class="s1">)</span></i></li>
</ul>
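<div style="text-align: justify;">
To see how the NEXT_ORDER_FLAG logic and the commit expression work together, here is a minimal Python simulation of rows sorted on ORDER_ID. The function <i>commit_groups</i> and the sample order items are hypothetical, and committing the last open order at the end of the data is an assumption of this sketch.</div>

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[int, str]  # (ORDER_ID, order item), sample layout for this sketch


def commit_groups(sorted_rows: Iterable[Row]) -> List[List[Row]]:
    """Group rows into committed transactions, one per sales order."""
    committed: List[List[Row]] = []
    open_rows: List[Row] = []
    prior_order_id: Optional[int] = None  # plays the role of V_PRIOR_ORDER_ID
    for order_id, item in sorted_rows:
        # The flag is evaluated before the prior value is updated, which
        # matches the top to bottom order of the variable ports.
        next_order_flag = "N" if order_id == prior_order_id else "Y"
        prior_order_id = order_id
        if next_order_flag == "Y" and open_rows:
            # TC_COMMIT_BEFORE: commit the previous order, start a new one.
            committed.append(open_rows)
            open_rows = []
        # Otherwise TC_CONTINUE_TRANSACTION: the row joins the open order.
        open_rows.append((order_id, item))
    if open_rows:
        # Assumption: the last open order is committed at the end of the data.
        committed.append(open_rows)
    return committed


groups = commit_groups([
    (101, "item A"),
    (101, "item B"),
    (102, "item C"),
    (103, "item D"),
    (103, "item E"),
])
for group in groups:
    print(group)
```

<div style="text-align: justify;">
Each printed group holds all the items of a single order, which is why the sorter in Step 1 is needed: without sorted input, items of the same order could land in different transactions.</div>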
</div>
<div style="text-align: justify;">
<a href="http://lh3.ggpht.com/-P4mkwhAx3KQ/UunsdP1BypI/AAAAAAAAJJ8/GAQohY9zUAI/s1600-h/image%25255B50%25255D.png"><img alt="Transaction Control Transformation to Control Commit and Rollback in Your ETL" border="0" src="http://lh4.ggpht.com/-c73nVcsjZYo/UunseqsE-vI/AAAAAAAAJKE/yV_hCghOYAI/image_thumb%25255B44%25255D.png?imgmax=800" height="480" style="border: 0px; display: block; float: none; margin: 5px auto 10px;" title="" width="518" /></a><b>Step 4 :- </b>Now you connect all the ports from Transaction Control transformation to the target definition.
<a href="http://lh4.ggpht.com/-z8Ha83pLf7Q/Uunsfo88KfI/AAAAAAAAJKM/_5wPkJpOe74/s1600-h/image%25255B64%25255D.png"><img alt="Transaction Control Transformation to Control Commit and Rollback in Your ETL" border="0" src="http://lh4.ggpht.com/-HFIcsSN2FV0/Uunsg93PQ1I/AAAAAAAAJKU/288rWrsg5bM/image_thumb%25255B56%25255D.png?imgmax=800" height="162" style="border: 0px; display: block; float: none; margin: 5px auto 10px;" title="" width="557" /></a><br />
<b><span style="color: #cc0000;">Note :-</span></b> While <a href="http://www.disoln.org/2012/09/understand-informatica-powercenter-Workflow-Designer.html">configuring the session</a>, be sure to set the "Commit Type" property to "User Defined".<br />
<br /></div>
<div style="text-align: justify;">
Hope this tutorial was useful for your project. Please leave your questions and comments. We will be more than happy to help you.</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-31707374394518946782014-01-20T18:20:00.001-08:002014-01-22T07:16:04.747-08:00Informatica PowerCenter Design Best Practices and Guidelines<img align="left" alt="Design Approach to Handle Late Arriving Dimensions and Late Arriving Facts" border="0" src="http://lh5.ggpht.com/-M98bllrvHRg/Ut2EimGFauI/AAAAAAAAJIo/MzwDtH9NTdw/modularity_thumb%25255B5%25255D.jpg" height="100" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" />
<br />
<div>
<div style="text-align: justify;">
A high-level, systematic ETL design helps to build efficient and flexible ETL processes, so special care should be given to the design phase of your project. In the following sections we cover the key points to keep in mind while designing an ETL process. These recommendations can be integrated into your ETL design and development processes to simplify the effort and improve the overall quality of the finished product.<br />
<a name='more'></a></div>
<ul><ul><ul><ol>
<li><b><span style="color: #cc0000;">Consistency</span></b></li>
<li><b><span style="color: #cc0000;">Modularity</span></b></li>
<li><b><span style="color: #cc0000;">Reusability</span></b></li>
<li><b><span style="color: #cc0000;">Scalability</span></b></li>
<li><b><span style="color: #cc0000;">Simplicity</span></b></li>
</ol>
</ul>
</ul>
</ul>
<div>
<h2>
1. Consistency</h2>
<img align="left" alt="Informatica Performance Tuning Guide, Performance Enhancements - Part 4" border="0" src="http://lh4.ggpht.com/-1WiLy_-fgio/Ut2Eju8cI6I/AAAAAAAAJIQ/0iPj3YcKLJ8/Consistency_thumb%25255B5%25255D.jpg" height="50" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="50" />
</div>
<div style="text-align: justify;">
To ensure consistency and make maintenance easy after the move to production, it is important to define and agree on development standards before development work begins.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The standards define the ground rules for the development team. They can range from naming conventions to documentation standards to error handling standards. Development work should adhere to these standards throughout the life cycle, and new team members can refer to them to understand the requirements placed upon the design and build activities.<br />
<br />
Applying consistent standards for naming conventions, <a href="http://www.disoln.org/2012/08/slowly-changing-dimension-type-2-implementation-using-informatica.html">design patterns</a>, <a href="http://www.disoln.org/2012/10/User-Defined-Error-Handling-in-Informatica-PowerCenter.html">error handling</a> and <a href="http://www.disoln.org/2012/10/An-ETL-Framework-for-Change-Data-Capture-CDC.html">change data capture</a> reduces long-term complications and makes maintenance easier.</div>
<h2>
2. Modularity</h2>
<img align="left" alt="Informatica Performance Tuning Guide, Performance Enhancements - Part 4" border="0" src="http://lh5.ggpht.com/-M98bllrvHRg/Ut2EimGFauI/AAAAAAAAJIo/MzwDtH9NTdw/modularity_thumb%25255B5%25255D.jpg" height="50" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="50" />
<br />
<div>
<div style="text-align: justify;">
Modularity is important for an efficient ETL design. Divide the different components of your ETL process, such as <a href="http://www.disoln.org/2012/10/change-data-capture-cdc-made-easy-using-mapping-variables.html">incremental data pull logic</a>, <a href="http://www.disoln.org/2012/10/User-Defined-Error-Handling-in-Informatica-PowerCenter.html" style="text-align: justify;">error handling</a>, <a href="http://www.disoln.org/2012/10/An-ETL-Framework-for-Change-Data-Capture-CDC.html" style="text-align: justify;">change data capture</a> and <a href="http://www.disoln.org/2012/09/An-ETL-Framework-for-Operational-Metadata-logging.html">operational metadata logging</a>, into separate modules. This makes the ETL processes efficient, scalable and maintainable.</div>
</div>
<ul><ul>
</ul>
</ul>
</div>
<div>
<h2>
3. Reusability</h2>
<img align="left" alt="Informatica Performance Tuning Guide, Performance Enhancements - Part 4" border="0" height="50" src="https://lh4.googleusercontent.com/-lIgXp8xP9pw/UIsmMx0FaXI/AAAAAAAAF6k/ZOmB8Src3u8/h120/recycle_full.png" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="50" />
<br />
<div style="text-align: justify;">
<a href="http://www.disoln.org/2012/10/11-ways-to-make-informatica-powercenter-code-reusable.html">Reusability</a> is a valuable feature of Informatica PowerCenter. Its general purpose is to reduce unnecessary coding, which ultimately reduces development time and increases supportability. It also helps you react quickly to changes required in a program.<br />
<br />
Focus on reuse during the design phase so that modifications can be made quickly and in one place. Informatica PowerCenter provides a variety of ways to achieve <a href="http://www.disoln.org/2012/10/11-ways-to-make-informatica-powercenter-code-reusable.html">reusability</a>, such as Mapplets, Worklets, Reusable Transformations, Reusable Functions, Parameters and Shared Folders.</div>
<h2>
4. Scalability</h2>
<img align="left" alt="Informatica Performance Tuning Guide, Performance Enhancements - Part 4" border="0" height="50" src="https://lh4.googleusercontent.com/-IJRK85f-M1U/Ut19My0TOrI/AAAAAAAAJHc/_71CY2Ich1M/h120/scalability.png" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="50" />
<br />
<div style="text-align: justify;">
Keep data volumes in mind in order to create an efficient ETL process. Estimating the data volume requirements of a data integration project is critical. Based on the volume estimates, special consideration needs to be given to caching in different transformations, running complex queries and applying <a href="http://www.disoln.org/2013/11/Informatica-Performance-Tuning-Guide-Performance-Enhancement-Features-Part-4.html">performance tuning techniques</a> such as <a href="http://www.disoln.org/2013/07/Informatica-PowerCenter-Pushdown-Optimization-an-ELT-Approach.html">pushdown optimization</a>, <a href="http://www.disoln.org/2013/07/Informatica-PowerCenter-Partitioning-for-Parallel-Processing.html">Session Partitioning</a>, <a href="http://www.disoln.org/2013/08/Dynamic-Partitioning-to-Increase-Parallelism-Based-on-Resources-Availability.html">Dynamic Session Partition</a>, <a href="http://www.disoln.org/2012/11/Informatica-Concurrent-Workflows-to-Reduce-Warehouse-ETL-Load-Time.html">Concurrent Workflows</a>, <a href="http://www.disoln.org/2013/10/Informatica-PowerCenter-Workflows-on-Grid-for-Performance-and-Scalability.html">Grid Deployments</a>, <a href="http://www.disoln.org/2013/11/Informatica-PowerCenter-Load-Balancing-for-Workload-Distribution.html">Workflow Load Balancing</a> and <a href="http://www.disoln.org/search/label/Performance%20Tips?max-results=15">other available Performance Tips</a>.</div>
<ul><ul>
</ul>
</ul>
<h2>
5. Simplicity</h2>
<img align="left" alt="Informatica Performance Tuning Guide, Performance Enhancements - Part 4" border="0" height="50" src="https://lh5.googleusercontent.com/-tipwtxL9m6I/Ut17zI9_twI/AAAAAAAAJHQ/RlBIbTcLcWM/h120/simplicity.png" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="50" />
<br />
<div style="text-align: justify;">
It is recommended to create multiple simple ETL processes, <a href="http://www.disoln.org/2012/08/Understand-Informatica-PowerCenter-Mapping-Designer.html">Informatica Mappings</a> and <a href="http://www.disoln.org/2012/09/understand-informatica-powercenter-Workflow-Designer.html">Informatica Workflows</a> instead of a few complex ones. Use a staging area and try to keep the processing logic as clear and simple as possible. Such a design is easier to develop, debug and maintain than complex ETL logic.<br />
<br /></div>
</div>
Design Approach to Handle Late Arriving Dimensions and Late Arriving Facts<br />
Johnson Cyriac, published 2013-12-29, updated 2014-01-05<br />
<img align="left" alt="Design Approach to Handle Late Arriving Dimensions and Late Arriving Facts" border="0" height="100" src="https://lh3.googleusercontent.com/qBAXFBDjrFYD0PySkt1xRBRz-Ouqya1etHiUHDvOK-Q=s150-p-no" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" /><br />
<div style="text-align: justify;">
In a typical data warehouse, <a href="http://www.disoln.org/2012/08/slowly-changing-dimension-type-2-implementation-using-informatica.html">dimensions</a> are processed first and the facts are loaded later, on the assumption that all required <a href="http://www.disoln.org/2013/01/slowly-changing-dimension-type-1-implementation-using-informatica-powercenter.html">dimension</a> data is already in place. This may not hold in all cases, because of the nature of your business process or the behavior of the source application. Fact data, too, can be sent from the source application to the warehouse much later than it was actually created. This article discusses several options for handling late arriving <a href="http://www.disoln.org/2013/04/SCD-Type-6-Implementation-using-Informatica-PowerCenter.html">dimensions</a> and facts.<br />
<a name='more'></a></div>
<h2>
What is a Late Arriving Dimension</h2>
<div style="text-align: justify;">
Late arriving <a href="http://www.disoln.org/2013/01/slowly-changing-dimension-type-3-implementation-using-informatica-powercenter.html">dimensions</a>, sometimes called early-arriving facts, occur when <a href="http://www.disoln.org/2013/04/SCD-Type-4-a-solution-for-Rapidly-Changing-Dimension.html">dimension</a> data arrives in the data warehouse later than the fact data that references that <a href="http://www.disoln.org/2012/08/slowly-changing-dimension-type-2-implementation-using-informatica.html">dimension</a> record.<br />
<br />
For example, an employee who gets medical insurance through an employer is eligible for coverage from the first day of employment, but the employer may not provide the medical insurance information to the insurance provider for several weeks. If the employee undergoes any medical treatment during this time, the medical claim records will arrive as fact records without the corresponding patient dimension details.</div>
<h2>
Design Approaches</h2>
<div>
<div style="text-align: justify;">
Depending on the business scenario and the type of dimension in use, we can take different design approaches.</div>
<ol>
<ol><ul>
<li><span style="color: #cc0000;"><b>Hold the Fact record until Dimension record is available.</b></span></li>
<li><span style="color: #cc0000;"><b>'Unknown' or default Dimension record.</b></span></li>
<li><span style="color: #cc0000;"><b>Inferring the Dimension record.</b></span></li>
<li><span style="color: #cc0000;"><b>Late Arriving Dimension and SCD Type 2 changes.</b></span></li>
</ul>
</ol>
</ol>
</div>
<div>
<h3>
1. Hold the Fact record until Dimension record is available</h3>
<div style="text-align: justify;">
One approach is to place the fact row in a suspense table. The fact row will be held in the suspense table until the associated dimension record has been processed. This solution is relatively easy to implement, but the primary drawback is that the fact row isn’t available for reporting until the associated dimension record has been handled.<br />
<br />
This approach is more suitable when your data warehouse is refreshed as a scheduled batch process and a delay in loading fact records until the dimension records are available is acceptable for the business.</div>
</div>
<a href="http://lh5.ggpht.com/-bK80cY1I00Q/UsCPbAagh7I/AAAAAAAAJCw/v5pYuuRz1mo/s1600-h/Late%252520Arriving%252520Dim%2525201%25255B12%25255D.png"><img alt="Late Arriving Dimension design approach" border="0" src="http://lh5.ggpht.com/-0TC_watoaDo/UsCPcu12fgI/AAAAAAAAJC4/6XGfoGVJdbk/Late%252520Arriving%252520Dim%2525201_thumb%25255B10%25255D.png?imgmax=800" height="690" style="border: 0px; display: block; float: none; margin: 4px auto -20px;" title="" width="530" /></a>
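<div style="text-align: justify;">
The suspense-table flow described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the in-memory structures <code>patient_dim</code>, <code>claim_fact</code> and <code>suspense</code>, and the sample keys and amounts, are hypothetical stand-ins for real tables, not part of any Informatica API.</div>

```python
# Hypothetical in-memory stand-ins for the dimension, fact and suspense tables.
patient_dim = {"P100": 1}   # natural key -> surrogate key
claim_fact = []             # fact rows that have been loaded
suspense = []               # fact rows waiting for their dimension record


def load_facts(rows):
    """Load rows whose dimension exists; park the rest in the suspense table."""
    for row in rows:
        sk = patient_dim.get(row["patient_id"])
        if sk is None:
            suspense.append(row)
        else:
            claim_fact.append({"patient_sk": sk, "amount": row["amount"]})


def reprocess_suspense():
    """Retry every suspended row; rows that still fail go back to suspense."""
    pending = suspense[:]
    suspense.clear()
    load_facts(pending)


load_facts([
    {"patient_id": "P100", "amount": 50.0},
    {"patient_id": "67223", "amount": 120.0},  # dimension not loaded yet
])
patient_dim["67223"] = 1001   # the dimension record arrives in a later batch
reprocess_suspense()
```

<div style="text-align: justify;">
In a scheduled batch, the reprocessing step would typically run right after the dimension load and before the regular fact load.</div>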
<br />
<h3>
2. 'Unknown' or default Dimension record</h3>
<div>
<div style="text-align: justify;">
<div style="text-align: justify;">
Another approach is to simply assign the “Unknown” dimension member to the fact record. On the positive side, this approach does allow the fact record to be recorded during the ETL process. But it won’t be associated with the correct dimension value. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The "Unknown" fact records can also be kept in a suspense table. Eventually, when the Dimension data is processed, the suspense data can be reprocessed and associated with a real, valid Dimension record.</div>
</div>
<a href="http://lh6.ggpht.com/-6ZgEfZIX00s/UsCPd4G-reI/AAAAAAAAJDA/I5nr8b15mHs/s1600-h/Late%252520Arriving%252520Dim%2525202%25255B12%25255D.png"><img alt="Late Arriving Dimension design approach" border="0" src="http://lh6.ggpht.com/-w7RihEGhTpo/UsCPfZ_4d7I/AAAAAAAAJDI/2GReW1GbA98/Late%252520Arriving%252520Dim%2525202_thumb%25255B10%25255D.png?imgmax=800" height="768" style="border: 0px; display: block; float: none; margin: 4px auto -20px;" title="" width="656" /></a>
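<div style="text-align: justify;">
A minimal Python sketch of this approach follows. It makes two assumptions that are not in the article: the 'Unknown' member uses the reserved surrogate key <code>-1</code>, and the natural key is kept on the fact row (rather than in a separate suspense table) so the row can be re-pointed later. All names are hypothetical.</div>

```python
UNKNOWN_SK = -1   # hypothetical surrogate key reserved for the 'Unknown' member

patient_dim = {"P100": 1}   # natural key -> surrogate key (illustrative data)
claim_fact = []


def load_claim(patient_id, amount):
    """Load the claim now; fall back to the 'Unknown' member if needed."""
    sk = patient_dim.get(patient_id, UNKNOWN_SK)
    # The natural key stays on the row so it can be re-pointed later.
    claim_fact.append({"patient_id": patient_id, "patient_sk": sk, "amount": amount})


def repoint_unknown_claims():
    """Re-associate 'Unknown' claims once their dimension record exists."""
    fixed = 0
    for row in claim_fact:
        if row["patient_sk"] == UNKNOWN_SK and row["patient_id"] in patient_dim:
            row["patient_sk"] = patient_dim[row["patient_id"]]
            fixed += 1
    return fixed


load_claim("67223", 120.0)     # patient dimension not loaded yet
patient_dim["67223"] = 1001    # dimension record arrives later
fixed = repoint_unknown_claims()
```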
<br />
<h3>
3. Inferring the Dimension record</h3>
<div style="text-align: justify;">
Another method is to insert a new Dimension record with a new <a href="http://www.disoln.org/2013/11/Surrogate-Key-in-Data-Warehouse-What-When-Why-and-Why-Not.html">surrogate key</a> and use the same <a href="http://www.disoln.org/2013/11/Surrogate-Key-Generation-Approaches-Using-Informatica-PowerCenter.html">surrogate key</a> to load the incoming fact record. <span style="text-align: justify;">This only works if you have enough details about the dimension in the fact record to construct the natural key. Without this, you would never be able to go back and update this dimension row with complete attributes.</span></div>
<div style="text-align: justify;">
<span style="text-align: justify;"><br /></span></div>
<div style="text-align: justify;">
<span style="text-align: justify;">In the insurance claim example from the beginning of this article, it is almost certain that the "patient id", which is the natural key of the patient dimension, will be part of the claim fact. So we can create a new placeholder dimension record for the patient with a new </span><a href="http://www.disoln.org/2013/11/Surrogate-Key-Generation-Approaches-Using-Informatica-PowerCenter.html">surrogate key</a><span style="text-align: justify;"> and the natural key "patient id".</span></div>
<a href="http://lh4.ggpht.com/-xVy90P9EUX0/UsCPgvH1DhI/AAAAAAAAJDQ/meCap6om23g/s1600-h/Late%252520Arriving%252520Dim%2525203%25255B7%25255D.png"><img alt="Late Arriving Dimension design approach" border="0" src="http://lh3.ggpht.com/-P_-CVdRc3Zo/UsCPh6d_ZwI/AAAAAAAAJDY/tJ22DLKGbNI/Late%252520Arriving%252520Dim%2525203_thumb%25255B5%25255D.png?imgmax=800" height="695" style="border: 0px; display: block; float: none; margin: 4px auto -20px;" title="" width="451" /></a>
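<div style="text-align: justify;">
The inferred-member idea can be sketched as below. This is a hedged illustration, not PowerCenter code: the <code>inferred</code> flag, the key sequence starting at 1001 and the function names are assumptions made for the example.</div>

```python
from itertools import count

next_sk = count(1001)   # hypothetical surrogate key sequence
patient_dim = {}        # natural key -> dimension row


def get_or_infer_patient(patient_id):
    """Return the surrogate key, creating an inferred placeholder if missing."""
    row = patient_dim.get(patient_id)
    if row is None:
        row = {"patient_sk": next(next_sk), "patient_id": patient_id,
               "name": None, "inferred": True}
        patient_dim[patient_id] = row
    return row["patient_sk"]


def complete_inferred_patient(patient_id, name):
    """Overwrite a placeholder in place (Type 1), keeping its surrogate key."""
    row = patient_dim.get(patient_id)
    if row is None or not row["inferred"]:
        return False   # not a placeholder; normal SCD handling applies instead
    row.update(name=name, inferred=False)
    return True


sk_first_claim = get_or_infer_patient("67223")    # creates the placeholder
sk_second_claim = get_or_infer_patient("67223")   # reuses the same key
completed = complete_inferred_patient("67223", "Sample Patient")
```

<div style="text-align: justify;">
Because the placeholder is overwritten in place, the fact rows loaded earlier keep pointing at a valid surrogate key and need no update.</div>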
<br />
<div style="text-align: justify;">
<span style="color: #cc0000;"><b>Note</b></span> : When you receive all the other attributes for the patient dimension record at a later point, you will have to apply an <a href="http://www.disoln.org/2013/01/slowly-changing-dimension-type-1-implementation-using-informatica-powercenter.html">SCD Type 1</a> update the first time and <a href="http://www.disoln.org/2012/08/slowly-changing-dimension-type-2-implementation-using-informatica.html">SCD Type 2</a> changes going forward.</div>
<h3>
4. Late Arriving Dimension and SCD Type 2 changes</h3>
</div>
<div style="text-align: justify;">
A late arriving dimension with <a href="http://www.disoln.org/2012/08/slowly-changing-dimension-type-2-implementation-using-informatica.html" style="text-align: justify;">SCD Type 2</a> changes is more complex to handle.</div>
<div>
<h4>
4.1. Late Arriving Dimension with multiple <a href="http://www.disoln.org/2013/03/History-Building-Algorithm-for-Slowly-Changing-Dimensions.html">historical</a> changes</h4>
<div style="text-align: justify;">
As described above, we can handle a late arriving dimension by keeping an "Unknown" dimension record or an "Inferred" dimension record, which acts as a placeholder.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Even before we get the full dimension record details from the source system, there may be multiple <a href="http://www.disoln.org/2012/08/slowly-changing-dimension-type-2-implementation-using-informatica.html">SCD Type 2</a> changes to the placeholder dimension record. Each change creates a new dimension record with a new surrogate key, and the surrogate key on any <a href="http://www.disoln.org/2013/03/Re-Keying-Surrogate-Key-For-Dimension-Tables-The-Need-Impact-Fix.html">subsequent fact records</a> has to be modified to point to the new record.<br />
<br />
Let's look at the scenario in detail with the help of the medical insurance claim example.<br />
<br />
The patient with ID 67223 has made two insurance claims, one on 9/10 and the other on 9/20. As no patient dimension information is available for patient ID 67223 yet, an 'Inferred' dimension record is created for the patient with <a href="http://www.disoln.org/2013/11/Surrogate-Key-in-Data-Warehouse-What-When-Why-and-Why-Not.html">surrogate key</a> 1001.<br />
<br />
The state of the dimension and fact tables at this point is shown below.</div>
<a href="http://lh5.ggpht.com/-TZjkZ64o9es/Useu1dSocNI/AAAAAAAAJEw/VPKntERZQ1w/s1600-h/image%25255B72%25255D.png"><img alt="Design Approach to Handle Late Arriving Dimensions and Late Arriving Facts" border="0" src="http://lh4.ggpht.com/-ExRBnKCQgA4/Useu1wCRIdI/AAAAAAAAJE8/03YQfZNOZRc/image_thumb%25255B60%25255D.png?imgmax=800" height="152" style="border: 0px; display: block; float: none; margin: 5px auto 10px;" title="" width="735" /></a>
<br />
<div style="text-align: justify;">
Later, by the time the dimension information is made available, there have already been <a href="http://www.disoln.org/2012/08/slowly-changing-dimension-type-2-implementation-using-informatica.html" style="text-align: justify;">SCD Type 2</a> changes for patient ID 67223: one on 9/10 and another on 9/12. The current state of the dimension and fact records is shown below. The fact record created on 9/20 still refers to <a href="http://www.disoln.org/2013/11/Surrogate-Key-in-Data-Warehouse-What-When-Why-and-Why-Not.html">surrogate key</a> 1001, which is not the correct representation.<br />
<a href="http://lh6.ggpht.com/-naQdbEKlrYU/UsjY-MkxOaI/AAAAAAAAJGg/V3OiGo3kY7Q/s1600-h/image%25255B100%25255D.png" style="text-align: start;"><img alt="image" border="0" src="http://lh4.ggpht.com/-vnvXz_ELMT4/UsjY-uPRS7I/AAAAAAAAJGo/8tb-LEIIw7E/image_thumb%25255B84%25255D.png?imgmax=800" height="176" style="border: 0px; display: block; float: none; margin: 5px auto 10px;" title="image" width="733" /></a></div>
<span style="text-align: justify;"><br /></span>
<span style="text-align: justify;">This means the claim record created on 9/20 needs to be reassigned to the correct </span><a href="http://www.disoln.org/2013/11/Surrogate-Key-in-Data-Warehouse-What-When-Why-and-Why-Not.html" style="text-align: justify;">surrogate key</a>, the one that is active for that time period. The correct state of the dimension and fact records is shown below.<br />
<a href="http://lh3.ggpht.com/-rt7ujMm9lG8/UsjY_IuqNjI/AAAAAAAAJGw/DJIFOTwZpX4/s1600-h/image%25255B97%25255D.png"><img alt="image" border="0" src="http://lh5.ggpht.com/-yKvgsRaot0w/UsjY_pVOqqI/AAAAAAAAJG0/0nRzccIaEIw/image_thumb%25255B81%25255D.png?imgmax=800" height="173" style="border: 0px; display: block; float: none; margin: 5px auto 10px;" title="image" width="732" /></a>
<div>
<h4>
4.2. Late Arriving Dimension with retro effective changes</h4>
<div>
<div style="text-align: justify;">
You can receive Dimension records from the source system with retro effective dates. For example, you might update your marital status in your HR system much later than your marriage date. This update reaches the data warehouse with a retro effective date.</div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
<div>
<div style="text-align: justify;">
This leads to a new dimension record with a new surrogate key and to changes in the effective dates of the affected dimension records. You will have to scan forward in the dimension to see whether there are any subsequent Type 2 rows for this dimension member. <span style="text-align: justify;">In addition, the surrogate key on any subsequent fact records has to be modified to point to the new surrogate key.</span></div>
<div style="text-align: justify;">
<span style="text-align: justify;"><br /></span></div>
Let's again use the medical insurance claim example for our explanation.</div>
<div>
<br /></div>
<div style="text-align: justify;">
Shown below is the state of the Patient Dimension and the Claim Fact table at this point, which is perfectly consistent.</div>
<a href="http://lh3.ggpht.com/-5IYbqx97zac/Useu3DEDx5I/AAAAAAAAJFQ/uU4VzyCm4yQ/s1600-h/image%25255B70%25255D.png"><img alt="Design Approach to Handle Late Arriving Dimensions and Late Arriving Facts" border="0" src="http://lh3.ggpht.com/-82L9t70N-iQ/Useu3S-OUHI/AAAAAAAAJFY/bhnC8fv-pgE/image_thumb%25255B58%25255D.png?imgmax=800" height="199" style="border: 0px; display: block; float: none; margin: 5px auto 10px;" title="" width="735" /></a><br />
<div style="text-align: justify;">
Now suppose we receive Patient Dimension data from the source system on 10/1 that is effective from 9/15, as shown below.</div>
<a href="http://lh3.ggpht.com/-PolvI-gA9Ak/Useu3-2oiyI/AAAAAAAAJFg/0AfZq8ZRp0s/s1600-h/image%25255B69%25255D.png"><img alt="Design Approach to Handle Late Arriving Dimensions and Late Arriving Facts" border="0" src="http://lh5.ggpht.com/-K_xiZ3ysxyo/Useu4Jcd9nI/AAAAAAAAJFo/U8gzHv9r6mg/image_thumb%25255B57%25255D.png?imgmax=800" height="62" style="border: 0px; display: block; float: none; margin: 5px auto 10px;" title="" width="734" /></a> <br />
<div style="text-align: justify;">
This new Dimension data, which comes with a retro effective date, puts the existing dimension records out of sync in terms of their effective start and end dates. In addition, the fact records now refer to incorrect dimension records.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
So in addition to inserting a new dimension record with a new <a href="http://www.disoln.org/2013/11/Surrogate-Key-in-Data-Warehouse-What-When-Why-and-Why-Not.html">surrogate key</a>, we have to adjust the effective dates of the prior period dimension record and propagate the dimension column value changes to the remaining records. The fact table also needs to be updated to reassign the correct <a href="http://www.disoln.org/2013/11/Surrogate-Key-in-Data-Warehouse-What-When-Why-and-Why-Not.html">surrogate key</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The corrections required to take care of the retro effective dimension record are shown in red below.</div>
<a href="http://lh5.ggpht.com/-0qljJHn8XLk/Useu4jwPSFI/AAAAAAAAJFw/AzqLoRosd2E/s1600-h/image%25255B66%25255D.png"><img alt="Design Approach to Handle Late Arriving Dimensions and Late Arriving Facts" border="0" src="http://lh5.ggpht.com/-YXEk5uxjOm8/Useu44q65AI/AAAAAAAAJF8/To9GzMWk13w/image_thumb%25255B54%25255D.png?imgmax=800" height="213" style="border: 0px; display: block; float: none; margin: 5px auto 10px;" title="" width="736" /></a>
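<div style="text-align: justify;">
The three corrections (split the prior period record, propagate the changed values forward, re-key the facts) can be sketched as follows. The data is hypothetical and does not reproduce the tables in the images: the year 2013, the surrogate keys 1002 and 1003, the column names and the open-ended date 9999-12-31 are all assumptions. The sketch also assumes the retro effective date falls strictly after the start date of the version it splits.</div>

```python
from datetime import date, timedelta

OPEN_END = date(9999, 12, 31)   # assumed marker for the current version


def apply_retro_change(versions, facts, new_sk, effective, changes):
    """Insert a retro effective version into one member's Type 2 history."""
    # Locate the version that was active on the retro effective date.
    current = next(v for v in versions if v["start"] <= effective <= v["end"])
    new_version = {**current, **changes, "sk": new_sk,
                   "start": effective, "end": current["end"]}
    # Close the prior period version the day before the change takes effect.
    current["end"] = effective - timedelta(days=1)
    versions.append(new_version)
    versions.sort(key=lambda v: v["start"])
    # Propagate the changed column values to every later version.
    for v in versions:
        if v["start"] > effective:
            v.update(changes)
    # Re-key the facts that fall inside the new version's date range.
    for f in facts:
        if new_version["start"] <= f["claim_date"] <= new_version["end"]:
            f["patient_sk"] = new_sk


versions = [
    {"sk": 1001, "start": date(2013, 9, 1), "end": date(2013, 9, 19),
     "plan": "BASIC", "marital_status": "Single"},
    {"sk": 1002, "start": date(2013, 9, 20), "end": OPEN_END,
     "plan": "PLUS", "marital_status": "Single"},
]
facts = [
    {"claim_date": date(2013, 9, 10), "patient_sk": 1001},
    {"claim_date": date(2013, 9, 18), "patient_sk": 1001},
    {"claim_date": date(2013, 9, 25), "patient_sk": 1002},
]
apply_retro_change(versions, facts, new_sk=1003, effective=date(2013, 9, 15),
                   changes={"marital_status": "Married"})
```

<div style="text-align: justify;">
After the call, the history holds three versions (1001, 1003, 1002), the claim dated 9/18 points to the new key 1003, and the later version 1002 carries the propagated marital status.</div>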
<h2 style="text-align: start;">
What are Late Arriving Facts</h2>
<div style="text-align: justify;">
A late arriving fact scenario occurs when the transaction or fact data reaches the data warehouse much later than the actual transaction occurred in the source application. If the late arriving fact needs to be associated with an <a href="http://www.disoln.org/2012/08/slowly-changing-dimension-type-2-implementation-using-informatica.html">SCD Type 2</a> dimension, the situation becomes messy. This is because we have to search back in <a href="http://www.disoln.org/2013/03/History-Building-Algorithm-for-Slowly-Changing-Dimensions.html">history</a> within the dimensions to decide how to assign the dimension keys that were in effect when the activity occurred in the past.</div>
<div style="text-align: justify;">
<h2 style="text-align: start;">
Design Approaches</h2>
<div>
Unlike late arriving dimensions, late arriving fact records can be handled relatively easily. When loading the fact record, the associated dimension table <a href="http://www.disoln.org/2013/03/History-Building-Algorithm-for-Slowly-Changing-Dimensions.html">history</a> has to be searched to find the surrogate key that was effective at the time the transaction occurred. The data flow below describes the late arriving fact design approach.</div>
</div>
</div>
<a href="http://lh3.ggpht.com/-zG0pJJ_NyrU/UsCYQ5ILfMI/AAAAAAAAJEM/J087WgK0DGw/s1600-h/Late%252520Arriving%252520Fact%2525201%25255B25%25255D.png"><img alt="Late Arriving Fact design approach" border="0" src="http://lh3.ggpht.com/-Lqk_q341D7U/UsCYSOYOB7I/AAAAAAAAJEU/shz_ml6On8M/Late%252520Arriving%252520Fact%2525201_thumb%25255B21%25255D.png?imgmax=800" height="508" style="border: 0px; display: block; float: none; margin: 4px auto 10px;" title="" width="598" /></a>
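<div style="text-align: justify;">
The history search in this data flow amounts to a date-range lookup. Below is a minimal Python sketch; the version rows, the year 2013, surrogate key 1002 and the open-ended date are assumptions made for illustration.</div>

```python
from datetime import date

OPEN_END = date(9999, 12, 31)   # assumed marker for the current version

# Hypothetical Type 2 history, keyed by natural key.
patient_history = {
    "67223": [
        {"sk": 1001, "start": date(2013, 9, 10), "end": date(2013, 9, 11)},
        {"sk": 1002, "start": date(2013, 9, 12), "end": OPEN_END},
    ],
}


def lookup_sk(patient_id, txn_date):
    """Return the surrogate key in effect on txn_date, or None if none matches."""
    for version in patient_history.get(patient_id, []):
        if version["start"] <= txn_date <= version["end"]:
            return version["sk"]
    return None


# A claim dated 9/10 that reaches the warehouse weeks later still resolves
# to the version that was active on the transaction date.
late_claim_sk = lookup_sk("67223", date(2013, 9, 10))
current_claim_sk = lookup_sk("67223", date(2013, 9, 20))
```

<div style="text-align: justify;">
The important design choice is to compare against the transaction date rather than the load date; a lookup on the current record alone would assign every late fact to the latest version.</div>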
<br />
<div style="text-align: justify;">
We hope you enjoyed this article and that it gave you some new insights into late arriving dimension and fact scenarios in a Data Warehouse. Leave us your questions and comments. We would also like to hear how you have handled late arriving dimensions and facts in your data warehouse.</div>
</div>
SOFT and HARD Deleted Records and Change Data Capture in Data Warehouse<br />
Johnson Cyriac, published 2013-12-08, updated 2014-06-14<br />
<img align="left" alt="SOFT and HARD Deleted Records and Change Data Capture in Data Warehouse" border="0" height="100" src="http://lh3.ggpht.com/-LoWiDD8QSl8/UqUaDw0CgiI/AAAAAAAAJA0/-PJBuwSkYqc/image_thumb%25255B93%25255D.png" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" /><br />
<div style="text-align: justify;">
In a couple of our prior articles we spoke about <a href="http://www.disoln.org/2012/12/Change-Data-Capture-CDC-Implementation-Using-CHECKSUM-Number.html">change data capture</a>, different <a href="http://www.disoln.org/2012/10/change-data-capture-cdc-made-easy-using-mapping-variables.html">techniques to capture change data</a> and a <a href="http://www.disoln.org/2012/10/An-ETL-Framework-for-Change-Data-Capture-CDC.html">change data capture framework</a>. In this article we take a deep dive into different aspects of change data in the Data Warehouse, including soft and hard deletions in source systems.<br />
<a name='more'></a></div>
<h2>
Revisiting Change Data Capture (CDC)</h2>
<div style="text-align: justify;">
When we talk about <span style="color: #cc0000;">Change Data Capture (CDC)</span> in DW, we mean capturing the changes that have happened on the source side since the last time we ran our job. In Informatica we call our ETL code a ‘Mapping’, because we MAP the source data (OLTP) into the target data (DW). The purpose of running the ETL code is to keep the source and target data in sync, applying transformations in between as per the business rules.<br />
<a href="http://lh5.ggpht.com/-DktAzQHKc5I/UqUZ5dkOIfI/AAAAAAAAI-g/GFxlEwdA_8s/s1600-h/image%25255B92%25255D.png" style="text-align: start;"><img alt="SOFT and HARD Deleted Records and Change Data Capture in Data Warehouse" border="0" height="175" src="http://lh5.ggpht.com/-bE8e2-Oa3_4/UqUZ5zqgVeI/AAAAAAAAI-o/cupA2y8Nf4E/image_thumb%25255B74%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="" width="585" /></a></div>
Now, data may get changed at source in three different ways.<br />
<ul>
<li><span style="color: #cc0000;"><i>NEW</i></span> transactions happened at source.</li>
<li><span style="color: #cc0000;"><i>CORRECTIONS</i></span> happened on old transactional values or measured values.</li>
<li><span style="color: #cc0000;"><i>INVALID</i></span> transactions removed from source.</li>
</ul>
<div style="text-align: justify;">
Usually our ETL takes care of the 1st and 2nd cases (insert/update logic); the 3rd change is not captured in the DW unless the requirement specification explicitly asks for it. When it does, we need to devise convenient ways to track the transactions that were removed, i.e. to track the records deleted at source and DELETE those records in the DW accordingly.</div>
<div>
<br />
<div style="text-align: justify;">
One thing to make clear is that <span style="color: #cc0000;">Purging</span> might be enabled in your OLTP, i.e. the OLTP keeps data only for a fixed historical period of time, but that is a different scenario. Here we are interested in what was DELETED at source because the transactions were NOT valid.</div>
<h2>
Effects in DW for Source Data Deletion</h2>
</div>
DW tables can be divided into three categories as related to the deleted source data.<br />
<ol>
<li style="text-align: justify;">When the DW table load nature is '<i><span style="color: #cc0000;">Truncate &amp; Load</span></i>' or '<span style="color: #cc0000;"><i>Delete &amp; Reload</i></span>', there is no impact, since the requirement is to keep an exact snapshot of the source table at any point in time.</li>
<li style="text-align: justify;">When the DW table <i><span style="color: #cc0000;">does not track history on data changes</span></i> and deletes are allowed against the source table, a record that is deleted in the source table is also deleted in the DW.</li>
<li style="text-align: justify;">When the DW table <i><span style="color: #cc0000;">tracks history on data changes</span></i> and deletes are allowed against the source table, the DW table retains the record that has been deleted in the source system, but the record is either expired in the DW based on the change captured date or a 'Soft Delete' is applied against it.</li>
</ol>
<h2>
Types of Data Deletion</h2>
<div style="text-align: justify;">
Academically, deleting records from a DW table is forbidden; however, it is common practice in most DWs when we face this kind of situation. If we are deleting records from the DW, it has to be done after proper discussion with the Business. If your Business requires DELETION, there are two ways.</div>
<div style="text-align: justify;">
<ul>
<li><b><span style="color: #cc0000;">Logical Delete</span></b> :- In this case, the source table has a specific flag, such as STATUS, with the values ‘ACTIVE’ or ‘INACTIVE’. Some OLTPs name the field ACTIVE with the values ‘I’, ‘U’ or ‘D’, where ‘D’ means that the record is deleted, i.e. INACTIVE. This approach is quite safe and is also known as <b><span style="color: #3d85c6;">Soft DELETE</span></b>.</li>
</ul>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://lh6.ggpht.com/-9vbcriJm-v8/UqUZ6b_T23I/AAAAAAAAI-s/k03a5SG-guI/s1600-h/image%25255B91%25255D.png" style="text-align: start;"><img alt="SOFT and HARD Deleted Records and Change Data Capture in Data Warehouse" border="0" height="195" src="http://lh6.ggpht.com/-sESvlDwXHD4/UqUZ62gFiGI/AAAAAAAAI-4/KEyqaoH2Zn8/image_thumb%25255B73%25255D.png" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="" width="251" /></a></div>
<div>
<ul>
<li style="text-align: justify;"><b><span style="color: #cc0000;">Physical Delete</span></b> :- In this case the records related to invalid transactions are fully deleted from the source table by issuing a DML statement. This is usually done after thorough discussion with the Business Users, and the related business rules are strictly followed. This is also known as <b><span style="color: red;">Hard DELETE</span></b>.</li>
</ul>
<div>
<h2>
ETL Perspective on Deletion</h2>
</div>
</div>
<div style="text-align: justify;">
<span id="docs-internal-guid-7fd4b312-cc18-bc74-5ccb-5cf878e20772">When ‘<span style="color: #cc0000;">Soft DELETE</span>’ is implemented on the source side, it becomes very easy to track the invalid transactions, and we can tag those transactions in the DW accordingly. We just need to filter the records from source using the STATUS field and issue an UPDATE in the DW for the corresponding records. A few things should be kept in mind in this case.</span></div>
<div>
<br />
<div style="text-align: justify;">
If only ACTIVE records are supposed to be used in ETL processing, we need to add specific filters while fetching source data.</div>
</div>
<div>
<br />
<div style="text-align: justify;">
Sometimes INACTIVE records are pulled into the DW and carried as far as the ETL Data Warehouse level. When the data is pushed into the Exploration Data Warehouse, only the ACTIVE records are sent for reporting purposes.</div>
<span style="vertical-align: baseline;"></span></div>
<div style="text-align: justify;">
<br /></div>
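<div style="text-align: justify;">
The Soft DELETE handling described above can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module as a stand-in for the real source and DW databases; the table names (orders, dw_orders), the column names and the ACTIVE_FLAG values are assumptions made for this example and do not come from any particular system.</div>

```python
import sqlite3


def apply_soft_deletes(src, dw):
    """Flag DW rows whose source STATUS is 'INACTIVE'; return how many rows changed."""
    # Filter the source on the STATUS field, as described in the text.
    inactive_keys = src.execute(
        "SELECT order_id FROM orders WHERE status = 'INACTIVE'"
    ).fetchall()
    before = dw.total_changes
    # Issue an UPDATE in the DW for the corresponding records.
    dw.executemany(
        "UPDATE dw_orders SET active_flag = 'N' "
        "WHERE order_id = ? AND active_flag = 'Y'",
        inactive_keys,
    )
    dw.commit()
    return dw.total_changes - before


# Hypothetical source system with a STATUS flag.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
src.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "ACTIVE"), (2, "INACTIVE"), (3, "ACTIVE")],
)

# Hypothetical DW table that still treats every order as active.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dw_orders (order_id INTEGER PRIMARY KEY, active_flag TEXT)")
dw.executemany("INSERT INTO dw_orders VALUES (?, 'Y')", [(1,), (2,), (3,)])

flagged = apply_soft_deletes(src, dw)
print(flagged, dw.execute("SELECT * FROM dw_orders ORDER BY order_id").fetchall())
```

<div style="text-align: justify;">
The extra condition on active_flag keeps the UPDATE idempotent, so re-running the load does not touch rows that were already tagged.</div>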
<div>
For ‘<span style="color: #cc0000;">Hard DELETE</span>’, if an Audit table is maintained at the source system recording which transactions were deleted, we can source it, i.e. join the Audit table and the Source table on the NK, and logically delete those records in the DW too. But it becomes quite cumbersome and costly when no account is kept of what was deleted at all. In these cases, we need to use different ways to track the deletions and update the corresponding records in the DW.<br />
<h2>
Deletion in Data Warehouse : Dimension Vs Fact</h2>
</div>
<div style="text-align: justify;">
In most cases, only transactional records are deleted from source systems. DELETION of Data Warehouse records is a rare scenario.<br />
<h3>
Deletion in Dimension Tables</h3>
</div>
<div style="text-align: justify;">
If we have DELETION enabled for Dimensions in DW, it's always safe to keep a copy of the OLD record in some AUDIT table, as it helps to track any defects in future. A simple DELETE trigger should work fine; since DELETION hardly happens, this trigger would not degrade the performance much.<br />
<br /></div>
<div>
<div style="text-align: justify;">
Let's take this ORDERS table into consideration. Along with this, we can have a History table for ORDERS, e.g. ORDERS_Hist, which would store the DELETED records from ORDERS.</div>
<div style="text-align: center;">
<a href="http://lh6.ggpht.com/-VSVjiVHQcFo/UqUZ7b07aLI/AAAAAAAAI-8/PS7FeRyBs3U/s1600-h/image%25255B90%25255D.png" style="text-align: start;"><img alt="SOFT and HARD Deleted Records and Change Data Capture in Data Warehouse" border="0" height="196" src="http://lh4.ggpht.com/-laWaszTxR8Q/UqUZ7uNAAvI/AAAAAAAAI_E/26R_jwt4laI/image_thumb%25255B72%25255D.png" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="" width="484" /></a><span id="docs-internal-guid-7fd4b312-d015-d653-030b-63458815f055"><span style="font-family: Arial; font-size: 16px; vertical-align: baseline; white-space: pre-wrap;"> </span></span></div>
The below Trigger will work fine to achieve this.<br />
<div>
</div>
</div>
<div style="text-align: center;">
<a href="http://lh5.ggpht.com/-zlS_Mp-rY1Q/UqUZ8IwD62I/AAAAAAAAI_Q/3xkoyA-DMKI/s1600-h/image%25255B89%25255D.png" style="text-align: start;"><img alt="" border="0" height="253" src="http://lh3.ggpht.com/-x8v5Vcdbj6Y/UqUZ8qlPn7I/AAAAAAAAI_U/-cIscnCMsyA/image_thumb%25255B71%25255D.png" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="" width="532" /></a></div>
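<div style="text-align: justify;">
Since the trigger above is available only as an image, here is a rough text sketch of the same idea. It uses SQLite through Python's sqlite3 module rather than the database shown in the screenshot, and the column names, the trigger name and the hard-coded 'etl_user' value are assumptions made purely for illustration (SQLite has no session user to record).</div>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL);

-- History table: same columns as ORDERS plus the AUDIT fields.
CREATE TABLE orders_hist (
    order_id   INTEGER,
    customer   TEXT,
    amount     REAL,
    deleted_at TEXT,
    deleted_by TEXT
);

-- Copy the OLD row into the history table before it is deleted.
CREATE TRIGGER trg_orders_delete
BEFORE DELETE ON orders
FOR EACH ROW
BEGIN
    INSERT INTO orders_hist (order_id, customer, amount, deleted_at, deleted_by)
    VALUES (OLD.order_id, OLD.customer, OLD.amount, datetime('now'), 'etl_user');
END;

INSERT INTO orders VALUES (1, 'Acme', 100.0), (2, 'Globex', 250.0);
""")

conn.execute("DELETE FROM orders WHERE order_id = 2")
hist = conn.execute(
    "SELECT order_id, customer, amount, deleted_by FROM orders_hist"
).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(hist, remaining)
```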
<div style="text-align: justify;">
The AUDIT fields convey when a particular record was deleted and by which user. But such a table needs to be created for each and every DW table whose DELETED records we want to audit. If the entire record is not needed and the fields involved in the Natural Key(NK) are sufficient, we can have one consolidated table for all the Dimensions.<br />
<a href="http://lh5.ggpht.com/-GlAd4xbpLTI/UqUZ9OR79qI/AAAAAAAAI_g/2KtdrlDdjYk/s1600-h/image%25255B88%25255D.png" style="text-align: start;"><img alt="SOFT and HARD Deleted Records and Change Data Capture in Data Warehouse" border="0" height="80" src="http://lh5.ggpht.com/-jCpgS7IuHhE/UqUZ-sfxPiI/AAAAAAAAI_o/rlZHjzkV_2o/image_thumb%25255B70%25255D.png" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="" width="640" /></a></div>
<div style="text-align: justify;">
Here the Record_IDENTIFIER field contains the values of all the columns that make up the Natural Key(NK) of the table named in the OBJECT_NAME field, separated by '#'.</div>
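<div style="text-align: justify;">
As a small illustration of how such an entry could be built, the sketch below joins the NK column values with '#'. The helper names, the sample CUSTOMER_DIM row and the deleted_at field are hypothetical; only OBJECT_NAME and Record_IDENTIFIER come from the table described above.</div>

```python
from datetime import datetime, timezone


def record_identifier(row, nk_columns, sep="#"):
    """Join the values of the Natural Key columns into one identifier string."""
    return sep.join(str(row[col]) for col in nk_columns)


def audit_entry(object_name, row, nk_columns):
    """Build one row for the consolidated audit table (illustrative layout)."""
    return {
        "OBJECT_NAME": object_name,
        "Record_IDENTIFIER": record_identifier(row, nk_columns),
        "deleted_at": datetime.now(timezone.utc).isoformat(),  # assumed audit field
    }


deleted_row = {"customer_no": "C1001", "region": "EU", "name": "Acme"}
entry = audit_entry("CUSTOMER_DIM", deleted_row, ["customer_no", "region"])
print(entry["OBJECT_NAME"], entry["Record_IDENTIFIER"])
```

<div style="text-align: justify;">
One caveat with this layout is that the identifier is only unambiguous if the separator never appears inside the key values themselves.</div>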
<br />
<div style="text-align: justify;">
Sometimes we face a situation in the DW where a FACT table record contains a Surrogate Key(SK) from a Dimension, but the Dimension table no longer holds that key. In those cases the FACT table record becomes an orphan: it violates <span style="color: #cc0000;">Referential Integrity(RI)</span> and will hardly appear in any report, since the reporting layer typically uses an INNER JOIN between Dimensions and Facts when retrieving data. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Suppose we want to track the orphan records in the SALES Fact table with respect to the Product Dimension. We can use a query like the one below.<br />
<a href="http://lh3.ggpht.com/-xEhcjXQo2qc/UqUZ_xMiu-I/AAAAAAAAI_s/nG0_H1IcPZ0/s1600-h/image%25255B51%25255D.png" style="text-align: start;"><img alt="" border="0" height="81" src="http://lh6.ggpht.com/-MtbdQAWcC_g/UqUaAU8Lo4I/AAAAAAAAI_0/2vKj2N9DIeE/image_thumb%25255B39%25255D.png" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="" width="411" /></a></div>
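<div style="text-align: justify;">
The query above is shown only as an image, so here is one possible way to write an orphan check, sketched with SQLite through Python's sqlite3 module. The table and column names (sales_fact, product_dimension, product_sk) are assumptions for this example and may differ from those in the screenshot.</div>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dimension (product_sk INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, product_sk INTEGER, amount REAL);

INSERT INTO product_dimension VALUES (10, 'Widget'), (11, 'Gadget');
-- product_sk 12 no longer exists in the dimension, so sales 2 and 4 are orphans.
INSERT INTO sales_fact VALUES (1, 10, 5.0), (2, 12, 7.5), (3, 11, 3.0), (4, 12, 1.0);
""")

# Fact rows whose SK finds no match in the dimension.
ORPHAN_SQL = """
SELECT f.sale_id, f.product_sk
FROM sales_fact f
LEFT JOIN product_dimension d ON d.product_sk = f.product_sk
WHERE d.product_sk IS NULL
ORDER BY f.sale_id
"""

orphans = conn.execute(ORPHAN_SQL).fetchall()
print(orphans)
```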
<div style="text-align: justify;">
The above query returns only the orphan records; it certainly cannot tell you which records were DELETED from PRODUCT_Dimension. One feasible solution is to populate the EVENT table with the SKs of the PRODUCT_Dimension records that are being DELETED as well, provided we don't reuse our Surrogate Keys. With both the SKs and the NKs from PRODUCT_Dimension recorded in the EVENT table for DELETED entries, we can achieve better compliance over the Data Warehouse data.</div>
<br />
<div style="text-align: justify;">
Another useful but rarely used approach is to enable <a href="http://docs.oracle.com/cd/B28359_01/server.111/b28337/tdpsg_auditing.htm#TDPSG50000">audit</a> of DELETE statements on a table in an Oracle DB, using a statement like the following.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: center;">
<span style="color: #cc0000;">Audit DELETE on SCHEMA.TABLE;</span></div>
<br />
<div style="text-align: justify;">
The table <a href="http://docs.oracle.com/cd/B19306_01/server.102/b14237/statviews_3055.htm">DBA_AUDIT_STATEMENT</a> will contain the details related to this deletion, for example the user who issued it, the exact DML statement and so on, but it cannot provide you with the record that was deleted. Since this approach cannot directly tell you which record was deleted, it is not so useful in our current discussion, so I will not go further into it here.</div>
<h3 style="text-align: justify;">
Deletion in Fact Tables</h3>
<div style="text-align: justify;">
Now, this was all about DELETION in DW Dimension tables. Regarding FACT data DELETION, I would like to cite an extract of what <a href="http://www.kimballgroup.com/">Ralph Kimball</a> has to say on Physical Deletion of Facts from DW.</div>
<br />
<div style="text-align: center;">
<img src="https://lh6.googleusercontent.com/T2HdSVlOGyJb86Pj5hnsXPrjYQFyr0HlylZ0P_CbSSokjH9kuOcTDplQLFite5nwGptfVEkB6dzEEcdJp5K_f0UeCjpyrNlCgO_pJvb1Sgi2J6k4DE6UDfAGotBRz2267Dc" /></div>
<div style="text-align: center;">
<img src="https://lh5.googleusercontent.com/szDCqQ8ppWBQoU5VTWiXD2Q0ceA8lY9WA-2sv4EeLDTZkh1jqJ3YvAhmVqzWTOUuLdeXb-wgxhtUQ6W42DGfqUoKo_dSX_DUeXiBbrYScGvdYpDXxfut4TZeWMbV6qAeZZo" /></div>
<h2>
Change Data Capture & Apply for 'Hard DELETE' in Source</h2>
<div style="text-align: justify;">
Again, whether we should track the DELETED records from the source depends on the type of table and its load nature. I will share a few real scenarios that are commonly faced in a DW and discuss the solutions for each.</div>
<h3>
1. Records are DELETED from SOURCE for a known Time Period, no Audit Trail was kept.</h3>
<div style="text-align: justify;">
In this case, the ideal solution is to DELETE the entire set of records for that time period from the Target table in the DW and pull the source records for the time period once again. This brings the DW back in sync with the Source, and the DELETED records will no longer be present in the DW.<br />
<a href="http://lh4.ggpht.com/-zpgPts-69zg/UqUaAn2YWFI/AAAAAAAAI_8/CXcw8tUC_DM/s1600-h/image%25255B87%25255D.png" style="text-align: start;"><img alt="SOFT and HARD Deleted Records and Change Data Capture in Data Warehouse" border="0" src="http://lh3.ggpht.com/-ftAc4LKl528/UqUaBPqFlII/AAAAAAAAJAI/JZ77l0b6ksA/image_thumb%25255B69%25255D.png" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="" /></a></div>
<div style="text-align: justify;">
Usually the time period is expressed in terms of Ship_DATE, Invoice_DATE or Event_DATE, i.e. a DATE type field from the actual dataset of the source table. Hence the same WHERE clause filter that we use to extract the records from the source table can be applied to the DW table as well. </div>
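<div style="text-align: justify;">
The delete-and-reload approach for a known time period can be sketched as below, again with Python's sqlite3 module standing in for the real databases. The invoices and dw_invoices tables, their columns and the sample dates are assumptions for this illustration; the point is only that the same date filter is applied on both sides.</div>

```python
import sqlite3


def resync_window(src, dw, start_date, end_date):
    """Delete one date window from the DW table and reload it from the source."""
    window = (start_date, end_date)
    rows = src.execute(
        "SELECT invoice_id, invoice_date, amount FROM invoices "
        "WHERE invoice_date BETWEEN ? AND ?",
        window,
    ).fetchall()
    with dw:  # one transaction: commit on success, roll back on error
        dw.execute(
            "DELETE FROM dw_invoices WHERE invoice_date BETWEEN ? AND ?", window
        )
        dw.executemany("INSERT INTO dw_invoices VALUES (?, ?, ?)", rows)
    return len(rows)


DDL = "CREATE TABLE {} (invoice_id INTEGER PRIMARY KEY, invoice_date TEXT, amount REAL)"

# Source: invoice 2 was hard-deleted, and no audit trail exists.
src = sqlite3.connect(":memory:")
src.execute(DDL.format("invoices"))
src.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, "2013-01-05", 100.0), (3, "2013-01-20", 60.0), (4, "2013-02-02", 80.0)],
)

# DW: still holds invoice 2 from an earlier load.
dw = sqlite3.connect(":memory:")
dw.execute(DDL.format("dw_invoices"))
dw.executemany(
    "INSERT INTO dw_invoices VALUES (?, ?, ?)",
    [(1, "2013-01-05", 100.0), (2, "2013-01-10", 45.0),
     (3, "2013-01-20", 60.0), (4, "2013-02-02", 80.0)],
)

reloaded = resync_window(src, dw, "2013-01-01", "2013-01-31")
dw_ids = [r[0] for r in dw.execute("SELECT invoice_id FROM dw_invoices ORDER BY invoice_id")]
print(reloaded, dw_ids)
```

<div style="text-align: justify;">
Running the delete and the reload in one transaction means the DW never exposes a half-empty window if the reload fails midway.</div>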
<br />
<div style="text-align: justify;">
Obviously, in this case we are NOT able to capture the 'Hard DELETE' from the Source, i.e. we cannot track the history of the data, but we can at least bring the Source and the DW in sync. Again, this approach is recommended only when the situation occurs once in a while and not on a regular basis.</div>
<h3>
2. Records are DELETED from SOURCE on a regular basis with NO Timeframe, no Audit Trail was kept.</h3>
<div style="text-align: justify;">
The possible solution in this case would be to implement <span style="color: #cc0000;">FULL Outer JOIN</span> between the Source and the Target table. The tables should be joined on the fields involved in the Natural Key(NK). This approach will help us to track all three kinds of changes to source data in one shot.</div>
<br />
The logic can be better explained with the help of a Venn diagram.<br />
<div style="text-align: center;">
<a href="http://lh4.ggpht.com/-rQBzV98olM0/UqUaBsaYesI/AAAAAAAAJAM/uaHAaFoxEHc/s1600-h/image%25255B71%25255D.png" style="text-align: start;"><img alt="SOFT and HARD Deleted Records and Change Data Capture in Data Warehouse" border="0" height="449" src="http://lh5.ggpht.com/-wgyeLHSY5Hg/UqUaCJMbvFI/AAAAAAAAJAU/43GT6lyC6tg/image_thumb%25255B55%25255D.png" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="" width="484" /></a></div>
<div style="text-align: justify;">
Out of the <span style="color: #cc0000;">Joiner</span> (kept in <span style="color: #cc0000;">FULL Outer Join</span> mode),</div>
<ul>
<li style="text-align: justify;">Records that have values for the NK fields only from the Source and not from the Target should go to the <span style="color: #cc0000;">INSERT</span> flow. These are new records coming from the source.</li>
<li style="text-align: justify;">Records that have values for the NK fields from both the Source and the Target should go to the <span style="color: #cc0000;">UPDATE</span> flow. These records already exist in both the Source and the Target.</li>
<li style="text-align: justify;">Records that have values for the NK fields only from the Target should go to the <span style="color: #cc0000;">DELETE</span> flow. These are the records that were DELETED from the Source table.</li>
</ul>
<div style="text-align: justify;">
Now, what we do with those DELETED records from Source, i.e. apply 'Soft DELETE' or 'Hard DELETE' in DW, depends on our requirement specification and business scenarios.</div>
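<div style="text-align: justify;">
The routing logic of the FULL Outer JOIN can be sketched in plain Python as follows. This is not how the PowerCenter Joiner is implemented; it is only an in-memory illustration of the three outcomes, and the function name and sample rows are assumptions made for the example.</div>

```python
def classify_changes(source_rows, target_rows, nk_columns):
    """Mimic a FULL OUTER JOIN on the Natural Key and route rows to three flows."""
    def nk(row):
        return tuple(row[col] for col in nk_columns)

    src = {nk(r): r for r in source_rows}
    tgt = {nk(r): r for r in target_rows}

    inserts = [r for k, r in src.items() if k not in tgt]  # NK only in Source
    updates = [r for k, r in src.items() if k in tgt]      # NK on both sides
    deletes = [r for k, r in tgt.items() if k not in src]  # NK only in Target
    return inserts, updates, deletes


source = [
    {"customer_no": "C1", "city": "Pune"},
    {"customer_no": "C2", "city": "Delhi"},
    {"customer_no": "C4", "city": "Kochi"},
]
target = [
    {"customer_no": "C1", "city": "Pune"},
    {"customer_no": "C2", "city": "Mumbai"},
    {"customer_no": "C3", "city": "Chennai"},
]

inserts, updates, deletes = classify_changes(source, target, ["customer_no"])
print([r["customer_no"] for r in inserts])
print([r["customer_no"] for r in updates])
print([r["customer_no"] for r in deletes])
```

<div style="text-align: justify;">
Note that both complete key sets are held in memory here, which mirrors the cost of reading the entire Source and Target for the join.</div>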
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
But this approach has a severe disadvantage in terms of ETL performance. Whenever we go for a FULL Outer JOIN between Source and Target, we read the entire data set from both ends, and this will obviously slow down ETL processing as the data volume increases. </div>
<h3>
3. Records are DELETED from SOURCE, Audit Trail was kept.</h3>
<div style="text-align: justify;">
Even though I call it a DELETION, it's NOT the kind of Physical DELETION that we discussed previously. This is mainly related to incorrect transactions in Legacy Systems, e.g. Mainframes, which usually send data in flat files. </div>
<div style="text-align: center;">
<a href="http://lh5.ggpht.com/-FxnyDdpbRDM/UqUaCsYZEZI/AAAAAAAAJAg/r3fRJRA_rvI/s1600-h/image%25255B86%25255D.png" style="text-align: start;"><img alt="SOFT and HARD Deleted Records and Change Data Capture in Data Warehouse" border="0" src="http://lh3.ggpht.com/-olLfj3Jmu-Y/UqUaDAd5ZyI/AAAAAAAAJAk/cz59b_AK5L8/image_thumb%25255B68%25255D.png" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="" /></a></div>
<div style="text-align: justify;">
When some old transactions become invalid, the source team sends the records for those transactions to the DW again, but with inverted measures, i.e. the sales figures are the same as the old ones but negative. So the DW contains both the old set of records and the newly arrived records, and the two sets cancel out in the aggregated FACT table, reducing the impact of those invalid transactions in the DW to zero.</div>
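<div style="text-align: justify;">
A small worked example may help to show how the inverted measures cancel out. The transaction IDs, products and amounts below are made up for illustration, and the aggregation is done in plain Python rather than in an actual aggregated FACT load.</div>

```python
from collections import defaultdict

# (transaction_id, product, sales_amount) rows as they sit in the transactional FACT.
transactional_fact = [
    ("T1", "P1", 100),
    ("T2", "P1", 40),
    ("T3", "P2", 75),
    ("T1", "P1", -100),  # reversal of T1 sent later by the source team
    ("T3", "P2", -75),   # reversal of T3
]


def aggregate_by_product(rows):
    """Sum the measure per product, as an aggregated FACT load would."""
    totals = defaultdict(int)
    for _txn_id, product, amount in rows:
        totals[product] += amount
    return dict(totals)


aggregated_fact = aggregate_by_product(transactional_fact)
t1_rows = [row for row in transactional_fact if row[0] == "T1"]
print(aggregated_fact)
print(t1_rows)
```

<div style="text-align: justify;">
The summarized figures exclude the invalid transactions, while the transactional level still carries both the original record and its reversal.</div>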
<div style="text-align: justify;">
<br /></div>
<div>
<div style="text-align: justify;">
<div class="sticky taped" style="float: right;">
<span style="color: #cc0000; font-size: large;"><b><u>About the Author</u></b></span><br />
<script data-format="inline" data-id="http://www.linkedin.com/pub/debraj-ghosh/78/626/32a" data-related="false" type="IN/MemberProfile"></script></div>
The only disadvantage of this approach is that, while the Aggregated FACT contains the correct data at the summarized level, the transactional FACT holds a dual set of records, which together represent the real scenario, i.e. at first the transaction happened (with the older record) and then it became invalid (with the newer record).</div>
<br />
We hope you enjoyed this article and that it gave you some new insights into change data capture in the Data Warehouse. Leave us your questions and comments. We would like to hear how you have handled change data capture in your data warehouse.</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-52043441299984927202013-11-30T22:25:00.002-08:002013-12-04T11:09:18.071-08:00Informatica Performance Tuning Guide, Performance Enhancements - Part 4<img align="left" alt="Informatica Performance Tuning Guide, Performance Enhancements - Part 4" border="0" height="100" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgT-Bz9UOLmaQDeFf70lOjfhyleRQ50F7Y9hvSeCoDqebuwgkC0zP8ekTC52Iyh6qZJWAf0qLG90YojjtqCbVX05l0Q39md7zZh_3FR7F63D_uAlXD-nLmPGig5f-E1Xhfh7z7bCnf4VVk/s100/ist2_9047937-business-graph.jpg" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" />
<br />
<div>
<div style="text-align: justify;">
In our performance tuning article series, we have so far covered the <a href="http://www.disoln.org/2013/08/Informatica-PowerCenter-Performance-Turning-A-to-Z-Guide.html">performance tuning basics</a>, <a href="http://www.disoln.org/2013/09/Informatica-Performance-Tuning-Guide-Identify-Performance-Bottlenecks.html">identification of bottlenecks</a> and <a href="http://www.disoln.org/2013/10/Informatica-Performance-Tuning-Guide-Resolve-Performance-Bottlenecks-Part-3.html">resolving different bottlenecks</a>. In this article we will cover the different performance enhancement features available in Informatica PowerCenter. In addition to the features provided by PowerCenter, we will go over design tips and tricks for ETL load performance improvement.<br />
<a name='more'></a></div>
<h2>
Performance Enhancements Features</h2>
The main PowerCenter features for <a href="http://www.disoln.org/search/label/Performance%20Tips?max-results=15">Performance</a> Enhancements are:<br />
<div class="sticky taped" style="background-position: initial initial; background-repeat: initial initial; float: right;">
<b>Performance Tuning Tutorial Series</b><br />
Part I : <a href="http://www.disoln.org/2013/08/Informatica-PowerCenter-Performance-Turning-A-to-Z-Guide.html">Performance Tuning Introduction.</a> <br />
Part II : <a href="http://www.disoln.org/2013/09/Informatica-Performance-Tuning-Guide-Identify-Performance-Bottlenecks.html">Identify Performance Bottlenecks. </a><br />
Part III : <a href="http://www.disoln.org/2013/10/Informatica-Performance-Tuning-Guide-Resolve-Performance-Bottlenecks-Part-3.html">Remove Performance Bottlenecks</a>.<br />
Part IV : <a href="http://www.disoln.org/2013/11/Informatica-Performance-Tuning-Guide-Performance-Enhancement-Features-Part-4.html">Performance Enhancements</a>.</div>
<ol><ol>
<ol>
<li><span style="color: #cc0000;"><b>Pushdown Optimization.</b></span></li>
<li><span style="color: #cc0000;"><b>Pipeline Partitions.</b></span></li>
<li><span style="color: #cc0000;"><b>Dynamic Partitions.</b></span></li>
<li><span style="color: #cc0000;"><b>Concurrent Workflows.</b></span></li>
<li><span style="color: #cc0000;"><b>Grid Deployments.</b></span></li>
<li><span style="color: #cc0000;"><b>Workflow Load Balancing.</b></span></li>
<li><span style="color: #cc0000;"><b>Other Performance Tips and Tricks.</b></span></li>
</ol>
</ol>
</ol>
<h2>
1. Pushdown Optimization</h2>
<div>
<img align="left" alt="Informatica Performance Tuning Guide, Performance Enhancements - Part 4" border="0" height="50" src="https://lh4.googleusercontent.com/-oWqH1Vu6Y8c/UfhTWX1WLbI/AAAAAAAAIXQ/Tr_tXp6p34c/h120/SQL+Push.png" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="50" />
<br />
<div style="text-align: justify;">
The Pushdown Optimization Option enables data transformation processing to be pushed down into a relational database to make the best use of database processing power. It converts the transformation logic into SQL statements, which can execute directly on the database. This minimizes the need to move data between servers and utilizes the power of the database engine.<br />
<ul>
<ul><ul>
<li><span style="color: #cc0000;"><b>Read More</b></span> about <i><a href="http://www.disoln.org/2013/07/Informatica-PowerCenter-Pushdown-Optimization-an-ELT-Approach.html">Pushdown Optimization</a>.</i></li>
</ul>
</ul>
</ul>
</div>
<h2>
2. Session Partitioning</h2>
<img align="left" alt="Informatica Performance Tuning Guide, Performance Enhancements - Part 4" border="0" height="50" src="http://3.bp.blogspot.com/-qZU-dQS-b6Y/UeRVLvoXaxI/AAAAAAAAIQM/AEpdWzqANTY/s1600/performance-icon.png?imgmax=800" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="50" />
<br />
<div style="text-align: justify;">
The Informatica PowerCenter Partitioning Option increases the performance of PowerCenter through parallel data processing. The Partitioning Option lets you split a large data set into smaller subsets, which can be processed in parallel to get better session performance.<br />
<ul>
<ul><ul>
<li><span style="color: #cc0000;"><b>Read More</b></span> about <i><a href="http://www.disoln.org/2013/07/Informatica-PowerCenter-Partitioning-for-Parallel-Processing.html">Session Partitioning</a>.</i></li>
</ul>
</ul>
</ul>
</div>
</div>
</div>
<h2>
3. Dynamic Session Partitioning</h2>
<img align="left" alt="Informatica Performance Tuning Guide, Performance Enhancements - Part 4" border="0" height="50" src="https://lh6.googleusercontent.com/-N4abF48GAwA/UgMymLyUHTI/AAAAAAAAIbI/OixSuSA6JKw/w216-h215-no/article_icon.png" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="50" />
<br />
<div style="text-align: justify;">
Informatica PowerCenter <a href="http://www.disoln.org/2013/07/Informatica-PowerCenter-Partitioning-When-Where-and-How.html">session partitioning</a> can be used to <a href="http://www.disoln.org/2013/07/Informatica-PowerCenter-Partitioning-for-Parallel-Processing.html">process data in parallel</a> and achieve faster data delivery. Using the Dynamic Session Partitioning capability, PowerCenter can decide the degree of parallelism dynamically. The Integration Service scales the number of session partitions at run time based on factors such as source database partitions or the number of CPUs on the node, resulting in a significant <a href="http://www.disoln.org/search/label/Performance%20Tips?&max-results=15">performance improvement</a>.<br />
<ul>
<ul><ul>
<li><span style="color: #cc0000;"><b>Read More</b></span> about<i> <a href="http://www.disoln.org/2013/08/Dynamic-Partitioning-to-Increase-Parallelism-Based-on-Resources-Availability.html">Dynamic Session Partition</a>.</i></li>
</ul>
</ul>
</ul>
</div>
<h2>
4. Concurrent Workflows</h2>
<img align="left" alt="Informatica Performance Tuning Guide, Performance Enhancements - Part 4" border="0" height="50" src="https://lh6.googleusercontent.com/-7tya9CNUWUU/UKMjIrhALgI/AAAAAAAAGNw/mm4Zoxi_P9I/h120/parallels.ico?imgmax=800" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="50" />
<br />
<div style="text-align: justify;">
A concurrent workflow is a workflow that can run as multiple instances concurrently. A workflow instance is a representation of a workflow. We can configure two types of concurrent workflows: workflows whose instances run concurrently under the same instance name, and workflows whose instances run concurrently under unique instance names.<br />
<ul>
<ul><ul>
<li><span style="color: #cc0000;"><b>Read More</b></span> about <i><a href="http://www.disoln.org/2012/11/Informatica-Concurrent-Workflows-to-Reduce-Warehouse-ETL-Load-Time.html">Concurrent Workflows</a>.</i></li>
</ul>
</ul>
</ul>
</div>
<h2>
5. Grid Deployments</h2>
<div>
<img align="left" alt="Informatica Performance Tuning Guide, Performance Enhancements - Part 4" border="0" height="50" src="http://1.bp.blogspot.com/-QM8bcYmuCvk/UnMn3y9Tc6I/AAAAAAAAI1s/YiitbjEM1ZU/s100/grid.png" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="50" />
<br />
<div style="text-align: justify;">
When a PowerCenter domain contains multiple nodes, you can configure workflows and sessions to run on a grid. When you run a workflow on a grid, the Integration Service runs a service process on each available node of the grid to increase performance and scalability. When you run a session on a grid, the Integration Service distributes session threads to multiple DTM processes on nodes in the grid to increase performance and scalability.</div>
<ul>
<ul><ul>
<li><span style="color: #cc0000;"><b>Read More</b></span> about <i><a href="http://www.disoln.org/2013/10/Informatica-PowerCenter-Workflows-on-Grid-for-Performance-and-Scalability.html">Grid Deployments</a>.</i></li>
</ul>
</ul>
</ul>
</div>
<h2>
6. Workflow Load Balancing</h2>
<img align="left" alt="Informatica Performance Tuning Guide, Performance Enhancements - Part 4" border="0" height="50" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMVUtnkr62yBuIBoc-f5xSIXxrE4Tzd-BhoDYWB5X1tMTLqUOCFWKwam0OYLS2JCbGd8xY4N6Po9dPb_V9O3aP-QgPXanBsE6fmGWyDsbknZvXdtbhvfWuCMKIZlq_59dzTwHgWzLe6Ek/s100/balancing.png" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="50" />
<br />
<div>
<div style="text-align: justify;">
Informatica Load Balancing is a mechanism which distributes the workloads across the nodes in the <a href="http://www.disoln.org/2013/10/Informatica-PowerCenter-Workflows-on-Grid-for-Performance-and-Scalability.html">grid</a>. When you run a workflow, the Load Balancer dispatches different tasks in the workflow such as Session, Command, and predefined Event-Wait tasks to different <a href="http://www.disoln.org/2013/10/Informatica-PowerCenter-Workflows-on-Grid-for-Performance-and-Scalability.html">nodes</a> running the Integration Service. Load Balancer matches task requirements with resource availability to identify the best node to run a task. It may dispatch tasks to a single node or across nodes on the <a href="http://www.disoln.org/2013/10/Informatica-PowerCenter-Workflows-on-Grid-for-Performance-and-Scalability.html">grid</a>.</div>
<ul>
<ul><ul>
<li><span style="color: #cc0000;"><b>Read More</b></span> about <i><a href="http://www.disoln.org/2013/11/Informatica-PowerCenter-Load-Balancing-for-Workload-Distribution.html">Workflow Load Balancing</a>.</i></li>
</ul>
</ul>
</ul>
<h2>
7. Other Performance Tips and Tricks</h2>
</div>
<img align="left" alt="Informatica Performance Tuning Guide, Performance Enhancements - Part 4" border="0" height="50" src="http://1.bp.blogspot.com/-B4YJQtCaMCE/UINk37bw3uI/AAAAAAAAFvI/eOlbv2mc7GE/s200/DB-update.png?imgmax=800" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="50" />
<br />
<div>
Throughout this blog we have been discussing different tips and tricks to improve your ETL load performance. We have collected those tips and tricks here for your reference.</div>
<div>
<ul>
<ul><ul>
<li><span style="color: #cc0000;"><b>Read More</b></span> about <i><a href="http://www.disoln.org/search/label/Performance%20Tips?max-results=15">Other Performance Tips and Tricks</a>.</i></li>
</ul>
</ul>
</ul>
<div style="text-align: justify;">
We hope you enjoyed these tips and tricks and that they are helpful for your project needs. Leave us your questions and comments. We would like to hear any other performance tips you might have used in your projects.</div>
</div><div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-11554377147479349002013-11-21T22:14:00.004-08:002014-06-14T19:18:42.158-07:00Surrogate Key Generation Approaches Using Informatica PowerCenter<img align="left" alt="Different Approaches to Generate Surrogate Key in Informatica PowerCenter" border="0" height="100" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNNd7DX1aJ8mERBg9VU4YRxV2W0DCvk05OU33EskE7Up05ehE6RGpR8j_KF3WCNFdPCYsiVNaLl10n9ustavcl1tD42QaZ4DNpr9LoqH6agVQTlFbdjmDb5sw6m3QMUFpdmqJ7WKq-OrM/s100/keys1.jpg" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" />
<br />
<div style="text-align: justify;">
A <a href="http://www.disoln.org/2013/11/Surrogate-Key-in-Data-Warehouse-What-When-Why-and-Why-Not.html" target="_blank">Surrogate Key</a> is a sequentially generated unique number attached to each and every record in a Dimension table in a Data Warehouse. We discussed the <a href="http://www.disoln.org/2013/11/Surrogate-Key-in-Data-Warehouse-What-When-Why-and-Why-Not.html" target="_blank">Surrogate Key</a> in detail in our previous article. In this article we will concentrate on different approaches to generating Surrogate Keys for different types of ETL processes.</div>
<a name='more'></a>
<h2>
Surrogate Key for Dimensions Loading in Parallel</h2>
<div style="text-align: justify;">
When a single dimension table is loaded in parallel from different application data sources, special care should be taken to make sure that no keys are duplicated. Let's look at the different design options here.</div>
<h3>
1. Using Sequence Generator Transformation</h3>
<div style="text-align: justify;">
This is the simplest and most preferred way to generate a Surrogate Key(SK). We create a <a href="http://www.disoln.org/2012/10/11-ways-to-make-informatica-powercenter-code-reusable.html">reusable</a> <a href="http://www.disoln.org/2013/02/Sequence-Generator-Transformation-for-Unique-Key-Generation.html">Sequence Generator</a> transformation and, in the INSERT flow of the mapping, map its NEXTVAL port to the SK field in the target table. The start value is usually set to 1, with an increment of 1.<br />
<br />
A <a href="http://www.disoln.org/2012/10/11-ways-to-make-informatica-powercenter-code-reusable.html">reusable</a> <a href="http://www.disoln.org/2013/02/Sequence-Generator-Transformation-for-Unique-Key-Generation.html">Sequence Generator</a> transformation is shown below.</div>
<a href="http://lh6.ggpht.com/-RbUKdcd5xtw/Uo2PCLW8JQI/AAAAAAAAI64/rL_atJBHw84/s1600-h/image%25255B24%25255D.png"><img alt="Different Approaches to Generate Surrogate Key in Informatica PowerCenter" border="0" height="398" src="http://lh4.ggpht.com/-gTVlGHp-uSI/Uo2PDeIxTuI/AAAAAAAAI7A/XLii2S7mO6Y/image_thumb%25255B20%25255D.png" style="border-style: none; border-width: 0px; display: block; float: none; margin: 10px auto 0px;" title="" width="537" /></a>
<br />
<div style="text-align: justify;">
The NEXTVAL port from the <a href="http://www.disoln.org/2013/02/Sequence-Generator-Transformation-for-Unique-Key-Generation.html">Sequence Generator</a> can be mapped to the surrogate key in the target table. The Sequence Generator transformation is shown below. </div>
<div style="text-align: justify;">
<div style="text-align: center;">
<a href="http://lh5.ggpht.com/-L1D8Y2VqOfs/Uo2eYREaQBI/AAAAAAAAI7Y/Tu1hUvmcmbA/image_thumb%25255B41%25255D.png"><img alt="Different Approaches to Generate Surrogate Key in Informatica PowerCenter" border="0" src="http://lh5.ggpht.com/-L1D8Y2VqOfs/Uo2eYREaQBI/AAAAAAAAI7Y/Tu1hUvmcmbA/image_thumb%25255B41%25255D.png" style="border-style: none; border-width: 0px; display: block; float: none; margin: 10px auto 0px;" title="" /></a> </div>
<div style="text-align: justify;">
<br />
<span style="color: #cc0000;"><b>Note</b></span> : Make sure to create a reusable transformation, so that the same transformation can be reused in all the mappings that load the same dimension table.</div>
<h3>
2. Using Database Sequence</h3>
<div>
We can create a SEQUENCE in the database and use it to generate the SKs for any table. It can be invoked using a <a href="http://www.disoln.org/2013/09/Informatica-SQL-Transformation-Beyond-Pre-Post-Session-SQL-Commands.html">SQL Transformation</a> or a <a href="http://www.disoln.org/2013/03/Stored-Procedure-Transformation-to-Leverage-Existing-DB-Scripts.html">Stored Procedure Transformation</a>. <br />
<br /></div>
<div>
First we create a SEQUENCE using the following command. </div>
<blockquote class="tr_bq">
<div>
<i><span style="color: #cc0000;">CREATE SEQUENCE DW.Customer_SK</span></i></div>
<div>
<i><span style="color: #cc0000;">MINVALUE 1</span></i></div>
<div>
<i><span style="color: #cc0000;">MAXVALUE 99999999</span></i></div>
<div>
<i><span style="color: #cc0000;">START WITH 1</span></i></div>
<div>
<i><span style="color: #cc0000;">INCREMENT BY 1;</span></i></div>
</blockquote>
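<div style="text-align: justify;">
To see why a single shared sequence keeps keys unique when several loads run in parallel, consider the conceptual sketch below. It is an in-process Python stand-in for the database SEQUENCE, not Oracle code: the SharedSequence class, the loader function and the thread counts are all assumptions made for illustration.</div>

```python
import itertools
import threading


class SharedSequence:
    """In-process stand-in for a database SEQUENCE (START WITH 1, INCREMENT BY 1)."""

    def __init__(self, start=1, increment=1):
        self._counter = itertools.count(start, increment)
        self._lock = threading.Lock()

    def nextval(self):
        # The lock plays the role of the database serializing NEXTVAL calls.
        with self._lock:
            return next(self._counter)


def load_dimension(sequence, row_count, assigned_keys):
    """Simulate one parallel load that asks the shared sequence for each new SK."""
    for _ in range(row_count):
        assigned_keys.append(sequence.nextval())


customer_sk = SharedSequence(start=1, increment=1)
keys_per_loader = [[] for _ in range(4)]  # one result list per parallel load
loaders = [
    threading.Thread(target=load_dimension, args=(customer_sk, 250, keys))
    for keys in keys_per_loader
]
for t in loaders:
    t.start()
for t in loaders:
    t.join()

all_keys = [k for keys in keys_per_loader for k in keys]
print(len(all_keys), len(set(all_keys)))
```

<div style="text-align: justify;">
Each loader may receive a non-contiguous range of keys, but no key is ever handed out twice, which is the property a dimension loaded in parallel needs.</div>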
<h4>
Using SQL Transformation </h4>
You can create a <a href="http://www.disoln.org/2012/10/11-ways-to-make-informatica-powercenter-code-reusable.html">reusable</a> <a href="http://www.disoln.org/2013/09/Informatica-SQL-Transformation-Beyond-Pre-Post-Session-SQL-Commands.html">SQL Transformation</a> as shown below. It takes the name of the database sequence and the schema name as input and returns SK numbers.
<br />
<div>
<a href="http://lh6.ggpht.com/-2yosbKWUsZw/Uo2O_PdQnQI/AAAAAAAAI6o/eBtGF78z02k/s1600-h/image%25255B19%25255D.png"><img alt="Different Approaches to Generate Surrogate Key in Informatica PowerCenter" border="0" height="480" src="http://lh3.ggpht.com/-TcPanYV9tpo/Uo2PA0PCMlI/AAAAAAAAI6w/V2SmIuEFsqY/image_thumb%25255B15%25255D.png" style="border-color: -moz-use-text-color; border-style: none; border-width: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="534" /></a><br />
<div style="text-align: justify;">
The schema name (DW) and the sequence name (Customer_SK) can be passed in as input values for the transformation, and the output can be mapped to the target SK column. The SQL transformation is shown in the image below.</div>
<a href="http://lh6.ggpht.com/-iMMjamjCqK4/Uo2eZkHSPwI/AAAAAAAAI7g/diTLqZ6cLqE/s1600-h/image%25255B41%25255D.png"><span id="goog_971408658"></span><img alt="Different Approaches to Generate Surrogate Key in Informatica PowerCenter" border="0" height="162" src="http://lh5.ggpht.com/-nutI0evUrjM/Uo2ea4zUYhI/AAAAAAAAI7o/Um4h8XxHvz4/image_thumb%25255B33%25255D.png" style="border-style: none; border-width: 0px; display: block; float: none; margin: 10px auto -10px;" title="" width="312" /></a>
<br />
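<div>
As a rough illustration of what the SQL Transformation does with its two inputs, the sketch below builds the Oracle statement from a schema name and a sequence name. The helper name <i>build_nextval_query</i> and the identifier check are assumptions made for this example; they are not part of PowerCenter.</div>

```python
import re

# Hypothetical helper: mirrors what the reusable SQL Transformation does with
# its two inputs (schema name, sequence name). Oracle syntax is assumed.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def build_nextval_query(schema_name: str, sequence_name: str) -> str:
    """Return the Oracle statement that fetches the next SK from a sequence."""
    for name in (schema_name, sequence_name):
        # The names are concatenated into the statement rather than bound as
        # parameters, so reject anything that is not a plain identifier.
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a valid identifier: {name!r}")
    return f"SELECT {schema_name}.{sequence_name}.NEXTVAL FROM DUAL"


if __name__ == "__main__":
    print(build_nextval_query("DW", "Customer_SK"))
```

<div>
Because the statement is assembled from input values, validating the names first is a reasonable precaution in any similar dynamic query.</div>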
<h4>
Using Stored Procedure Transformation </h4>
We use the SEQUENCE <i>DW.Customer_SK </i>to generate the SKs in an Oracle function, which is in turn called via a <a href="http://www.disoln.org/2013/03/Stored-Procedure-Transformation-to-Leverage-Existing-DB-Scripts.html">stored procedure transformation</a>.
<br />
<div>
<br />
Create a database function as below. Here we are creating an Oracle function.</div>
<blockquote class="tr_bq">
<div>
<i>CREATE OR REPLACE FUNCTION DW.Customer_SK_Func</i></div>
<div>
<i> RETURN NUMBER </i></div>
<div>
<i>IS</i></div>
<div>
<i> Out_SK NUMBER;</i></div>
<div>
<i>BEGIN</i></div>
<div>
<i><span class="Apple-tab-span" style="white-space: pre;"> </span>SELECT DW.Customer_SK.NEXTVAL INTO Out_SK FROM DUAL;</i></div>
<div>
<i><span class="Apple-tab-span" style="white-space: pre;"> </span>RETURN Out_SK;</i></div>
<div>
<i>EXCEPTION</i></div>
<div>
<i><span class="Apple-tab-span" style="white-space: pre;"> </span>WHEN OTHERS THEN</i></div>
<div>
<i> <span class="Apple-tab-span" style="white-space: pre;"> </span>raise_application_error(-20001,'An error was encountered - '||SQLCODE||' -ERROR- '||SQLERRM);</i></div>
<div>
<i>END;</i></div>
</blockquote>
<div>
You can import the database function as a stored procedure transformation as shown in the image below.</div>
<a href="http://lh4.ggpht.com/-ROCbiaH4Pbw/Uo2tbL8-mAI/AAAAAAAAI8A/TdSRqRHKzVs/s1600-h/image%25255B61%25255D.png"><img alt="Different Approaches to Generate Surrogate Key in Informatica PowerCenter" border="0" height="153" src="http://lh6.ggpht.com/-N86yc0szHoU/Uo2tcUdf-YI/AAAAAAAAI8I/9svy698TZFg/image_thumb%25255B51%25255D.png" style="border: 0px; display: block; float: none; margin: 10px auto 0px;" title="" width="202" /></a><br />
<div>
Now, just before the target instance in the Insert flow, add an Expression transformation with an output port that uses the following formula. This output port, GET_SK, can be connected to the target surrogate key column.<br />
<ul>
<li><span style="color: #cc0000;">GET_SK = :SP.CUSTOMER_SK_FUNC()</span></li>
</ul>
<a href="http://lh6.ggpht.com/-SAQpYN1BN9M/Uo2zquciJzI/AAAAAAAAI8Y/8DcDzayKpe8/s1600-h/image%25255B78%25255D.png"><img alt="Different Approaches to Generate Surrogate Key in Informatica PowerCenter" border="0" height="398" src="http://lh6.ggpht.com/-VC1tM9rrQlo/Uo2zsLdU7hI/AAAAAAAAI8g/Kb6OFcDca6g/image_thumb%25255B67%25255D.png" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="538" /></a>
<br />
<b><span style="color: #cc0000;">Note</span></b> : The database function can be parametrized, and the stored procedure transformation can also be made reusable, to make this approach more effective.<br />
<h2 style="text-align: start;">
Surrogate Key for Non Parallel Loading Dimensions</h2>
<div style="text-align: justify;">
If the dimension table is not loaded in parallel from different application data sources, we have a couple more options to generate SKs. Let's look at the design options here.</div>
<h3>
Using Dynamic LookUP </h3>
<div>
When we implement Dynamic LookUP in any mapping, we may not even need to use the Sequence Generator for generating the SK values. </div>
<div>
<br /></div>
<div>
For a Dynamic LookUP on Target, we have the option of associating any LookUP port with an input port, output port, or Sequence-ID. When we associate a Sequence-ID, the Integration Service generates a unique integer value for each row inserted into the lookup cache. This applies only to ports with the Bigint, Integer or Small Integer data type. Since the SK is usually of Integer type, we can take advantage of this.</div>
<div>
<br /></div>
<div>
The Integration Service uses the following process to generate Sequence IDs.</div>
<ul style="text-align: start;">
<li style="text-align: justify;">When the Integration Service creates the dynamic lookup cache, it tracks the range of values for each port that has a sequence ID in the dynamic lookup cache.</li>
<li style="text-align: justify;">When the Integration Service inserts a row of data into the cache, it generates a key for a port by incrementing the greatest sequence ID value by one.</li>
<li style="text-align: justify;">When the Integration Service reaches the maximum number for a generated sequence ID, it starts over at one. The Integration Service increments each sequence ID by one until it reaches the smallest existing value minus one. If the Integration Service runs out of unique sequence ID numbers, the session fails.</li>
</ul>
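<div>
The three steps above can be sketched in a few lines of Python. This is only a simplified model of the described behaviour, not the Integration Service's actual implementation; the class name <i>SequenceIdGenerator</i> and the <i>max_id</i> parameter are assumptions for illustration.</div>

```python
class SequenceIdGenerator:
    """Simplified model of sequence-ID generation in a dynamic lookup cache."""

    def __init__(self, existing_ids, max_id):
        # Step 1: track the range of values already present in the cache.
        # With an empty cache the first generated value (1) is the smallest.
        self.lowest = min(existing_ids, default=1)
        self.highest = max(existing_ids, default=0)
        self.max_id = max_id      # largest value the port data type can hold
        self.wrapped = False
        self.last_low = 0         # last ID handed out after wrapping around

    def next_id(self):
        if not self.wrapped:
            # Step 2: increment the greatest sequence ID by one.
            if self.max_id > self.highest:
                self.highest += 1
                return self.highest
            self.wrapped = True
        # Step 3: start over at one and stop at the smallest existing value
        # minus one.
        if self.lowest - 1 > self.last_low:
            self.last_low += 1
            return self.last_low
        raise RuntimeError("out of unique sequence IDs: the session would fail")


if __name__ == "__main__":
    gen = SequenceIdGenerator(existing_ids=[5, 6, 7], max_id=8)
    print([gen.next_id() for _ in range(5)])
```

<div>
In this model the cache holds 5, 6 and 7 with a maximum of 8, so the generator hands out 8, wraps around to 1 through 4, and then raises an error on the next request.</div>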
<a href="http://lh4.ggpht.com/-L3O_dJCQiZM/Uo7ZCcQMnGI/AAAAAAAAI80/wd4bmPtyqNk/s1600-h/image%25255B19%25255D.png"><img alt="Different Approaches to Generate Surrogate Key in Informatica PowerCenter" border="0" height="475" src="http://lh5.ggpht.com/-CQH5uSO5vhs/Uo7ZC3ZTxDI/AAAAAAAAI84/ByxehkImZZM/image_thumb%25255B17%25255D.png" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="591" /></a>
<br />
<div>
The image above shows a dynamic lookup configuration that generates the SK for CUST_SK.<br />
<br />
The Integration Service generates a Sequence-ID for each row it inserts into the cache. For any record that is already present in the Target, it gets the SK value from the Target Dynamic LookUP cache, based on the matching Associated Ports. So, if we connect this port to the target SK field, there is no need to generate SK values separately, since the new SK value (for records to be inserted) or the existing SK value (for records to be updated) is supplied by the Dynamic LookUP.</div>
<div>
<br /></div>
<div>
The disadvantage of this technique lies in the fact that we don’t have any separate SK Generating Area and the source of SK is totally embedded into the code.<br />
<h3>
Using Expression Transformation</h3>
Suppose we are populating a CUSTOMER_DIM. In the Mapping, first create an Unconnected Lookup for the dimension table, say LKP_CUSTOMER_DIM. Its purpose is to get the maximum SK value in the dimension table. Say the SK column is CUSTOMER_KEY and the NK column is CUSTOMER_ID.<br />
<br />
Select CUSTOMER_KEY as <span style="color: #cc0000;">Return Port</span> and <span style="color: #cc0000;">Lookup Condition</span> as <br />
<ul>
<li><span style="color: #cc0000;">CUSTOMER_ID = IN_CUSTOMER_ID</span></li>
</ul>
Use the <span style="color: #cc0000;">SQL Override</span> as below:<br />
<ul>
<li><span style="color: #cc0000;">SELECT MAX (CUSTOMER_KEY) AS CUSTOMER_KEY, '1' AS CUSTOMER_ID FROM CUSTOMER_DIM</span></li>
</ul>
Next in the mapping after the SQ use an <span style="color: #cc0000;">Expression transformation</span>. Here actually we will be generating the SKs for the Dimension based on the previous value generated. We will create the following ports in the EXP to compute the SK value.<br />
<ul>
<li><span style="color: #cc0000;">VAR_COUNTER = IIF(ISNULL( VAR_INC ), NVL(:LKP.LKP_CUSTOMER_DIM('1'), 0) + 1, VAR_INC + 1 )</span></li>
<span style="color: #cc0000;">
</span>
<li><span style="color: #cc0000;">VAR_INC = VAR_COUNTER</span></li>
<span style="color: #cc0000;">
</span>
<li><span style="color: #cc0000;">OUT_COUNTER = VAR_COUNTER</span></li>
</ul>
When the mapping starts, for the first row we look up the Dimension table to fetch the maximum available SK in the table. After that we keep incrementing the SK value stored in the variable port by 1 for each incoming row. Here OUT_COUNTER gives the SKs to be populated in CUSTOMER_KEY.<br />
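<div>
The row-by-row logic of the three ports can be mimicked in Python as below. This is a sketch of the idea only: <i>lookup_max_sk</i> is a hypothetical stand-in for the unconnected lookup, and the local variables play the role of the variable ports.</div>

```python
def generate_surrogate_keys(rows, lookup_max_sk):
    """Yield each row with a CUSTOMER_KEY, mimicking the Expression ports."""
    var_inc = None  # plays the role of VAR_INC before the first row
    for row in rows:
        if var_inc is None:
            # First row only: fetch MAX(CUSTOMER_KEY); an empty table gives
            # None, which is treated as 0.
            current_max = lookup_max_sk()
            var_counter = (current_max if current_max is not None else 0) + 1
        else:
            # Every later row: previous value plus one, with no lookup call.
            var_counter = var_inc + 1
        var_inc = var_counter                      # VAR_INC = VAR_COUNTER
        yield {**row, "CUSTOMER_KEY": var_counter}  # OUT_COUNTER


if __name__ == "__main__":
    incoming = [{"CUSTOMER_ID": "C1"}, {"CUSTOMER_ID": "C2"}]
    for loaded in generate_surrogate_keys(incoming, lambda: 100):
        print(loaded)
```

<div>
Note that the lookup is called once per run, which is the main attraction of this approach over calling a database sequence for every row.</div>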
<h3>
Using Mapping & Workflow Variable</h3>
Here again we will use the Expression transformation to compute the next SK, but will get the MAX available SK in a different way.<br />
<br />
Suppose, we have a session s_New_Customer, which loads the Customer Dimension table. Before that session in the Workflow, we add a dummy session as s_Dummy.<br />
<a href="http://lh3.ggpht.com/-qVXVckepRpw/Uo7o38yNwEI/AAAAAAAAI9M/pddU10PpsP8/s1600-h/image%25255B37%25255D.png"><img alt="image" border="0" height="94" src="http://lh4.ggpht.com/-z23aRo1gfRE/Uo7o4Hg9jOI/AAAAAAAAI9Q/MqbWdqjwWig/image_thumb%25255B33%25255D.png" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin: 10px auto 5px;" title="image" width="476" /></a>In s_Dummy, we will have a mapping variable, e.g. $$MAX_CUST_SK which will be set with the value of MAX (SK) in Customer Dimension table.<br />
<ul>
<li><span style="color: #cc0000;">SELECT MAX (CUSTOMER_KEY) AS CUSTOMER_KEY FROM CUSTOMER_DIM</span></li>
</ul>
We will have the CUSTOMER_DIM as our source table and target can be a simple flat file, which will not be used anywhere. We pull this MAX (SK) from the SQ and then in an EXP we assign this value to the mapping variable using the <b>SETVARIABLE</b> function. So, we will have the following ports in the EXP:<br />
<ul>
<li><span style="color: #cc0000;">INP_CUSTOMER_KEY = INP_CUSTOMER_KEY -- The MAX of SK coming from Customer Dimension table.</span></li>
<li><span style="color: #cc0000;">OUT_MAX_SK = SETVARIABLE ($$MAX_CUST_SK, INP_CUSTOMER_KEY) -- Output Port</span></li>
</ul>
This output port will be connected to the flat file port, but the value we assigned to the variable will persist in the repository. <br />
<br />
In our second mapping we start generating the SK from the value $$MAX_CUST_SK + 1. But how can we pass the parameter value from one session into the other one? <br />
<br />
This is where the Workflow Variable comes into the picture. We define a WF variable as $$MAX_SK and, in the Post-session on success variable assignment section of s_Dummy, we assign the value of $$MAX_CUST_SK to $$MAX_SK. Now the variable $$MAX_SK contains the maximum available SK value from the CUSTOMER_DIM table. Next we define another mapping variable in the session s_New_Customer as $$START_VALUE, and this is assigned the value of $$MAX_SK in the Pre-session variable assignment section of s_New_Customer.<br />
<br />
So, the sequence is:<br />
<ul>
<li><span style="color: #cc0000;">Post-session on success variable assignment of First Session: </span></li>
<ul>
<li><span style="color: #cc0000;">$$MAX_SK = $$MAX_CUST_SK</span></li>
</ul>
<li><span style="color: #cc0000;">Pre-session variable assignment of Second Session: </span></li>
<ul>
<li><span style="color: #cc0000;">$$START_VALUE = $$MAX_SK</span></li>
</ul>
</ul>
Now in the actual mapping, we add an EXP with the following ports to compute the SKs one by one for each record being loaded into the target.<br />
<ul>
<li><span style="color: #cc0000;">VAR_COUNTER = IIF (ISNULL (VAR_INC), $$START_VALUE + 1, VAR_INC + 1)</span></li>
<div class="sticky taped" style="float: right;">
<span style="color: #cc0000; font-size: large;"><b><u>About the Author</u></b></span><br />
<script data-format="inline" data-id="http://www.linkedin.com/pub/debraj-ghosh/78/626/32a" data-related="false" data-width="200" type="IN/MemberProfile"></script></div>
<li><span style="color: #cc0000;">VAR_INC = VAR_COUNTER</span></li>
<li><span style="color: #cc0000;">OUT_COUNTER = VAR_COUNTER</span></li>
</ul>
OUT_COUNTER will be connected to the SK port of the target.<br />
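<div>
The hand-off between the two sessions can be traced with a small Python model. The dictionaries below merely stand in for the mapping and workflow variables; the function names are assumptions for this sketch and do not correspond to any PowerCenter API.</div>

```python
def run_s_dummy(customer_dim):
    """s_Dummy: read MAX(CUSTOMER_KEY) and keep it in $$MAX_CUST_SK."""
    keys = [row["CUSTOMER_KEY"] for row in customer_dim]
    return {"$$MAX_CUST_SK": max(keys, default=0)}


def run_workflow(customer_dim, new_customers):
    """Model of the workflow: s_Dummy followed by s_New_Customer."""
    dummy_vars = run_s_dummy(customer_dim)

    # Post-session on success variable assignment of the first session.
    workflow_vars = {"$$MAX_SK": dummy_vars["$$MAX_CUST_SK"]}

    # Pre-session variable assignment of the second session.
    start_value = workflow_vars["$$MAX_SK"]  # $$START_VALUE

    loaded = []
    var_inc = None
    for row in new_customers:
        # Same counter logic as the three Expression ports.
        var_counter = start_value + 1 if var_inc is None else var_inc + 1
        var_inc = var_counter
        loaded.append({**row, "CUSTOMER_KEY": var_counter})
    return loaded


if __name__ == "__main__":
    dim = [{"CUSTOMER_KEY": 41}, {"CUSTOMER_KEY": 42}]
    print(run_workflow(dim, [{"CUSTOMER_ID": "C9"}, {"CUSTOMER_ID": "C10"}]))
```

<div>
Compared with the unconnected lookup approach, the maximum SK is read in a separate session, so the load mapping itself contains no lookup on the dimension table.</div>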
<br />
Hope you enjoyed this article and learned some new ways to generate surrogate keys for your dimension tables. Please leave us a comment or feedback if you have any, we are happy to hear from you. </div>
</div>
</div>
</div><div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-86713318073153132492013-11-13T20:02:00.000-08:002014-01-21T18:44:04.398-08:00Surrogate Key in Data Warehouse, What, When and Why<img align="left" alt="Surrogate Key in Data Warehouse, What, When, Why and Why Not" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNskm9RsmfRZn4CCjXAknrWZMiJ2W8ir8KK_Fj4aM1XzIEihyphenhyphenOo6pl6eSFCMgJfZP4LUxA_BdpW90Vq-p6Yjfo6FmUL5wnGVYQWbkJ3rz4nVrKBA9HfUA6aA8xjMQnen2E52dvRG_8HRE/s100/Skell-keys.jpg" height="100" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" />
<br />
<div style="text-align: justify;">
Surrogate keys are a widely used and accepted design standard in data warehouses. A surrogate key is a sequentially generated unique number attached to every record in a Dimension table in a Data Warehouse. It joins the fact and dimension tables and is necessary to handle changes in dimension table attributes.
</div>
<a name='more'></a><h2>
What Is Surrogate Key</h2>
<div style="text-align: justify;">
A Surrogate Key (SK) is a sequentially generated, meaningless, unique number attached to every record in a table in a Data Warehouse (DW).</div>
<ul>
<li style="text-align: justify;">It is <i><span style="color: #cc0000;">UNIQUE</span></i> since it is a sequentially generated integer for each record inserted in the table.</li>
<li style="text-align: justify;">It is <span style="color: #cc0000;"><i>MEANINGLESS</i></span> since it does not carry any business meaning about the record it is attached to. </li>
<li style="text-align: justify;">It is <span style="color: #cc0000;"><i>SEQUENTIAL</i></span> since it is assigned in sequential order as new records are created in the table, starting with one and going up to the highest number needed.</li>
</ul>
<h2>
Surrogate Key Pipeline and Fact Table</h2>
<div style="text-align: justify;">
During the FACT table load, different dimensional attributes are looked up in the corresponding Dimensions and SKs are fetched from there. These SKs should be fetched from the most recent versions of the dimension records. Finally the FACT table in DW contains the factual data along with corresponding SKs from the Dimension tables. </div>
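<div>
A minimal sketch of this surrogate key pipeline is shown below. It assumes each dimension row carries a hypothetical <i>VERSION</i> column that increases with every new version of a record; the column and function names are illustrative only.</div>

```python
def build_sk_lookup(dimension_rows, natural_key, surrogate_key):
    """Map each natural key to the SK of its most recent dimension version."""
    latest = {}
    for row in dimension_rows:
        nk = row[natural_key]
        # Keep only the row with the highest VERSION for each natural key.
        if nk not in latest or row["VERSION"] > latest[nk]["VERSION"]:
            latest[nk] = row
    return {nk: row[surrogate_key] for nk, row in latest.items()}


def load_fact(source_rows, customer_lookup):
    """Replace the natural key in each source row with the dimension SK."""
    fact_rows = []
    for row in source_rows:
        fact_rows.append({
            "CUSTOMER_KEY": customer_lookup[row["CUSTOMER_ID"]],
            "AMOUNT": row["AMOUNT"],
        })
    return fact_rows


if __name__ == "__main__":
    customer_dim = [
        {"CUSTOMER_KEY": 1, "CUSTOMER_ID": "C1", "VERSION": 1},
        {"CUSTOMER_KEY": 2, "CUSTOMER_ID": "C2", "VERSION": 1},
        {"CUSTOMER_KEY": 3, "CUSTOMER_ID": "C1", "VERSION": 2},
    ]
    lookup = build_sk_lookup(customer_dim, "CUSTOMER_ID", "CUSTOMER_KEY")
    print(load_fact([{"CUSTOMER_ID": "C1", "AMOUNT": 250}], lookup))
```

<div>
Here customer C1 has two versions in the dimension, and the fact row picks up the SK of the latest one.</div>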
<div>
The below diagram shows how the FACT table is loaded from the source. <img alt="Surrogate Key in Data Warehouse, What, When, Why and Why Not" border="0" src="http://lh3.ggpht.com/-oGtkJYmDcLs/UoL9hE8wlmI/AAAAAAAAI5s/Ss_OJxFAlkk/image_thumb%25255B29%25255D.png?imgmax=800" height="368" style="border-color: -moz-use-text-color; border-style: none; border-width: 0px; display: block; float: none; margin: 10px auto -25px;" title="" width="538" />
<br />
<h2>
Why Should We Use Surrogate Key</h2>
<div style="text-align: justify;">
Basically, it’s an artificial key used as a substitute for a Natural Key (NK). The NK is defined in our tables as per the business requirement and may well be able to uniquely identify any record. The SK, however, is just an Integer attached to a record for the purpose of joining tables in a Star or Snowflake schema based DW. An SK is especially needed when the NK is very long or its datatype is not suitable for indexing. </div>
<div>
<br /></div>
The below image shows a typical Star Schema, joining different Dimensions with the Fact using SKs.<br />
<div align="center" class="normal" style="text-align: center;">
<img alt="Surrogate Key in Data Warehouse, What, When, Why and Why Not" border="0" src="http://lh4.ggpht.com/-Kx5b2LKLOrc/UoL9gXS0x6I/AAAAAAAAI5c/agsuRf2_DTU/image_thumb%25255B13%25255D.png?imgmax=800" height="208" style="border-color: -moz-use-text-color; border-style: none; border-width: 0px; display: block; float: none; margin: 5px auto 1px;" title="" width="481" /></div>
<div class="normal" style="text-align: justify;">
<div class="normal" style="text-align: justify;">
<a href="http://www.kimballgroup.com/" rel="nofollow" style="text-align: justify;" target="_blank">Ralph Kimball</a> places more emphasis on abstracting the NK. According to him, Surrogate Keys should NOT be:</div>
<ul style="text-align: justify;">
<li>Smart, where you can tell something about the record just by looking at the key. </li>
<li>Composed of natural keys glued together. </li>
<li>Implemented as multiple parallel joins between the dimension table and the fact table; so-called double or triple barreled joins.</li>
</ul>
<div style="text-align: justify;">
As per <a href="http://blog.kejser.org/" target="_blank">Thomas Kejser</a>, a “good key” is a column that has the following properties:</div>
<ul style="text-align: justify;">
<li>It is forced to be unique</li>
<li>It is small</li>
<li>It is an integer</li>
<li>Once assigned to a row, it never changes</li>
<li>Even if deleted, it will never be re-used to refer to a new row</li>
<li>It is a single column</li>
<li>It is stupid</li>
<li>It is not intended to be remembered by users</li>
</ul>
<div>
If the above mentioned features are taken into account, SK would be a great candidate for a Good Key in a DW.<br />
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Apart from these, a few more reasons for choosing the SK approach are:</div>
<ul style="text-align: justify;">
<li>If we replace the NK with a single Integer, we can save a substantial amount of storage space. The SKs of the different Dimensions are stored as Foreign Keys (FK) in the Fact tables to maintain <span style="color: #cc0000;">Referential Integrity</span> (RI), and storing concise SKs there instead of large NKs takes less space. A UNIQUE index built on the SK also takes less space than a UNIQUE index built on an NK, which may be alphanumeric. </li>
<li>Replacing big, ugly NKs and composite keys with tight integer SKs usually improves join performance, since joining two Integer columns is faster. This gives an extra edge in <span style="color: #cc0000;">ETL performance</span> by speeding up data retrieval and lookups. </li>
<li>A four-byte integer key can represent more than 2 billion different values, which is enough for practically any dimension, so the SK will not run out of values even for a Big or Monster Dimension. </li>
<li>The SK is usually independent of the data contained in the record, so we cannot learn anything about the data in a record just by looking at the SK. Hence it provides <span style="color: #cc0000;">Data Abstraction</span>.</li>
</ul>
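<div>
The storage and range arguments above come down to simple arithmetic. The sketch below works through it with made-up figures: the row count, key widths and number of dimensions are hypothetical, and index and row overhead are ignored.</div>

```python
# Largest value of a signed four-byte integer.
INT32_MAX = 2**31 - 1


def key_storage_bytes(row_count, key_width_bytes, key_columns):
    """Bytes used by the key columns alone in a fact table."""
    return row_count * key_width_bytes * key_columns


if __name__ == "__main__":
    # Hypothetical fact table: 100 million rows referencing 5 dimensions.
    rows, dimensions = 100_000_000, 5
    nk_bytes = key_storage_bytes(rows, 20, dimensions)  # assumed 20-byte NKs
    sk_bytes = key_storage_bytes(rows, 4, dimensions)   # four-byte integer SKs
    print(INT32_MAX)
    print(nk_bytes - sk_bytes)
```

<div>
Under these assumptions the key columns shrink from 10,000,000,000 bytes to 2,000,000,000 bytes, and a single four-byte SK still allows 2,147,483,647 distinct positive values.</div>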
<div style="text-align: justify;">
So, apart from abstracting the critical business data carried by the NK, implementing the SK in our DW also reduces storage space. It has become a <span style="color: #cc0000;">Standard Practice</span> to associate an SK with a table in a DW, irrespective of whether it is a Dimension, Fact, Bridge or Aggregate table.</div>
<h2>
Why Shouldn’t We Use Surrogate Key</h2>
<div style="text-align: justify;">
There are a number of disadvantages as well when working with SKs. Let’s see them one by one:</div>
<ul style="text-align: justify;">
<li>The values of SKs have no relationship with the real world meaning of the data held in a row. Therefore overuse of SKs leads to the problem of <span style="color: #cc0000;">disassociation</span>.</li>
<li>Generating and attaching the SK adds to the ETL burden. Sometimes the actual piece of code is short and simple, but generating the SK and carrying it forward to the target adds extra overhead to the code.</li>
<li>During <span style="color: #cc0000;">Horizontal Data Integration</span> (DI), where multiple source systems load data into a single Dimension, we have to maintain a single SK Generating Area to enforce the uniqueness of the SK. This may come as an extra overhead on the ETL.</li>
<li>Query optimization also becomes more difficult. Since the SK takes the place of the PK, the unique index is built on that column, so any query based on the NK leads to a <span style="color: #cc0000;">Full Table Scan</span> (FTS) because it cannot take advantage of the unique index on the SK.</li>
<li>Replicating data from one environment to another, i.e. <span style="color: #cc0000;">Data Migration</span>, becomes difficult. SKs from the Dimension tables are used as FKs in the Fact table and are DW specific, so any mismatch in the SK for a particular Dimension results in no data or erroneous data when the tables are joined in a Star Schema.</li>
<li>If duplicate records come from the source, there is a potential risk of duplicates being loaded into the target, since the Unique Constraint is defined on the SK and not on the NK.
<div class="sticky taped" style="float: right;">
<span style="color: #cc0000; font-size: large;"><b><u>About the Author</u></b></span><br />
<script data-format="inline" data-id="http://www.linkedin.com/pub/debraj-ghosh/78/626/32a" data-related="false" data-width="200" type="IN/MemberProfile"></script></div>
</li>
</ul>
<div style="text-align: justify;">
SK should not be implemented just in the name of standardizing your code. SK is required when we cannot use an NK to uniquely identify a record or when using an SK seems more suitable as the NK is not a good fit for PK.<br />
<br />
Reference : <a href="http://www.kimballgroup.com/" rel="nofollow" target="_blank">Ralph Kimball</a>, <a href="http://blog.kejser.org/" target="_blank">Thomas Kejser</a></div>
</div>
</div>
</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-84523129712384352862013-11-08T23:06:00.000-08:002013-11-30T12:37:59.936-08:00Informatica PowerCenter Load Balancing for Workload Distribution on Grid<img align="left" alt="Informatica PowerCenter load balancing" border="0" height="100" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMVUtnkr62yBuIBoc-f5xSIXxrE4Tzd-BhoDYWB5X1tMTLqUOCFWKwam0OYLS2JCbGd8xY4N6Po9dPb_V9O3aP-QgPXanBsE6fmGWyDsbknZvXdtbhvfWuCMKIZlq_59dzTwHgWzLe6Ek/s100/balancing.png" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" /><br />
<div style="text-align: justify;">
When Informatica PowerCenter workflows run on a <a href="http://www.disoln.org/2013/10/Informatica-PowerCenter-Workflows-on-Grid-for-Performance-and-Scalability.html">grid</a>, the Integration Service distributes workflow tasks across the nodes in the <a href="http://www.disoln.org/2013/10/Informatica-PowerCenter-Workflows-on-Grid-for-Performance-and-Scalability.html" target="_blank">grid</a>. It also distributes Session, Command, and predefined Event-Wait tasks within workflows across the nodes in a <a href="http://www.disoln.org/2013/10/Informatica-PowerCenter-Workflows-on-Grid-for-Performance-and-Scalability.html" target="_blank">grid</a>. PowerCenter uses the Load Balancer to distribute workflows and session tasks to different nodes. This article describes how to use the Load Balancer to set up workflow priorities and how to allocate resources.<br />
<a name='more'></a></div>
<h2>
What is Informatica Load Balancing</h2>
<div class="sticky taped" style="background-position: initial initial; background-repeat: initial initial; float: right;">
<b>Performance Improvement Features</b><br />
<div style="text-align: center;">
<a href="http://www.disoln.org/2013/07/Informatica-PowerCenter-Pushdown-Optimization-an-ELT-Approach.html" target="">Pushdown Optimization</a><br />
<a href="http://www.disoln.org/2013/07/Informatica-PowerCenter-Partitioning-for-Parallel-Processing.html" target="">Pipeline Partitions</a><br />
<a href="http://www.disoln.org/2013/08/Dynamic-Partitioning-to-Increase-Parallelism-Based-on-Resources-Availability.html" target="">Dynamic Partitions</a><br />
<a href="http://www.disoln.org/2012/11/Informatica-Concurrent-Workflows-to-Reduce-Warehouse-ETL-Load-Time.html" target="">Concurrent Workflows</a><br />
<a href="http://www.disoln.org/2013/10/Informatica-PowerCenter-Workflows-on-Grid-for-Performance-and-Scalability.html" target="">Grid Deployments</a><br />
<a href="http://www.disoln.org/2013/11/Informatica-PowerCenter-Load-Balancing-for-Workload-Distribution.html" target="">Workflow Load Balancing</a></div>
</div>
<div class="p1" style="text-align: justify;">
Informatica Load Balancing is a mechanism that distributes workloads across the nodes in the <a href="http://www.disoln.org/2013/10/Informatica-PowerCenter-Workflows-on-Grid-for-Performance-and-Scalability.html" target="_blank">grid</a>. When you run a workflow, the Load Balancer dispatches different tasks in the workflow such as Session, Command, and predefined Event-Wait tasks to different <a href="http://www.disoln.org/2013/10/Informatica-PowerCenter-Workflows-on-Grid-for-Performance-and-Scalability.html" target="_blank">nodes</a> running the Integration Service. The Load Balancer matches task requirements with resource availability to identify the best node to run a task. It may dispatch tasks to a single node or across nodes on the <a href="http://www.disoln.org/2013/10/Informatica-PowerCenter-Workflows-on-Grid-for-Performance-and-Scalability.html" target="_blank">grid</a>.<br />
<h2>
Identifying the Nodes to Run a Task </h2>
The Load Balancer matches the resources required by the task with the resources available on each node. It dispatches tasks in the order it receives them. You can adjust the workflow priorities and assign resource requirements to tasks, so that the Load Balancer dispatches the tasks to the right nodes with the right priority.</div>
<table cellpadding="0" cellspacing="0" class="t1">
<tbody>
<tr>
<td class="td1" valign="middle"><div class="p1">
<br /></div>
</td>
<td class="td1" valign="middle"><div class="p1">
<br /></div>
<div class="p2">
<div style="text-align: justify;">
<b><span style="color: #cc0000;">Assign service levels</span></b> : You assign service levels to workflows. Service levels establish priority among workflow tasks that are waiting to be dispatched.</div>
</div>
</td>
</tr>
<tr>
<td class="td1" valign="middle"><div class="p1">
<div style="text-align: justify;">
<br /></div>
</div>
</td>
<td class="td1" valign="middle"><div class="p1">
<div style="text-align: justify;">
<br /></div>
</div>
<div class="p2">
<div style="text-align: justify;">
<b><span style="color: #cc0000;">Assign resources</span></b> : You assign resources to tasks. Session, Command, and predefined Event-Wait tasks require PowerCenter resources to succeed. If the Integration Service is configured to check resources, the Load Balancer dispatches these tasks to nodes where the resources are available.
<br />
<h2>
Assigning Service Levels to Workflows</h2>
<div style="text-align: justify;">
Service levels determine the order in which the Load Balancer dispatches tasks from the dispatch queue. When multiple tasks are waiting to be dispatched, the Load Balancer dispatches high priority tasks before low priority tasks. You create service levels and configure the dispatch priorities in the Administrator tool.<br />
<br />
You give a <span style="color: #cc0000;">Higher Service Level</span> to the workflows that need to be dispatched first when multiple workflows are running in parallel. Service Levels are set up in the Admin console.<br />
<br />
You assign service levels to workflows on the General tab of the workflow properties as shown below. </div>
<a href="http://lh4.ggpht.com/-xKrleZ94Khg/UoBT2cmaEsI/AAAAAAAAI4c/tbylOzlkJAs/s1600-h/image%25255B58%25255D.png"><img alt="Informatica PowerCenter Load Balancing for Workload Distribution on Grid" border="0" height="530" src="http://lh4.ggpht.com/-ZkOj_-r6DaQ/UoBT202Qg3I/AAAAAAAAI4g/5qRlDU0_9ds/image_thumb%25255B52%25255D.png?imgmax=800" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto -15px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="475" /></a>
<br />
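<div>
The effect of service levels on dispatch order can be pictured as a priority queue. The sketch below is a toy model, not the Load Balancer's real algorithm: it assumes that a lower dispatch priority number means the task is dispatched earlier, and that tasks with equal priority leave the queue in arrival order. The class and task names are made up for illustration.</div>

```python
import heapq
from itertools import count


class DispatchQueue:
    """Toy dispatch queue ordered by dispatch priority, then arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def submit(self, task_name, dispatch_priority):
        # Assumption: a lower number means a higher priority.
        entry = (dispatch_priority, next(self._arrival), task_name)
        heapq.heappush(self._heap, entry)

    def dispatch_next(self):
        """Remove and return the name of the next task to dispatch."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    queue = DispatchQueue()
    queue.submit("s_Low_Priority_Load", 5)
    queue.submit("s_High_Priority_Load", 1)
    queue.submit("s_Another_Low_Priority_Load", 5)
    while len(queue):
        print(queue.dispatch_next())
```

<div>
In this model the high priority session is dispatched first even though it was submitted second, and the two low priority sessions follow in the order they arrived.</div>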
<h2>
Assigning Resources to Tasks</h2>
<div style="text-align: justify;">
If the Integration Service runs on a <a href="http://www.disoln.org/2013/10/Informatica-PowerCenter-Workflows-on-Grid-for-Performance-and-Scalability.html" target="_blank">grid</a> and is configured to check for available resources, the Load Balancer uses resources to dispatch tasks. The Integration Service matches the resources required by tasks in a workflow with the resources available on each node in the grid to determine which nodes can run the tasks. <br />
<br />
You can configure the resource requirements of the tasks as shown in the image below.<br />
<br />
The configuration below shows that the source qualifier needs a source file from the <i>File Directory</i> <i>NDMSource</i>, which is accessible from only one node. The resources available on the different nodes are configured from the Admin console.</div>
</div>
</div>
<a href="http://lh6.ggpht.com/-crzegtRbC5I/UoBWHntH_4I/AAAAAAAAI5A/UjvBSzRZnCs/s1600-h/image%25255B98%25255D.png"><img alt="Informatica PowerCenter Load Balancing for Workload Distribution on Grid" border="0" height="480" src="http://lh4.ggpht.com/-iE3fV-elfLY/UoBWIJmfRtI/AAAAAAAAI5E/2SVfpj0CN4Y/image_thumb%25255B82%25255D.png?imgmax=800" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 10px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="558" /></a>
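<div>
Resource matching can be thought of as a subset check: a node can run a task only if it offers every resource the task requires. The sketch below illustrates that idea under this assumption; the node names and the second resource name are hypothetical, and the function is not a PowerCenter API.</div>

```python
def eligible_nodes(required_resources, node_resources):
    """Return the nodes that offer every resource the task requires."""
    required = set(required_resources)
    return [
        node
        for node, available in node_resources.items()
        if required.issubset(available)
    ]


if __name__ == "__main__":
    # Hypothetical grid: only Node_1 can reach the NDMSource file directory.
    grid = {
        "Node_1": {"NDMSource", "OracleClient"},
        "Node_2": {"OracleClient"},
    }
    print(eligible_nodes(["NDMSource"], grid))
    print(eligible_nodes(["OracleClient"], grid))
```

<div>
A task that needs <i>NDMSource</i> can only go to Node_1 in this example, while a task that needs only the other resource can be dispatched to either node.</div>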
</td></tr>
</tbody></table>
<div style="text-align: justify;">
Hope you enjoyed this article and that it helps you prioritize your workflows to meet your data refresh timelines. Please leave us a comment or feedback if you have any, we are happy to hear from you.</div><div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-82097228569520834612013-10-31T07:53:00.000-07:002013-11-29T13:09:02.646-08:00Informatica PowerCenter on Grid for Greater Performance and Scalability<img align="left" alt="Informatica PowerCenter Workflows on Grid for Performance and Scalability" border="0" height="100" src="http://1.bp.blogspot.com/-QM8bcYmuCvk/UnMn3y9Tc6I/AAAAAAAAI1s/YiitbjEM1ZU/s100/grid.png" style="border-width: 0px; display: inline; margin: 0px;" title="" width="100" /><br />
<div style="text-align: justify;">
Informatica has developed a solution that leverages the power of <a href="http://www.tdan.com/view-articles/9378" rel="nofollow" target="_blank">grid computing</a> for greater data integration scalability and <a href="http://www.disoln.org/search/label/Performance%20Tips?&max-results=15" target="_blank">performance</a>. The grid option delivers load balancing, <a href="http://www.disoln.org/2013/08/Dynamic-Partitioning-to-Increase-Parallelism-Based-on-Resources-Availability.html" target="_blank">dynamic</a> <a href="http://www.disoln.org/2013/07/Informatica-PowerCenter-Partitioning-When-Where-and-How.html" target="_blank">partitioning</a>, <a href="http://www.disoln.org/2013/07/Informatica-PowerCenter-Partitioning-for-Parallel-Processing.html" target="_blank">parallel processing</a> and <a href="http://www.disoln.org/2013/07/Workflow-Recovery-Configuration-for-Informatica-PowerCenter-Workflows.html" target="_blank">high availability</a> to ensure optimal scalability, performance and reliability. In this article, let's discuss how to set up an Informatica workflow to run on a grid. </div>
<a name='more'></a><h2>
What is PowerCenter On Grid</h2>
<div>
<div class="p1">
<div style="text-align: justify;">
When a PowerCenter domain contains multiple nodes, you can configure workflows and sessions to run on a grid. When you run a workflow on a grid, the Integration Service runs a service process on each available node of the grid to increase performance and scalability. When you run a session on a grid, the Integration Service distributes session threads to multiple DTM processes on nodes in the grid to increase performance and scalability. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #cc0000;"><b>Domain </b></span>: A PowerCenter domain consists of one or more nodes in the grid environment. PowerCenter services run on the nodes. A domain is the foundation for PowerCenter service administration.</div>
<div style="text-align: justify;">
<span style="color: #cc0000;"><b>Node </b></span>: A node is a logical representation of a physical machine that runs a PowerCenter service.</div>
<h2>
Admin Console with Grid Configuration</h2>
The Informatica Admin Console below shows a two-node grid configuration. We can see the two nodes, Node_1 and Node_2, and the grid Node_GRID created from those two nodes. The Integration Service Int_service_GRID is running on the grid.</div>
</div>
<a href="http://lh5.ggpht.com/-zuQDripbCoM/UncAJGR9cHI/AAAAAAAAI2g/QZvjvcW4Kmw/s1600-h/image%25255B22%25255D.png"><img alt="image" border="0" height="463" src="http://lh5.ggpht.com/-IF_2VSkE_7A/UncAK4vq2UI/AAAAAAAAI2o/dDc1R0t4wGs/image_thumb%25255B16%25255D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin: 10px auto -25px;" title="image" width="750" /></a>
<br />
<h2>
Setting up Workflow on Grid</h2>
<div style="text-align: justify;">
When you set up a workflow to run on a grid, the Integration Service distributes workflows across the nodes in the grid. It also distributes the Session, Command, and predefined Event-Wait tasks within workflows across the nodes in the grid.</div>
<div style="text-align: justify;">
<br />
You can set up the workflow to run on a grid as shown in the image below. To run the workflow on the grid, assign it the Integration Service that is configured on the grid.</div>
<a href="http://lh5.ggpht.com/-AQF_zRIajXo/UncD939SInI/AAAAAAAAI20/wRQ5cGdpPqY/s1600-h/image%25255B34%25255D.png"><img alt="image" border="0" height="603" src="http://lh5.ggpht.com/-tL_DnyxfqqA/UncD_MxROAI/AAAAAAAAI28/ilyt6_TwdiY/image_thumb%25255B26%25255D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin: 10px auto -25px;" title="image" width="543" /></a>
<br />
<h2>
Setting up Session on Grid</h2>
<div style="text-align: justify;">
When you run a session on a grid, the Integration Service distributes session threads across nodes in a grid. The Load Balancer distributes session threads to <a href="http://www.disoln.org/2013/08/Informatica-PowerCenter-Performance-Turning-A-to-Z-Guide.html#Anatomy" target="_blank">DTM processes</a> running on different nodes. You might want to configure a session to run on a grid when the workflow contains a session that takes a long time to run.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
You can set up the session to run on a grid as shown in the image below. </div>
<a href="http://lh3.ggpht.com/-Ghgx_cGeOPI/UnciFbxOMcI/AAAAAAAAI3w/cianuzxjCtM/s1600-h/image%25255B87%25255D.png"><img alt="image" border="0" height="476" src="http://lh6.ggpht.com/-1gBddTZ_UT4/UnciGxVzXJI/AAAAAAAAI34/7hadYI3W8yk/image_thumb%25255B71%25255D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin: 10px auto -25px;" title="image" width="640" /></a>
<br />
<h2>
Workflow Running on Grid</h2>
<div style="text-align: justify;">
The Workflow Monitor screenshot below shows a workflow running on the grid. In the '<span style="color: #cc0000;">Task Progress Details</span>' window, you can see that two of the sessions in the workflow wf_Load_CUST_DIM run on Node_1 and the other one runs on Node_2.</div>
<div style="text-align: justify;">
<a href="http://lh5.ggpht.com/-0E6ndXTtIRs/UncguGINhgI/AAAAAAAAI3M/LH5f9oO2wXc/s1600-h/image%25255B74%25255D.png"><img alt="image" border="0" height="472" src="http://lh3.ggpht.com/-ao_I5JQMAws/UncgvzukiCI/AAAAAAAAI3U/lK8TWX59jFA/image_thumb%25255B60%25255D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin: 10px auto 15px;" title="image" width="750" /></a>
<h2>
Key Features and Advantages of Grid</h2>
<div style="text-align: justify;">
<ul>
<li><span style="color: #cc0000;"><b>Load Balancing</b></span> : When data processing spikes, load balancing keeps operations running smoothly by switching the data processing between nodes on the grid. The node is chosen dynamically based on process size, CPU utilization, memory requirements and similar factors.</li>
<li><span style="color: #cc0000;"><b>High Availability</b></span> : Grid complements the <a href="http://www.disoln.org/2013/07/Workflow-Recovery-Configuration-for-Informatica-PowerCenter-Workflows.html" target="_blank">High Availability</a> feature of PowerCenter by switching the master node in case of a node failure. This keeps monitoring in place and shortens the time needed for recovery processes.</li>
<li><span style="color: #cc0000;"><b>Dynamic Partitioning</b></span> : <a href="http://www.disoln.org/2013/08/Dynamic-Partitioning-to-Increase-Parallelism-Based-on-Resources-Availability.html" target="_blank">Dynamic Partitioning</a> helps make the best use of the nodes currently available on the grid. By adapting to available resources, it also helps increase the performance of the whole ETL process. </li>
</ul>
</div>
</div>
<div style="text-align: justify;">
Hope you enjoyed this article, please leave us a comment or feedback if you have any, we are happy to hear from you.</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-70387053124718723662013-10-23T22:49:00.000-07:002013-12-03T22:03:45.645-08:00Time Zones Conversion and Standardization Using Informatica PowerCenter<img align="left" alt="Time Zones Conversion and Standardization Using Informatica PowerCentern" border="0" height="100" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFwJAG1NMap53g_MK2D-CRi8V-wwhwurUooHzbqwGUVZNK-zfWtxqLJU3gaqYwO6GTO9u6Lcuj1jmVNTRmIrKHBonL4fouW13DGtviZZU6VbiI5jfQ6i1Y4d3ik41Lwbb1gCdl5uUucFM/s100/timezone.jpg" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" /><br />
<div>
<div style="text-align: justify;">
When your data warehouse sources data from systems in multiple time zones, it is <a href="http://www.kimballgroup.com/1998/12/02/think-globally-act-locally/" rel="nofollow" target="_blank">recommended</a> to capture a universal standard time as well as the local times. The same goes for transactions involving multiple <a href="http://www.disoln.org/2013/09/Informatica-HTTP-Transformation-The-Interface-Between-ETL-and-Web-Services.html#UseCase" target="_blank">currencies</a>. This design enables analysis on the local time along with the universal standard time. The time standardization is done as part of the ETL that loads the warehouse. In this article, let's discuss the implementation using Informatica PowerCenter.</div>
<a name='more'></a><br />
We will concentrate only on the ETL part of time zone conversion and standardization, but not the data modeling part. You can learn more about the <a href="http://www.kimballgroup.com/1998/12/02/think-globally-act-locally/" rel="nofollow" target="_blank">dimensional modeling</a> aspect from <a href="http://www.kimballgroup.com/" rel="nofollow" target="_blank">Ralph Kimball</a>.<br />
<h2>
Business Use Case</h2>
<div style="text-align: justify;">
Let's consider an ETL job that integrates sales data from different global sales regions into the enterprise data warehouse. Sales transactions happen in different time zones and come from different sales applications. Local sales applications capture sales in the local time. Data in the warehouse needs to be standardized, and sales transactions need to be captured in local time as well as GMT.<br />
<br />
<span style="color: #cc0000;"><b>Solution</b></span> : Create a <a href="http://www.disoln.org/2012/10/11-ways-to-make-informatica-powercenter-code-reusable.html" target="_blank">reusable expression</a> to convert the local time into GMT. This transformation can be <a href="http://www.disoln.org/2012/10/11-ways-to-make-informatica-powercenter-code-reusable.html" target="_blank">reused </a>in any mapping or ETL process that needs time zone conversion and standardization.<br />
<h2>
Building the Reusable Expression</h2>
<div>
You can create the reusable transformation in the <i><span style="color: #cc0000;">Transformation Developer</span></i>.</div>
<div>
<br /></div>
In the expression transformation, create the ports listed below along with the corresponding expressions. Be sure to create the ports in the same order, with the same data type and precision.<br />
<span style="color: #cc0000;"><span style="font-size: x-small;"><i><span style="color: #cc0000;"></span></i></span></span></div>
<ul><ul>
<li><span style="color: #cc0000; font-size: small;"><i>LOC_TIME_WITH_TZ : STRING(36) (Input)</i></span></li>
<li><span style="color: #cc0000; font-size: small;"><i>DATE_TIME : DATE/TIME (Variable)</i></span></li>
<li><span style="color: #cc0000; font-size: small;"><i>TZ_DIFF : INTEGER (Variable)</i></span></li>
<li><span style="color: #cc0000; font-size: small;"><i>TZ_DIFF_HR : INTEGER (Variable)</i></span></li>
<li><span style="color: #cc0000; font-size: small;"><i>TZ_DIFF_MI : INTEGER (Variable)</i></span></li>
<li><span style="color: #cc0000; font-size: small;"><i>GMT_TIME_HH : DATE/TIME (Variable)</i></span></li>
<li><span style="color: #cc0000; font-size: small;"><i>GMT_TIME_MI : DATE/TIME (Variable)</i></span></li>
<li><span style="color: #cc0000; font-size: small;"><i>GMT_TIME_WITH_TZ : STRING(36) (Output)</i></span></li>
</ul>
</ul>
Now create expressions as below for all the ports.<br />
<div style="text-align: justify;">
<ul><ul>
<li style="text-align: left;"><span style="color: #cc0000; font-size: small;"><i>DATE_TIME : TO_DATE(SUBSTR(LOC_TIME_WITH_TZ,0,29),'DD-MON-YY HH:MI:SS.US AM')</i></span></li>
<span style="font-size: small;">
</span>
<li><span style="color: #cc0000; font-size: small;"><i>TZ_DIFF : IIF(SUBSTR(LOC_TIME_WITH_TZ,30,1)='+',-1,1)</i></span></li>
<span style="font-size: small;">
</span>
<li><span style="color: #cc0000; font-size: small;"><i>TZ_DIFF_HR : TO_DECIMAL(SUBSTR(LOC_TIME_WITH_TZ,31,2))</i></span></li>
<span style="font-size: small;">
</span>
<li><span style="color: #cc0000; font-size: small;"><i>TZ_DIFF_MI : TO_DECIMAL(SUBSTR(LOC_TIME_WITH_TZ,34,2))</i></span></li>
<span style="font-size: small;">
</span>
<li><span style="color: #cc0000; font-size: small;"><i>GMT_TIME_HH : ADD_TO_DATE(DATE_TIME,'HH',TZ_DIFF_HR*TZ_DIFF)</i></span></li>
<span style="font-size: small;">
</span>
<li><span style="color: #cc0000; font-size: small;"><i>GMT_TIME_MI : ADD_TO_DATE(GMT_TIME_HH,'MI',TZ_DIFF_MI*TZ_DIFF)</i></span></li>
<span style="font-size: small;">
</span>
<li><span style="color: #cc0000; font-size: small;"><i>GMT_TIME_WITH_TZ : TO_CHAR(GMT_TIME_MI,'DD-MON-YYYY HH:MI:SS.US AM') || ' +00:00' </i></span></li>
</ul>
</ul>
<span style="color: #cc0000; font-size: small;"><b>Note</b></span> : The SUBSTR offsets and the TO_DATE format above assume an input string laid out as 'DD-MON-YY HH:MI:SS.FF AM TZH:TZM' with six fractional-second digits, which places the sign of the time zone offset at position 30. If you are using a different <a href="http://docs.oracle.com/cd/B19306_01/server.102/b14225/ch4datetime.htm#i1006081" rel="nofollow" target="_blank">Oracle timestamp format</a>, this expression might not work without adjusting the offsets and format strings.<br />
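<div style="text-align: justify;">
To make the port logic easier to follow outside PowerCenter, here is a minimal Python sketch of the same steps. It is an illustration only, not part of the downloadable transformation: the function name <i>local_to_gmt</i> is hypothetical, and the input layout (two-digit year, six fractional-second digits, sign at position 30) is an assumption read off the TO_DATE format string and SUBSTR offsets in the expressions above.</div>

```python
from datetime import datetime, timedelta


def local_to_gmt(loc_time_with_tz: str) -> str:
    """Mirror the expression ports: parse local time, apply the offset, emit GMT.

    Assumed input layout (hypothetical sample): '31-OCT-13 07:53:00.000000 AM -07:00'.
    Informatica SUBSTR positions are 1-based; Python slices are 0-based.
    """
    # DATE_TIME: the date/time part, without the trailing space and offset
    date_time = datetime.strptime(loc_time_with_tz[0:28], "%d-%b-%y %I:%M:%S.%f %p")
    # TZ_DIFF: a '+' offset means local time is ahead of GMT, so subtract
    tz_diff = -1 if loc_time_with_tz[29] == "+" else 1
    # TZ_DIFF_HR and TZ_DIFF_MI: hours and minutes of the offset
    tz_diff_hr = int(loc_time_with_tz[30:32])
    tz_diff_mi = int(loc_time_with_tz[33:35])
    # GMT_TIME_HH and GMT_TIME_MI: shift by hours, then by minutes
    gmt_time = date_time + timedelta(hours=tz_diff * tz_diff_hr,
                                     minutes=tz_diff * tz_diff_mi)
    # GMT_TIME_WITH_TZ: format with a four-digit year and a fixed +00:00 offset
    return gmt_time.strftime("%d-%b-%Y %I:%M:%S.%f %p").upper() + " +00:00"


if __name__ == "__main__":
    print(local_to_gmt("31-OCT-13 07:53:00.000000 AM -07:00"))
    print(local_to_gmt("01-JAN-14 02:15:30.500000 AM +05:30"))
```

<div style="text-align: justify;">
The sign flip is the part that is easiest to get wrong: an offset of +05:30 means the local clock is ahead of GMT, so the hours and minutes are subtracted, which can also roll the date back to the previous day.</div>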
</div>
<div style="text-align: justify;">
Below is the expression transformation with the expressions added.</div>
<div style="text-align: justify;">
<a href="http://lh4.ggpht.com/-hKfSUQuzujc/Um3QlroQdpI/AAAAAAAAI0Y/M6YVStNHTaw/s1600-h/image%25255B12%25255D.png"><img alt="Time Zones Conversion and Standardization Using Informatica PowerCenter" border="0" height="336" src="http://lh6.ggpht.com/-BGhBaEoklvE/Um3QndtrNEI/AAAAAAAAI0g/pkzq-Q77mTA/image_thumb%25255B10%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="700" /></a>
<br />
<div style="text-align: justify;">
The completed reusable expression transformation is shown below.<img alt="Time Zones Conversion and Standardization Using Informatica PowerCenter" border="0" height="129" src="http://lh3.ggpht.com/-5TnfegGANUU/Um3QpvwTcDI/AAAAAAAAI0w/AYWEMVN-aY0/image_thumb%25255B21%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="284" /> The image below shows sample output data generated by the expression. </div>
<a href="http://lh3.ggpht.com/-MsNy999Xm5g/Um3USav8yfI/AAAAAAAAI08/h4VJ5Ag35UE/s1600-h/image%25255B48%25255D.png"><img alt="Time Zones Conversion and Standardization Using Informatica PowerCenter" border="0" height="128" src="http://lh4.ggpht.com/-dqcpldWgX4U/Um3UTxznLhI/AAAAAAAAI1E/-MaxPD9Lf9w/image_thumb%25255B45%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto -20px;" title="" width="505" /></a>
<br />
<h2>
Expression Usage</h2>
This reusable transformation takes one input port and returns one output port. The input port should be a timestamp with time zone information. A mapping that uses this reusable transformation is shown below.<br />
<a href="http://lh3.ggpht.com/-GVLwu-9SQQ8/Um31sguDpHI/AAAAAAAAI1U/xtX3r-sQdMo/s1600-h/image%25255B60%25255D.png"><img alt="Time Zones Conversion and Standardization Using Informatica PowerCenter" border="0" height="193" src="http://lh4.ggpht.com/--mpcawllCAM/Um31uOgKnpI/AAAAAAAAI1c/pO9Kyfi1kdA/image_thumb%25255B55%25255D.png?imgmax=800" style="border-style: none; border-width: 0px; display: block; float: none; margin: 10px auto 0px;" title="" width="664" /></a>
<br />
<span style="color: #cc0000;"><b>Note</b></span> : Timestamp with
time zone is processed as STRING(36) data type in the mapping. All the
transformations should use STRING(36) data type. Source and target
should use VARCHAR2(36) data type.
<br />
<h2>
Download</h2>
You can download the reusable expression we discussed in this article. <a href="https://gumroad.com/l/tTJT">Click here</a> for the download link. <br />
<br />
Hope this tutorial was helpful and useful for your project. Please leave your questions and comments; we will be more than happy to help you. </div>
</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-48112061224981161612013-10-16T23:43:00.001-07:002013-10-16T23:56:56.077-07:00Dynamically Changing ETL Calculations Using Informatica Mapping Variable<img align="left" alt="Informatica SQL Transformation" border="0" height="100" src="http://4.bp.blogspot.com/-ELYtGWnv-v0/Ul88rgovlCI/AAAAAAAAIy0/kNrYYGK0Yrk/s1600/dynamicicon.jpg" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" /><br />
<div style="text-align: justify;">
Quite often we deal with ETL logic that is very <a href="http://www.disoln.org/2012/10/11-ways-to-make-informatica-powercenter-code-reusable.html" target="_blank">dynamic</a> in nature, such as a discount calculation that changes every month or logic that applies only on weekends. Pushing such frequent ETL changes into a production environment is difficult in practice. The best way to deal with this kind of scenario is parametrization. In this article, let's discuss how we can make the ETL <a href="http://www.disoln.org/2012/10/11-ways-to-make-informatica-powercenter-code-reusable.html" target="_blank">calculations dynamic</a>.<br />
<a name='more'></a><h2>
Business Use Case</h2>
Let's start our discussion with a real-life use case.<br />
<br />
The sales department wants to build a monthly sales fact table. The fact table needs to be refreshed after the month-end closure. Sales commission is one of the data elements in the fact table, and its calculation is dynamic in nature. It is a factor of sales, sales revenue or net sales.<br />
<br />
Sales Commission calculation can be :<br />
<ul><ul><ol>
<li><span style="color: #cc0000;">Sales Commission = Sales * 18 / 100</span></li>
<li><span style="color: #cc0000;">Sales Commission = Sales Revenue * 20 / 100 </span></li>
<li><span style="color: #cc0000;">Sales Commission = Net Sales * 20 / 100</span></li>
</ol>
</ul>
</ul>
<div>
<span style="color: #cc0000;">Note</span> : The expression calculation can be as complex as the business requirement demands.</div>
<div>
<br /></div>
The Sales Manager decides which calculation the month-end ETL should use before the monthly ETL load runs.<br />
<h2>
Mapping Configuration</h2>
<div>
Now that we understand the use case, let's build the mapping logic. </div>
<div>
<br /></div>
<div>
Here we will be building the dynamic sales commission calculation logic with the help of a <span style="color: #cc0000;">mapping variable</span>. The changing expression for the calculation will be passed into the mapping using a session parameter file.</div>
<div>
<br /></div>
<div>
<b>Step 1</b> : Create a mapping variable $$EXP_SALES_COMM and set the <i><span style="color: #cc0000;"><b>IsExprVar</b></span></i> property to TRUE, as shown in the image below.</div>
<div>
<a href="http://lh5.ggpht.com/-a0NU76y7qWQ/Ul9-n7QiaxI/AAAAAAAAIzE/ooL_-cpZB-U/s1600-h/image%25255B34%25255D.png"><img alt="Dynamically Changing Calculations Using Informatica Mapping Parameter" border="0" height="387" src="http://lh6.ggpht.com/-pDT_Y2G4QlU/Ul9-paEpStI/AAAAAAAAIzM/TgYfQ3oKoa4/image_thumb%25255B28%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="518" /></a>
<span style="color: #cc0000;">Note </span>: Precision for the mapping variable should be big enough to hold the whole expression.<br />
<br />
<b>Step 2</b> : In an expression transformation, create an output port and provide the mapping variable as its expression. The expression transformation is shown in the screenshot below.<br />
<a href="http://lh6.ggpht.com/-hyq9aD4ItQE/Ul9-qDPWuQI/AAAAAAAAIzU/rWLn0lTfCBE/s1600-h/image%25255B26%25255D.png"><span style="color: #3366cc;"></span><img alt="Dynamically Changing Calculations Using Informatica Mapping Parameter" border="0" height="399" src="http://lh3.ggpht.com/-tsR-9g_GjUU/Ul9-s1EkwBI/AAAAAAAAIzc/Ycm6XISXTKQ/image_thumb%25255B20%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="538" /></a>
<span style="color: #cc0000;">Note</span> : All the ports used in the expression $$EXP_SALES_COMM should be available as an input or input/output port in the expression transformation.</div>
<div>
<h2>
Workflow Configuration</h2>
</div>
<div>
In the workflow configuration, we will create the parameter file with the expression for Sales Commission and set it up in the session.</div>
<div>
<br /></div>
<div>
<b>Step 1</b> : Create the session parameter file with the expression for Sales Commission calculation with the below details.</div>
<div>
<div>
<blockquote class="tr_bq">
<span style="color: #cc0000;">[s_m_LOAD_SALES_FACT]</span><br />
<span style="color: #cc0000;">$$EXP_SALES_COMM=SALES_REVENUE*20/100</span></blockquote>
</div>
<div style="color: #cc0000;">
<b style="color: black;">Step 2</b><span style="color: black;"> </span><span style="color: black;">:</span><span style="color: black;"> Set the parameter in the session properties as shown below.</span></div>
<div>
<a href="http://lh3.ggpht.com/-rb1PJ71Yiwc/Ul9-t0Ko1OI/AAAAAAAAIzk/OiygrOlJ59Q/s1600-h/image%25255B33%25255D.png"><img alt="Dynamically Changing Calculations Using Informatica Mapping Parameter" border="0" height="480" src="http://lh3.ggpht.com/-A02pIC8b0FI/Ul9-vKuPUQI/AAAAAAAAIzs/Og7d_jd1McY/image_thumb%25255B27%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="588" /></a>
With that we are done with the configuration. You can update the expression in the parameter file whenever the sales commission calculation needs to change. This eliminates the need for an ETL code change.<br />
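<div style="text-align: justify;">
The idea behind this setup can be sketched in a few lines of Python: the calculation lives in a parameter file as text, and the job evaluates whatever expression it finds there against each row. This is only an analogy under stated assumptions, not how the Integration Service is implemented. The helper names <i>parse_param_file</i> and <i>evaluate</i> are hypothetical, and the evaluator handles only column names, numbers and the four basic arithmetic operators.</div>

```python
import ast
import operator

# Arithmetic operators this sketch allows inside an expression
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def parse_param_file(text):
    """Parse '[section]' headings and 'name=value' lines into nested dicts."""
    params = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            params[section] = {}
        elif "=" in line and section is not None:
            name, value = line.split("=", 1)
            params[section][name.strip()] = value.strip()
    return params


def evaluate(expr, row):
    """Evaluate an arithmetic expression, resolving names from the row."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return row[node.id]  # the column must exist in the row
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("unsupported expression element")
    return _eval(ast.parse(expr, mode="eval"))


if __name__ == "__main__":
    param_text = "[s_m_LOAD_SALES_FACT]\n$$EXP_SALES_COMM=SALES_REVENUE*20/100\n"
    expr = parse_param_file(param_text)["s_m_LOAD_SALES_FACT"]["$$EXP_SALES_COMM"]
    print(evaluate(expr, {"SALES_REVENUE": 5000.0}))
```

<div style="text-align: justify;">
Changing the commission rule then means editing one line of the parameter file. As in the mapping, every column the expression refers to has to be present in the row, otherwise the lookup fails.</div>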
<br /></div>
Hope you enjoyed this article, please leave us a comment or feedback if you have any, we are happy to hear from you.</div>
</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-9111985707933599872013-10-08T23:38:00.000-07:002013-12-05T17:30:21.597-08:00Informatica Performance Tuning Guide, Resolve Performance Bottlenecks - Part 3<img align="left" alt="Informatica Performance Tuning Guide" border="0" height="100" src="http://1.bp.blogspot.com/-V9P7O4RQysY/UlGvax-CLoI/AAAAAAAAIw8/F8xkaODpdOM/s1600/informatica-performance-improvement.jpg" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" /><br />
<div>
<div style="text-align: justify;">
In our previous article in the <a href="http://www.disoln.org/2013/09/Informatica-Performance-Tuning-Guide-Identify-Performance-Bottlenecks.html" target="_blank">performance tuning</a> series, we covered different approaches to identify performance bottlenecks. In this article we will cover the methods to resolve different performance bottlenecks. We will talk about session memory, cache memory, source, target and mapping performance tuning techniques in detail.</div>
<a name='more'></a><h1>
<span style="color: #cc0000;">I. Buffer Memory Optimization</span></h1>
<div>
<div style="text-align: justify;">
When the Integration Service initializes a session, it allocates blocks of memory to hold source and target data. Sessions that use a large number of sources and targets might require additional memory blocks.
<br />
<div class="sticky taped" style="background-position: initial initial; background-repeat: initial initial; float: right;">
<b>Performance Tuning Tutorial Series</b><br />
Part I : <a href="http://www.disoln.org/2013/08/Informatica-PowerCenter-Performance-Turning-A-to-Z-Guide.html">Performance Tuning Introduction.</a> <br />
Part II : <a href="http://www.disoln.org/2013/09/Informatica-Performance-Tuning-Guide-Identify-Performance-Bottlenecks.html">Identify Performance Bottlenecks. </a><br />
Part III : <a href="http://www.disoln.org/2013/10/Informatica-Performance-Tuning-Guide-Resolve-Performance-Bottlenecks-Part-3.html">Remove Performance Bottlenecks</a>.<br />
Part IV : <a href="http://www.disoln.org/2013/11/Informatica-Performance-Tuning-Guide-Performance-Enhancement-Features-Part-4.html">Performance Enhancements</a>.</div>
<div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Not having enough <a href="http://www.disoln.org/2013/08/Informatica-PowerCenter-Performance-Turning-A-to-Z-Guide.html#Anatomy" target="_blank">buffer memory</a> for the <a href="http://www.disoln.org/2013/08/Informatica-PowerCenter-Performance-Turning-A-to-Z-Guide.html#Anatomy" target="_blank">DTM process</a> can slow down reading, transforming or writing and cause large fluctuations in performance. Adding extra memory blocks can keep the threads busy and improve session performance. You can do this by adjusting the buffer block size and the <a href="http://www.disoln.org/2013/08/Informatica-PowerCenter-Performance-Turning-A-to-Z-Guide.html#Anatomy" target="_blank">DTM Buffer</a> size.<br />
<br />
<b><span style="color: #cc0000;">Note</span></b> : You can identify DTM buffer bottleneck from Session Log File, <a href="http://www.disoln.org/2013/09/Informatica-Performance-Tuning-Guide-Identify-Performance-Bottlenecks.html#sesslog" target="_blank">Check here for details</a>.</div>
<h3 style="text-align: start;">
1. Optimizing the Buffer Block Size</h3>
<div style="text-align: justify;">
Depending on the source and target data, you might need to increase or decrease the buffer block size. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
To identify the optimal buffer block size, sum up the column precisions of each source and each target to get its row precision. The largest row precision among all the sources and targets is the buffer space needed for one row. Ideally, a buffer block should accommodate at least 100 rows at a time.</div>
<div style="text-align: justify;">
<ul><ul><ul>
<li><span style="color: #cc0000;">Buffer Block Size = Largest Row Precision * 100</span></li>
</ul>
</ul>
</ul>
</div>
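<div style="text-align: justify;">
As a worked example of this rule, the short Python sketch below sums the column precisions per source or target and multiplies the largest row precision by 100. The function name, the table names and the precision values are made up for illustration.</div>

```python
from typing import Mapping, Sequence


def buffer_block_size(column_precisions: Mapping[str, Sequence[int]],
                      rows_per_block: int = 100) -> int:
    """Return the largest row precision times the number of rows per block.

    column_precisions maps each source or target name to the precisions
    of its columns.
    """
    largest_row_precision = max(sum(cols) for cols in column_precisions.values())
    return largest_row_precision * rows_per_block


if __name__ == "__main__":
    # Hypothetical source and target definitions
    precisions = {
        "SRC_CUSTOMER": [10, 50, 50, 19],      # row precision 129
        "TGT_CUST_DIM": [10, 50, 50, 19, 19],  # row precision 148
    }
    print(buffer_block_size(precisions))
```

<div style="text-align: justify;">
In this example the target has the wider row, so it drives the block size: 148 * 100 = 14800.</div>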
You can change the buffer block size in the session configuration as shown in the image below.</div>
<a href="http://lh6.ggpht.com/-yqFx6ZuIiA8/UleUxhhwC4I/AAAAAAAAIxg/n3_Kjz6SpOI/s1600-h/image%25255B46%25255D.png"><img alt="Informatica session DTM buffer block size" border="0" height="480" src="http://lh4.ggpht.com/-Rjf4JpGFBy8/UleUynY-tuI/AAAAAAAAIxo/_YJVuKJVdtQ/image_thumb%25255B36%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto -20px;" title="" width="588" /></a>
<br />
<h3>
2. Increasing DTM Buffer Size</h3>
<div style="text-align: justify;">
When you increase the DTM buffer memory, the Integration Service creates more buffer blocks, which improves performance. You can estimate the required DTM Buffer Size with the calculation below.</div>
<div style="text-align: justify;">
<ul><ul><ul>
<li><span style="color: #cc0000;">Session Buffer Blocks = (total number of sources + total number of targets) * 2 </span></li>
<li><span style="color: #cc0000;">DTM Buffer Size = Session Buffer Blocks * Buffer Block Size / 0.9</span></li>
</ul>
</ul>
</ul>
</div>
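<div style="text-align: justify;">
The two formulas above can be combined into a small Python sketch. The function name and sample numbers are hypothetical, and rounding the result up to a whole number is an assumption made here for convenience.</div>

```python
import math


def dtm_buffer_size(num_sources: int, num_targets: int, buffer_block_size: int) -> int:
    """Estimate the DTM Buffer Size from the two formulas in this section."""
    # Session Buffer Blocks = (total sources + total targets) * 2
    session_buffer_blocks = (num_sources + num_targets) * 2
    # DTM Buffer Size = Session Buffer Blocks * Buffer Block Size / 0.9
    return math.ceil(session_buffer_blocks * buffer_block_size / 0.9)


if __name__ == "__main__":
    # One source, one target and a buffer block size of 14800
    print(dtm_buffer_size(1, 1, 14800))
```

<div style="text-align: justify;">
With one source and one target the session needs 4 buffer blocks, so a block size of 14800 gives 4 * 14800 / 0.9, which rounds up to 65778.</div>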
</div>
<div>
You can change the DTM Buffer Size in the session configuration as shown in the image below.
<a href="http://lh4.ggpht.com/-6EzBFwjaOy0/UleUvxONjlI/AAAAAAAAIxQ/_hXmc7Ky6nA/s1600-h/image%25255B48%25255D.png"><img alt="Informatica session DTM buffer size" border="0" height="480" src="http://lh5.ggpht.com/-VJViJZZYw_I/UleUw9tqRII/AAAAAAAAIxY/EefCgsrhWZA/image_thumb%25255B38%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto -20px;" title="" width="591" /></a>
<br />
<h1>
<span style="color: #cc0000;">II. Cache Memory Optimization</span></h1>
</div>
<div style="text-align: justify;">
Transformations such as Aggregator, Rank and Lookup use cache memory, which consists of an index cache and a data cache, to store transformed data. If the allocated cache memory is not large enough to store the data, the Integration Service stores the data in a temporary cache file. Session performance slows each time the Integration Service reads from the temporary cache file. </div>
<div>
<br /></div>
<div style="text-align: justify;">
<b><span style="color: #cc0000;">Note</span></b> : You can examine the performance counters to determine which transformations require cache memory tuning, <a href="http://www.disoln.org/2013/09/Informatica-Performance-Tuning-Guide-Identify-Performance-Bottlenecks.html#session" target="_blank">Check here for details.</a></div>
<h3>
1. Increasing the Cache Sizes </h3>
<div style="text-align: justify;">
You can increase the allocated cache sizes so that the transformation is processed entirely in cache memory and the Integration Service does not have to read from the cache file. </div>
<div style="text-align: justify;">
<br /></div>
<div>
<div>
<div style="text-align: justify;">
You can calculate the memory requirements for a transformation using the Cache Calculator. The Cache Calculator for a Lookup transformation is shown below.</div>
<a href="http://lh4.ggpht.com/-yZwOimoFyMw/UleUzcT3BOI/AAAAAAAAIxw/uRXT_-XK-AM/s1600-h/image%25255B41%25255D.png"><img alt="Cache Calculator for informatica LookUP transformation" border="0" height="480" src="http://lh3.ggpht.com/-Y1ULP0XInFQ/UleU0dcyPYI/AAAAAAAAIx4/EF25COBkT8E/image_thumb%25255B31%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="608" /></a><span style="text-align: justify;">You can update the cache size in the session property of the transformation as shown below.</span>
<a href="http://lh4.ggpht.com/-Xxg332tVtHQ/Ulebdkp6c3I/AAAAAAAAIyY/h_lRcRcnfYQ/s1600-h/image%25255B58%25255D.png"><img alt="informatica transformation cache memory calculation" border="0" height="480" src="http://lh3.ggpht.com/-8Wr9v_0IiGE/UlebejrJZ3I/AAAAAAAAIyg/ME_bdZowXSU/image_thumb%25255B46%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto -20px;" title="" width="587" /></a>
<br />
<h3>
2. Limiting the Number of Connected Ports</h3>
</div>
<div>
For transformations that use data cache, limit the number of connected input/output and output only ports. Limiting the number of connected input/output or output ports reduces the amount of data the transformations store in the data cache.<br />
<h1>
<span style="color: #cc0000;">III. Optimizing the Target</span></h1>
<div style="text-align: justify;">
The most common performance bottleneck occurs when the Integration Service writes to a target database. Small database checkpoint intervals, small database network packet sizes, or problems during heavy loading operations can cause target bottlenecks.</div>
<div style="text-align: justify;">
<br /></div>
<div>
<span style="text-align: justify;"><b><span style="color: #cc0000;">Note</span></b> : Target bottleneck can be determined with the help of Session Log File,</span> <a href="http://www.disoln.org/2013/09/Informatica-Performance-Tuning-Guide-Identify-Performance-Bottlenecks.html#mapping" target="_blank">check here for details</a>.</div>
<h3>
1. Using Bulk Loads</h3>
</div>
<div>
<div style="text-align: justify;">
You can use bulk loading to improve the performance of a session that inserts a large amount of data into a DB2, Sybase ASE, Oracle, or Microsoft SQL Server database. When bulk loading, the Integration Service bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result, you may not be able to perform recovery.</div>
<h3>
2. Using External Loaders</h3>
</div>
<div style="text-align: justify;">
To increase session performance, configure PowerCenter to use an external loader. External loaders can be used for Oracle, DB2, Sybase, and Teradata target databases.</div>
<h3>
3. Dropping Indexes and Key Constraints</h3>
<div style="text-align: justify;">
When you define key constraints or indexes in target tables, you slow the loading of data to those tables. To improve performance, drop indexes and key constraints before you run the session. You can rebuild those indexes and key constraints after the session completes.</div>
<div>
<h3>
4. Minimizing Deadlocks</h3>
</div>
<div style="text-align: justify;">
Encountering deadlocks can slow session performance. You can increase the number of target connection groups in a session to avoid deadlocks. To use a different target connection group for each target in a session, use a different database connection name for each target instance.</div>
<div style="text-align: justify;">
<h3 style="text-align: start;">
5. Increasing Database Checkpoint Intervals</h3>
<div>
The Integration Service performance slows each time it waits for the database to perform a checkpoint. To decrease the number of checkpoints and increase performance, increase the checkpoint interval in the database. </div>
</div>
<div>
<h3>
6. Increasing Database Network Packet Size</h3>
<div style="text-align: justify;">
If you write to Oracle, Sybase ASE, or Microsoft SQL Server targets, you can improve the performance by increasing the network packet size. Increase the network packet size to allow larger packets of data to cross the network at one time.</div>
<h1>
<span style="color: #cc0000;">IV. Optimizing the Source</span></h1>
<div>
<div style="text-align: justify;">
Performance bottlenecks can occur when the Integration Service reads from a source database. An inefficient query or a small database network packet size can cause source bottlenecks.<br />
<br /></div>
</div>
<div>
<span style="text-align: justify;"><b><span style="color: #cc0000;">Note</span></b> : Session Log File details can be used to identify Source bottleneck,</span> <a href="http://www.disoln.org/2013/09/Informatica-Performance-Tuning-Guide-Identify-Performance-Bottlenecks.html#mapping" target="_blank">check here for details</a>.</div>
<h3>
1. Optimizing the Query </h3>
<div style="text-align: justify;">
If a session joins multiple source tables in one Source Qualifier, you might be able to improve performance by optimizing the query with optimizer hints. Usually, the database optimizer determines the most efficient way to process the source data. However, you might know properties of the source tables that the database optimizer does not. The database administrator can create optimizer hints to tell the database how to execute the query for a particular set of source tables. </div>
<h3>
2. Increasing Database Network Packet Size </h3>
<div style="text-align: justify;">
If you read from Oracle, Sybase ASE, or Microsoft SQL Server sources, you can improve the performance by increasing the network packet size. Increase the network packet size to allow larger packets of data to cross the network at one time.</div>
<h1>
<span style="color: #cc0000;">
V. Optimizing the Mappings</span></h1>
<div style="text-align: justify;">
Mapping-level optimization may take time to implement, but it can significantly boost session performance. Focus on mapping-level optimization after you optimize the targets and sources. </div>
<div style="text-align: justify;">
<br /></div>
<div>
Generally, you optimize a mapping by reducing the number of transformations and deleting unnecessary links between them. Configure the mapping with the fewest transformations and expressions needed to do the most work possible, and delete unnecessary links to minimize the amount of data moved.<br />
<br />
<span style="text-align: justify;"><b><span style="color: #cc0000;">Note</span></b> : You can identify Mapping bottleneck from Session Log File,</span> <a href="http://www.disoln.org/2013/09/Informatica-Performance-Tuning-Guide-Identify-Performance-Bottlenecks.html#mapping" target="_blank">check here for details</a>.<br />
<h3>
1. Optimizing Datatype Conversions </h3>
</div>
</div>
</div>
<div style="text-align: justify;">
You can increase performance by eliminating unnecessary datatype conversions. For example, if a mapping moves data from an Integer column to a Decimal column, then back to an Integer column, the unnecessary datatype conversion slows performance. Where possible, eliminate unnecessary datatype conversions from mappings.</div>
<h3>
2. Optimizing Expressions </h3>
<div style="text-align: justify;">
You can also optimize the expressions used in the transformations. When possible, isolate slow expressions and simplify them.</div>
<ul>
<li style="text-align: justify;"><b><span style="color: #cc0000;">Factoring Out Common Logic</span> :</b> If the mapping performs the same task in multiple places, reduce the number of times the mapping performs the task by moving the task earlier in the mapping.</li>
<li><b style="text-align: justify;"><span style="color: #cc0000;">Minimizing Aggregate Function Calls</span> :</b><span style="text-align: justify;"> </span><span style="text-align: justify;">When writing expressions, factor out as many aggregate function calls as possible. Each time you use an aggregate function call, the Integration Service must search and group the data. For example, </span><span style="text-align: justify;">SUM(COL_A + COL_B) performs better than SUM(COL_A) + SUM(COL_B).</span></li>
<li><b style="text-align: justify;"><span style="color: #cc0000;">Replacing Common Expressions with Local Variables</span> : </b><span style="text-align: justify;">If you use the same expression multiple times in one transformation, you can make that expression a local variable.</span></li>
<li><b style="text-align: justify;"><span style="color: #cc0000;">Choosing Numeric Versus String Operations</span> :</b><span style="text-align: justify;"> The Integration Service processes numeric operations faster than string operations. For example, if you look up large amounts of data on two columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup around EMPLOYEE_ID improves performance.</span></li>
<li style="text-align: justify;"><b><span style="color: #cc0000;">Using Operators Instead of Functions </span>: </b>The Integration Service reads expressions written with operators faster than expressions with functions. Where possible, use operators to write expressions.</li>
</ul>
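<div style="text-align: justify;">
The point about aggregate function calls can be illustrated outside PowerCenter. The Python sketch below is only an analogy: the sample rows and function names are hypothetical, and it assumes the columns contain no NULL values (SQL-style aggregates treat NULLs differently, so the two forms are not always interchangeable). It shows that one aggregation over the row-level sum gives the same total as two separate aggregations, while scanning the data only once.</div>

```python
# Hypothetical rows of (COL_A, COL_B); no NULLs are assumed.
rows = [(10, 1), (20, 2), (30, 3)]


def sum_of_row_totals(rows):
    """Analogy for SUM(COL_A + COL_B): one aggregation pass over the rows."""
    return sum(a + b for a, b in rows)


def total_of_column_sums(rows):
    """Analogy for SUM(COL_A) + SUM(COL_B): two aggregation passes."""
    return sum(a for a, _ in rows) + sum(b for _, b in rows)


print(sum_of_row_totals(rows), total_of_column_sums(rows))
```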
<h3>
3. Optimizing Transformations</h3>
<div style="text-align: justify;">
Each transformation is different, and so is the tuning it requires. As a general rule, reduce the number of transformations in the mapping and delete unnecessary links between them.<br />
<br />
<b><span style="color: #cc0000;">Note </span></b>: Tuning techniques for individual transformations will be covered in a separate article.</div>
<h2>
What is Next in the Series</h2>
<div style="text-align: justify;">
The <a href="http://www.disoln.org/2013/11/Informatica-Performance-Tuning-Guide-Performance-Enhancement-Features-Part-4.html">next article</a> in this series will cover the additional features available in Informatica PowerCenter to improve session performance. We hope you enjoyed this article. Please leave us a comment or feedback if you have any; we are happy to hear from you.</div>
</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-36008708213713121012013-09-30T22:37:00.000-07:002013-10-27T18:19:19.179-07:00Informatica HTTP Transformation, The Interface Between ETL and Web Services <img align="left" alt="Informatica SQL Transformation" border="0" height="100" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzUL_zSiHCTq73bRsjCdILENqhvKvza0wFDFX4AUWN2Vyrz5rDh7nmPE59nkZCppUhYSGGxAoVHNncohcuVby6Qv0zcL3MAb9eEhNFV8ZBTgMn7qYgLvw1nXXucw_Be_Uq9Z0iF8yQJos/s320/http-logo.jpg" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" />
<br />
<div style="text-align: justify;">
In a mature data warehouse environment, you will see all sorts of <a href="http://www.disoln.org/2012/07/understand-powercenter-source-analyzer.html">data sources</a>, such as Mainframe, ERP, Web Services, Machine Logs, Message Queues, and Hadoop. Informatica provides a variety of connectors to extract data from such sources. Using the Informatica HTTP <a href="http://www.disoln.org/search/label/Transformations?max-results=15">transformation</a>, you can make <a href="http://en.wikipedia.org/wiki/Web_service">Web Service</a> calls and get data from web servers. This article explains the transformation with a use case.</div>
<div>
<a name='more'></a><h2>
What is HTTP Transformation</h2>
</div>
<div style="text-align: justify;">
The HTTP transformation enables you to connect to an <a href="http://en.wikipedia.org/wiki/Web_services" rel="nofollow" target="_blank">HTTP server</a> to use its <a href="http://en.wikipedia.org/wiki/Web_service" rel="nofollow" target="_blank">services and applications</a>. When you run a session with an HTTP transformation, the Integration Service connects to the HTTP server and issues a request to retrieve data from or update data on the HTTP server.<br />
<br />
<span style="color: #cc0000;"><i>For example</i></span>, you can get the currency conversion rate between USD and EUR by calling this web service: <i><a href="http://rate-exchange.appspot.com/currency?from=USD&to=EUR" rel="nofollow" target="_blank">http://rate-exchange.appspot.com/currency?from=USD&to=EUR</a></i><br />
Using the HTTP transformation, you can:</div>
<div>
<div style="text-align: justify;">
<ol>
<li><span style="color: #cc0000;"><b>Read </b></span><b><b><span style="color: #cc0000;">data </span></b></b><span style="color: #cc0000;"><b>from an HTTP server </b></span><b>:-</b><span style="color: #cc0000;"> </span>It retrieves data from the HTTP server and passes the data to a downstream transformation in the mapping.</li>
<li><b><b><span style="color: #cc0000;">Update data on the HTTP server </span></b>:- </b>It posts data to the HTTP server and passes HTTP server responses to a downstream transformation in the mapping.</li>
</ol>
</div>
</div>
<div>
<h2>
Developing HTTP Transformation</h2>
</div>
<div>
<div class="p1" style="text-align: justify;">
Like any other transformation, you can create HTTP transformations in the Transformation Developer or in the Mapping Designer. <span style="text-align: justify;">As shown in the image below, all the configuration required for this transformation is on the HTTP tab.</span></div>
<a href="http://lh3.ggpht.com/-Lazwu6rggd0/UkuizOH3LPI/AAAAAAAAIvE/vSH8K4McCP8/s1600-h/image%25255B19%25255D.png"><img alt="Informatica HTTP Transformation" border="0" height="501" src="http://lh3.ggpht.com/-AEfXbctrgWs/Ukui0EuvwVI/AAAAAAAAIvM/HjhxNiJa-N0/image_thumb%25255B15%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 0px;" title="" width="500" /></a>
<br />
<h3>
Read or Write data to HTTP server</h3>
<div class="p1" style="text-align: justify;">
As shown in the image, on the HTTP tab you can configure the transformation to read data from or write data to the HTTP server. Select the GET method to read data, and the POST or SIMPLE POST method to write data to an HTTP server.</div>
<div class="p1" style="text-align: justify;">
</div>
<h3>
Configuring Groups and Ports</h3>
<div>
Based on the HTTP method you choose, configure the following groups and ports on the HTTP tab of the transformation. </div>
<div>
<div class="p1">
</div>
<ul>
<li><b><span style="color: #cc0000;">Output</span></b>. Contains data from the HTTP response. Passes responses from the HTTP server to downstream transformations. </li>
<li><b><span style="color: #cc0000;">Input</span>.</b> Used to construct the final URL for the GET method or the data for the POST request.</li>
<li><b><span style="color: #cc0000;">Header</span></b>. Contains header data for the request and response.</li>
</ul>
<div style="text-align: justify;">
In the image shown above, the transformation has two input ports for the GET method, and the response from the server is the output port.</div>
<div class="p1">
</div>
<div class="p1">
</div>
</div>
<div class="p1" style="text-align: justify;">
</div>
<h3>
Configuring a URL</h3>
</div>
<div>
<div style="text-align: justify;">
The web service is accessed using a URL, and the base URL of the web service needs to be provided in the transformation. The Designer constructs the final URL for the GET method based on the base URL and the port names in the input group.</div>
</div>
<div>
<div class="p1">
<div style="text-align: justify;">
</div>
<div style="text-align: justify;">
In the image shown above, you can see the base URL and the constructed URL, which includes the query parameters. This web service call gets the currency conversion rate, and we pass two parameters to the base URL: the "<i>from</i>" and "<i>to</i>" currencies.</div>
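<div style="text-align: justify;">
The way the final URL is assembled from the base URL and the input port names can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Designer is implemented; the function name <i>build_get_url</i> is hypothetical, and the port names and values mirror the currency example above.</div>

```python
from urllib.parse import urlencode


def build_get_url(base_url: str, input_ports: dict) -> str:
    """Append each input port as a name=value query parameter to the base URL."""
    return f"{base_url}?{urlencode(input_ports)}"


# Port names match the URL parameter names expected by the web service.
url = build_get_url(
    "http://rate-exchange.appspot.com/currency",
    {"from": "USD", "to": "EUR"},
)
print(url)
```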
<h3>
Connecting to the HTTP Server</h3>
<div>
<div class="p1">
If the HTTP server requires authentication, you can create an HTTP connection object in the <a href="http://www.disoln.org/2012/09/understand-informatica-powercenter-Workflow-Designer.html" target="_blank">Workflow Manager</a>. This connection can be used in the session configuration to connect to the HTTP server.<br />
<a href="http://www.blogger.com/blogger.g?blogID=6593582717363578994" name="UseCase"></a>
<h2>
HTTP Transformation Use Case</h2>
<div>
<div style="text-align: justify;">
Let's consider an ETL job that integrates sales data from different global sales regions into the enterprise data warehouse. Data in the warehouse needs to be standardized, and all sales figures need to be stored in US Dollars (USD).</div>
</div>
<div>
<br /></div>
<div style="text-align: justify;">
<span style="color: #cc0000;"><b>Solution</b></span> : In the ETL process, let us use a web service call to get the real-time currency conversion rate and convert the foreign currency to USD. We will use the HTTP transformation to call the web service.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
For the demo, we will concentrate only on the HTTP transformation. We will use the web service from http://rate-exchange.appspot.com/ for the demonstration. This web service takes two parameters, "<i>from currency</i>" and "<i>to currency</i>", and returns a <a href="http://en.wikipedia.org/wiki/JSON" rel="nofollow" target="_blank">JSON document</a> with the exchange rate information.</div>
<div style="text-align: justify;">
<br /></div>
<div>
<a href="http://rate-exchange.appspot.com/currency?from=USD&to=EUR" rel="nofollow" target="_blank">http://rate-exchange.appspot.com/currency?from=USD&to=EUR</a><br />
<br />
<div style="text-align: justify;">
<b>Step 1 :-</b> Create the HTTP transformation like any other transformation in the Mapping Designer. We need to configure the transformation to use the GET HTTP method to access the currency conversion data. The configuration is shown below.</div>
</div>
</div>
</div>
</div>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://lh3.ggpht.com/-s2QGaO3FPZA/Uku-uEYLVWI/AAAAAAAAIvc/rrs7Lus-1gM/s1600-h/image%25255B65%25255D.png"><img alt="Informatica HTTP Transformation" border="0" height="499" src="http://lh4.ggpht.com/-W4AshZlOEpY/Uku-vTV_NrI/AAAAAAAAIvk/BjipEHahCsE/image_thumb%25255B51%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="500" /></a></div>
<div style="text-align: justify;">
<b>Step 2 :-</b> Create two input ports as shown in the image below. The ports need to be of string data type, and each port name should match the corresponding URL parameter name.</div>
<a href="http://lh3.ggpht.com/-YgJrELTkhWo/Uku-wZS-YdI/AAAAAAAAIvs/kgyMA1Wn34A/s1600-h/image%25255B64%25255D.png"><img alt="Informatica HTTP Transformation" border="0" height="499" src="http://lh5.ggpht.com/-eA7ImZ2pUCA/Uku-xpkPfRI/AAAAAAAAIv0/CbKT3D7AL6s/image_thumb%25255B50%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="500" /></a>
<br />
<div style="text-align: justify;">
<b>Step 3 :-</b> Now provide the base URL for the web service, and the Designer will construct the complete URL with the parameters included.</div>
<a href="http://lh3.ggpht.com/-0RWN_sPpwd0/Uku-ymot2BI/AAAAAAAAIv8/BFqcxrF7Svo/s1600-h/image%25255B63%25255D.png"><img alt="Informatica HTTP Transformation" border="0" height="500" src="http://lh3.ggpht.com/-WmCVvVa8Fes/Uku-zzScXBI/AAAAAAAAIwE/XKhXJnVhAGo/image_thumb%25255B49%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="500" /></a>
<br />
<div style="text-align: justify;">
<b>Step 4 :-</b> The output from the HTTP transformation will look similar to what is given below.</div>
<div style="text-align: justify;">
<span style="text-align: start; white-space: pre-wrap;"><br /></span></div>
<div style="text-align: center;">
<span style="text-align: start; white-space: pre-wrap;"><i>{"to": "USD", "rate": 1.3522000000000001, "from": "EUR"}</i></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="text-align: start; white-space: pre-wrap;"><span style="text-align: justify; white-space: normal;">Finally, you can plug the transformation into the mapping as shown in the image below. Parse the output from the HTTP transformation in an Expression transformation and do the calculation to convert the currency to USD.</span></span></div>
<a href="http://lh4.ggpht.com/-9ekB2v3GqFQ/Uku-0hNCZlI/AAAAAAAAIwM/XDjOEerIYA4/s1600-h/image%25255B62%25255D.png"><img alt="Informatica HTTP Transformation" border="0" height="190" src="http://lh6.ggpht.com/-2xfwUn3O-rY/Uku-1yBvR2I/AAAAAAAAIwU/VznjTRhwnAc/image_thumb%25255B48%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="700" /></a>
<br />
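<div style="text-align: justify;">
The parse-and-convert logic of the Expression transformation can be sketched in Python to make the calculation concrete. In PowerCenter this would be written with expression functions rather than Python; the function name <i>convert_to_usd</i> and the sample amount are assumptions for illustration, while the response string is the sample output shown in Step 4.</div>

```python
import json
from decimal import Decimal


def convert_to_usd(amount: Decimal, http_response: str) -> Decimal:
    """Parse the JSON response and convert an amount in the 'from' currency to USD."""
    payload = json.loads(http_response, parse_float=Decimal)
    if payload["to"] != "USD":
        raise ValueError("expected a conversion rate into USD")
    # Multiply by the rate and round to cents.
    return (amount * payload["rate"]).quantize(Decimal("0.01"))


# Sample response taken from Step 4; the 100 EUR sales amount is hypothetical.
response = '{"to": "USD", "rate": 1.3522000000000001, "from": "EUR"}'
print(convert_to_usd(Decimal("100"), response))
```

<div style="text-align: justify;">
Decimal is used instead of float here so that monetary amounts are rounded predictably.</div>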
<div style="text-align: justify;">
We hope you enjoyed this tutorial. Please let us know if you have any difficulty trying out the HTTP transformation, or share with us any other use cases you want to implement with it.</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-70332110413553416692013-09-24T00:03:00.000-07:002013-10-07T07:57:40.454-07:00Informatica SQL Transformation, SQLs Beyond Pre & Post Session Commands<img align="left" alt="Informatica SQL Transformation" border="0" height="100" src="http://4.bp.blogspot.com/-raB4qMWq9F0/UkOTDV4nOEI/AAAAAAAAIrw/bvq8aOxdP0Y/s1600/SQL.jpg" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px;" title="" width="100" /><br />
<div>
<div style="text-align: justify;">
SQL statements can be used as part of pre or post SQL commands in a <a href="http://www.disoln.org/2012/09/understand-informatica-powercenter-Workflow-Designer.html" target="_blank">PowerCenter workflow</a>. These are static SQL statements and run only once, before or after the mapping pipeline runs. With the help of the SQL transformation, you can use SQL statements much more effectively to build your ETL logic. In this tutorial, let's learn more about the <a href="http://www.disoln.org/search/label/Transformations?max-results=15" target="_blank">transformation</a> and its usage with a real-time use case.</div>
<a name='more'></a><h2 style="text-align: justify;">
What is SQL Transformation</h2>
<div>
<div class="p1">
<div style="text-align: justify;">
The SQL transformation can be used to process SQL queries midstream in a mapping. You can execute any valid SQL statement using this transformation, either from external SQL scripts or from SQL queries created within the transformation. The SQL transformation processes the query and returns rows and any database errors.</div>
<h2>
Configuring SQL Transformation</h2>
</div>
<div class="p1">
<div style="text-align: justify;">
SQL transformation can run in two different modes.</div>
</div>
<ul>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>Script mode</b></span> :- Runs SQL scripts from text files that are externally located. You pass a script name to the transformation with each input row. It outputs script execution status and any script error. </li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>Query mode</b></span> :- Executes a query that you define in a query editor. You can pass strings or parameters to the query to define dynamic queries. You can output multiple rows when the query has a SELECT statement.</li>
</ul>
<h3 style="text-align: justify;">
Script Mode</h3>
</div>
<div style="text-align: justify;">
An SQL transformation running in script mode runs SQL scripts from text files. It creates an SQL procedure and sends it to the database to process. The database validates the SQL and executes the query. You cannot use scripting languages such as Oracle PL/SQL or Microsoft/Sybase T-SQL in the script.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In script mode, you pass the script file name with its complete path from the source to the <i><span style="color: #cc0000;">ScriptName</span></i> port of the SQL transformation. The <i><span style="color: #cc0000;">ScriptResult</span></i> port gives the status of the script execution, which is either PASSED or FAILED. <i><span style="color: #cc0000;">ScriptError </span></i>returns the errors that occur when a script fails for a row. </div>
<a href="http://lh4.ggpht.com/-EWOxgA-EOqI/UkUPYITu1cI/AAAAAAAAIsA/h9zYNPcQeIo/s1600-h/image%25255B16%25255D.png"><img alt="Informatica SQL Transformation, SQL Queries Beyond Pre & Post SQL Commands" border="0" height="144" src="http://lh3.ggpht.com/-yQ5BxFfrTjk/UkUPZJwhHgI/AAAAAAAAIsI/4JWO_68bSCE/image_thumb%25255B12%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="164" /></a>
<br />
<div style="text-align: justify;">
Shown above is an SQL transformation in script mode, which has ScriptName as input and ScriptResult and ScriptError as outputs.</div>
<h3>
Query Mode</h3>
</div>
<div>
<div class="p1" style="text-align: justify;">
When SQL transformation runs in query mode, it executes an SQL query defined in the transformation. You can pass strings or parameters to the query from the transformation input ports to change the SQL query statement or the query data. The SQL query can be static or dynamic.</div>
<ul>
<li style="text-align: justify;"><span style="color: #cc0000;">Static SQL query</span> :- The query statement does not change, but you can use query parameters to change the data, which is passed in through the input ports of the transformation. </li>
<li style="text-align: justify;"><span style="color: #cc0000;">Dynamic SQL query</span> :- You can change the query statements and the data, which is passed in through the input ports of the transformation.</li>
</ul>
<div style="text-align: justify;">
With a static query, the Integration Service prepares the SQL statement once and executes it for each row. With a dynamic query, the Integration Service prepares the SQL for each input row.</div>
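<div style="text-align: justify;">
The difference between a static and a dynamic query can be illustrated with a small Python sketch that uses the standard sqlite3 module in place of the Integration Service. The table names, sample data and function names are hypothetical. In the static case the query text stays fixed and only the bound value changes per row; in the dynamic case the query text itself is rebuilt for each input row.</div>

```python
import sqlite3

# In-memory database with two hypothetical customer tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cust_us (id INTEGER, name TEXT);
    CREATE TABLE cust_eu (id INTEGER, name TEXT);
    INSERT INTO cust_us VALUES (1, 'Alice');
    INSERT INTO cust_eu VALUES (1, 'Bruno');
""")

ALLOWED_TABLES = {"cust_us", "cust_eu"}


def static_lookup(conn, ids):
    """Static query: the statement text never changes, only the bound data."""
    sql = "SELECT name FROM cust_us WHERE id = ?"
    return [conn.execute(sql, (cust_id,)).fetchone()[0] for cust_id in ids]


def dynamic_lookup(conn, rows):
    """Dynamic query: the statement text is rebuilt for every input row."""
    names = []
    for table, cust_id in rows:
        # Substituted text becomes part of the SQL, so restrict it to known tables.
        if table not in ALLOWED_TABLES:
            raise ValueError(f"unexpected table name: {table}")
        sql = f"SELECT name FROM {table} WHERE id = ?"
        names.append(conn.execute(sql, (cust_id,)).fetchone()[0])
    return names


print(static_lookup(conn, [1, 1]))
print(dynamic_lookup(conn, [("cust_us", 1), ("cust_eu", 1)]))
```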
<a href="http://lh5.ggpht.com/-x7GUULHGrFc/UkUPZg5RgiI/AAAAAAAAIsQ/Xg7xYfYhsrE/s1600-h/image%25255B21%25255D.png"><img alt="Informatica SQL Transformation, SQL Queries Beyond Pre & Post SQL Commands" border="0" height="160" src="http://lh6.ggpht.com/-9LhCU_dSI8A/UkUPavuCqSI/AAAAAAAAIsY/4Yuy0E0Wz7Y/image_thumb%25255B17%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="189" /></a>
<br />
<div style="text-align: justify;">
The SQL transformation shown above runs in query mode; it has two input parameters and returns one output.</div>
<h2>
<span style="text-align: justify;">SQL Transformation Use Case</span></h2>
</div>
<div>
<div style="text-align: justify;">
Let's consider the ETL for loading <a href="http://www.disoln.org/2012/08/slowly-changing-dimension-type-2-implementation-using-informatica.html" target="_blank">Dimension tables</a> into a data warehouse. The surrogate key for each dimension table is populated using an <a href="http://docs.oracle.com/cd/B12037_01/server.101/b10759/statements_6014.htm" rel="nofollow" target="_blank">Oracle Sequence</a>. The ETL architect needs to create an <a href="http://www.disoln.org/2012/10/11-ways-to-make-informatica-powercenter-code-reusable.html" target="_blank">Informatica reusable component</a>, which can be reused in different <a href="http://www.disoln.org/2013/01/slowly-changing-dimension-type-1-implementation-using-informatica-powercenter.html" target="_blank">dimension table loads</a> to populate the surrogate key.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b><span style="color: #cc0000;">Solution</span></b> : Let's create a reusable SQL transformation in query mode, which takes the name of the Oracle sequence as input and returns the sequence number as output.<br />
<br />
<b>Step 1 :-</b> Once you have the Transformation Developer open, you can start creating the SQL transformation like any other transformation. It opens a window like the one shown in the image below.</div>
</div>
<a href="http://lh6.ggpht.com/-8ntpgakvMIk/UkUd-2wt7kI/AAAAAAAAIso/uj0ZmXhvzjo/s1600-h/image%25255B84%25255D.png"><img alt="Informatica SQL Transformation, SQL Queries Beyond Pre & Post SQL Commands" border="0" height="424" src="http://lh6.ggpht.com/-iPb4ATonOtE/UkUd_5-VUfI/AAAAAAAAIsw/1Y13Rq8rXIo/image_thumb%25255B68%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="431" /></a><br />
<div style="text-align: justify;">
This screen will let you choose the mode, database type, database connection type and you can make the <span style="color: #cc0000;">transformation active or passive</span>. If the database connection type is <span style="color: #cc0000;">dynamic</span>, you can dynamically pass in the connection details into the transformation. If the SQL query returns more than one record, you need to make the transformation active.</div>
<div style="text-align: justify;">
<span style="color: #cc0000;"><br /></span></div>
<div style="text-align: justify;">
<b>Step 2 :-</b> Now create the input and output ports as shown in the image below. We pass in the database schema name and the sequence name, and the transformation returns the sequence number through an output port.</div>
<div style="text-align: justify;">
</div>
<a href="http://lh4.ggpht.com/-rD-y4CosxWY/UkUeAgcM0GI/AAAAAAAAIs4/iKm-oTuShyw/s1600-h/image%25255B57%25255D.png"><img alt="Informatica SQL Transformation, SQL Queries Beyond Pre & Post SQL Commands" border="0" height="546" src="http://lh4.ggpht.com/--hvCiMCUp60/UkUeBkCw-DI/AAAAAAAAItA/ITz4_ucBn6w/image_thumb%25255B45%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="609" /></a><br />
<div style="text-align: justify;">
<b>Step 3 :-</b> Using the SQL query editor, we can build the query to get the next value from the sequence. Using the '<span style="color: #cc0000;">String Substitution</span>' ports, we can make the SQL dynamic. Here we make the query dynamic by passing the schema name and sequence name in through input ports.</div>
<a href="http://lh3.ggpht.com/-Exruk7x7dmU/UkUeCePM1AI/AAAAAAAAItI/SzPBD6b_zlA/s1600-h/image%25255B90%25255D.png"><img alt="Informatica SQL Transformation, SQL Queries Beyond Pre & Post SQL Commands" border="0" height="548" src="http://lh5.ggpht.com/-__rOAugQh6o/UkUeDOCvoDI/AAAAAAAAItQ/72ZC7r-0b1I/image_thumb%25255B74%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="612" /></a><br />
<div style="text-align: justify;">
That is all we need for the reusable SQL transformation. The completed SQL transformation is shown below; it takes two input values (schema name, sequence name) and returns one output value (sequence number).</div>
<a href="http://lh6.ggpht.com/-8ntpgakvMIk/UkUd-2wt7kI/AAAAAAAAIso/uj0ZmXhvzjo/s1600-h/image%25255B84%25255D.png"></a><a href="http://lh5.ggpht.com/-x7GUULHGrFc/UkUPZg5RgiI/AAAAAAAAIsQ/Xg7xYfYhsrE/s1600-h/image%25255B21%25255D.png"><img alt="Informatica SQL Transformation, SQL Queries Beyond Pre & Post SQL Commands" border="0" height="160" src="http://lh6.ggpht.com/-9LhCU_dSI8A/UkUPavuCqSI/AAAAAAAAIsY/4Yuy0E0Wz7Y/image_thumb%25255B17%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="189" /></a><br />
<div style="text-align: justify;">
<b>Step 4 :-</b> We can use this transformation just like any other <a href="http://www.disoln.org/2012/10/11-ways-to-make-informatica-powercenter-code-reusable.html" target="_blank">reusable transformation</a>. Pass in the schema name and sequence name through the input ports, and the transformation returns the sequence number, which can be used to populate the surrogate key of the dimension table as shown below.</div>
<a href="http://lh3.ggpht.com/-GeQ-E0kHC_8/UkUn-0rmZwI/AAAAAAAAItw/3xbjw8RvwUs/s1600-h/image%25255B102%25255D.png"><img alt="Informatica SQL Transformation, SQL Queries Beyond Pre & Post SQL Commands" border="0" height="169" src="http://lh3.ggpht.com/-2s12OIrhyEU/UkUn_81eNlI/AAAAAAAAIt4/MsHlcDilums/image_thumb%25255B84%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="700" /></a><br />
<div style="text-align: justify;">
As per the above example, the Integration Service will resolve the SQL as follows at session run time. <i style="color: #cc0000; text-align: center;">SELECT DW.S_CUST_DIM.NEXTVAL FROM DUAL;</i><br />
<div style="text-align: center;">
<span style="color: #cc0000;"><i><br /></i></span></div>
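<div style="text-align: justify;">
To make the substitution step concrete, here is a minimal plain-Java sketch of how a query template could be resolved into the SQL shown above. It assumes the string substitution ports are referenced in the query with tilde markers, and the port names <i>SCHEMA_NAME</i> and <i>SEQUENCE_NAME</i> are hypothetical (the real names are whatever you defined in the transformation). This is only an illustration of the idea, not how the Integration Service is implemented.</div>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: mimics textual port substitution in a query template.
class StringSubstitutionSketch {

    // Replaces every ~PORT_NAME~ marker in the template with the port's value.
    static String substitute(String template, Map<String, String> ports) {
        String sql = template;
        for (Map.Entry<String, String> port : ports.entrySet()) {
            sql = sql.replace("~" + port.getKey() + "~", port.getValue());
        }
        return sql;
    }

    public static void main(String[] args) {
        // Hypothetical port names; the values come from the example in the article.
        Map<String, String> ports = new LinkedHashMap<>();
        ports.put("SCHEMA_NAME", "DW");
        ports.put("SEQUENCE_NAME", "S_CUST_DIM");

        String template = "SELECT ~SCHEMA_NAME~.~SEQUENCE_NAME~.NEXTVAL FROM DUAL";
        System.out.println(substitute(template, ports));
    }
}
```

<div style="text-align: justify;">
Because the substitution is purely textual, the values passed into these ports become part of the SQL statement itself, so they should come from a trusted source such as a parameter file or a controlled lookup.</div>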
Hope you enjoyed this tutorial. Please let us know if you have any difficulty trying it out, or share any other use cases you want to implement using the SQL transformation.</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-73338089116390074252013-09-17T22:16:00.001-07:002013-09-25T22:05:51.381-07:00Informatica Java Transformation to Leverage the Power of Java Programming<img align="left" alt="Informatica Java Transformation" border="0" height="100" src="http://4.bp.blogspot.com/-eh31xX5SBD4/UjX2d6sns6I/AAAAAAAAIoE/1R44Nqu3Xqk/s1600/java+logo.png" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px; text-align: justify;" title="" width="100" />
<br />
<div style="text-align: justify;">
Java is one of the most popular programming languages in use, particularly for client-server web applications. With the introduction of the PowerCenter Java <a href="http://www.disoln.org/search/label/Transformations?max-results=15" target="_blank">Transformation</a>, ETL developers can get their feet wet with Java programming and leverage the power of Java. In this article let's learn more about the Java <a href="http://www.disoln.org/search/label/Transformations?max-results=15" target="_blank">Transformation</a>, its components and its usage with the help of a use case.</div>
<div style="text-align: justify;">
<a name='more'></a></div>
<h2>
What is Java Transformation</h2>
<div style="text-align: justify;">
With the Java transformation, you can define transformation logic in Java without advanced knowledge of the <a href="http://en.wikipedia.org/wiki/Java_(programming_language)" rel="nofollow" target="_blank">Java programming language</a> or an external Java development environment.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The PowerCenter Client uses the Java Development Kit (<a href="http://en.wikipedia.org/wiki/Java_Development_Kit">JDK</a>) to compile the Java code and generate <a href="http://en.wikipedia.org/wiki/Java_bytecode">byte code</a> for the transformation. The PowerCenter Client stores the byte code in the PowerCenter repository. When the Integration Service runs a session with a Java transformation, the Integration Service uses the Java Runtime Environment (<a href="http://en.wikipedia.org/wiki/Java_Runtime_Environment#Execution_environment">JRE</a>) to execute the byte code and process input rows and generate output rows.</div>
<h2>
Developing Code in Java Transformation</h2>
<div style="text-align: justify;">
You can use the code entry tabs to enter Java code snippets that define the Java transformation functionality. Using the code entry tabs within the transformation, you can import Java packages, write helper code, define Java expressions, and write Java code that defines transformation behavior for specific transformation events. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The image below shows the different code entry tabs under 'Java Code'.</div>
<img alt="Informatica Java Transformation" border="0" height="372" src="http://lh5.ggpht.com/-KgMeGi9scCk/UjfeqRB0ZhI/AAAAAAAAIp8/o5T2sjqaP-U/image%25255B18%25255D.png?imgmax=800" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto -15px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="551" />
<br />
<ul>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>Import Packages</b></span> :- Import third-party Java packages, built-in Java packages, or custom Java packages. </li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>Helper Code</b></span> :- Define variables and methods available to all tabs except Import Packages. After you declare variables and methods on the Helper Code tab, you can use the variables and methods on any code entry tab except the Import Packages tab.</li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>On Input Row</b></span> :- Define transformation behavior when it receives an input row. The Java code in this tab executes once for each input row.</li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>On End of Data </b></span>:- Use this tab to define transformation logic when it has processed all input data. </li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>On Receiving Transaction</b></span> :- Define transformation behavior when it receives a transaction notification. You can use this only with active Java transformations. </li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>Java Expressions</b></span> : - Define Java expressions to call PowerCenter expressions. You can use this in multiple code entry tabs.</li>
</ul>
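<div style="text-align: justify;">
The way these tabs fit together can be sketched in plain Java. The class below is only a simulation under our own assumptions: the method names <i>onInputRow</i>, <i>onEndOfData</i> and the <i>run</i> driver are hypothetical stand-ins and are not part of the PowerCenter API. It shows that helper code holds state shared across events, the input row logic runs once per row, and the end of data logic runs once after the last row.</div>

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of the tab lifecycle; none of this is PowerCenter API.
class TabLifecycleSketch {

    // "Helper Code": state shared by the event handlers below.
    private int rowCount = 0;
    private final StringBuilder log = new StringBuilder();

    // "On Input Row": runs once for each input row.
    void onInputRow(String row) {
        rowCount++;
        log.append("row:").append(row).append(';');
    }

    // "On End of Data": runs once, after all input rows are processed.
    void onEndOfData() {
        log.append("total:").append(rowCount);
    }

    // Hypothetical driver standing in for the Integration Service.
    static String run(List<String> rows) {
        TabLifecycleSketch transformation = new TabLifecycleSketch();
        for (String row : rows) {
            transformation.onInputRow(row);
        }
        transformation.onEndOfData();
        return transformation.log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("A", "B")));
    }
}
```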
<div style="text-align: justify;">
<h2>
Java Transformation Use Case </h2>
<div>
Let's take a simple example for our demonstration. The employee data source contains the <i>employee ID, name, age, employee description, and manager ID</i>. We need to create an ETL transformation that finds the manager name for a given employee based on the manager ID and generates an output file that contains the<i> employee ID, name, employee description, and manager name</i>.<br />
<br />
Shown below is the complete structure of the mapping that builds the functionality described above. Apart from the source, target and source qualifier, we use only the Java Transformation.</div>
<a href="http://lh6.ggpht.com/-ZdbImY1mHQg/UjfQNTq5VqI/AAAAAAAAIpk/mpR22V9OFE0/s1600-h/image%25255B65%25255D.png"><img alt="Informatica Java Transformation" border="0" height="169" src="http://lh4.ggpht.com/-NBRLgbt9c04/UjfQODIAYfI/AAAAAAAAIps/nJqzw1TYrNI/image_thumb%25255B51%25255D.png?imgmax=800" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="640" /></a>
<br />
<b>Step 1 :- </b>Once the source and source qualifier are in the mapping, add the Java Transformation and create the input and output ports as shown in the image below. Just like any other transformation, you can drag and drop ports from other transformations to create new ports.</div>
<div>
</div>
<a href="http://lh6.ggpht.com/-hiyPtvyPG0A/UjfQDQjlqjI/AAAAAAAAIoU/66HsbF-D8sU/s1600-h/image%25255B72%25255D.png"><img alt="Informatica Java Transformation" border="0" height="453" src="http://lh5.ggpht.com/-E70kuT232G4/UjfQFb8w47I/AAAAAAAAIoc/t4fHbqADiwY/image_thumb%25255B58%25255D.png?imgmax=800" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="500" /></a>
<br />
<div style="text-align: justify;">
<b>Step 2 :-</b> Now move to the '<span style="color: #cc0000;"><i>Java Code</i></span>' tab and, from the '<i><span style="color: #cc0000;">import package</span></i>' tab, import the external Java classes required by the Java code. This tab can be used to import any third-party or built-in Java classes.</div>
<img alt="Informatica Java Transformation" border="0" height="378" src="http://lh5.ggpht.com/-0lr0vnmo7-M/UjfesFXBXwI/AAAAAAAAIqE/5BWBgMnvnAM/image%25255B17%25255D.png?imgmax=800" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="551" /><br />
As shown in the image above, here is the import code used.<br />
<blockquote class="tr_bq">
<span style="color: #0b5394;">import java.util.Map;<br />import java.util.HashMap;</span></blockquote>
<b style="text-align: justify;">Step 3 :- </b><span style="text-align: justify;">In the '<i><span style="color: #cc0000;">Helper Code</span></i>' tab, define the variables, objects and functions required by the Java code that will be written in 'On Input Row'. Here we have declared four members: a map of employee IDs to names, a lock object, and two boolean flags.</span><br />
<a href="http://lh4.ggpht.com/-L96gLCN3-sc/UjfQKXVbh4I/AAAAAAAAIpA/3Y4nXYLaZMc/s1600-h/image%25255B67%25255D.png"><img alt="Informatica Java Transformation" border="0" height="352" src="http://lh4.ggpht.com/-DKIXDyunXmE/UjfQKyy13OI/AAAAAAAAIpM/HdFXsiokAOY/image_thumb%25255B53%25255D.png?imgmax=800" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="640" /></a>
<br />
Below is the code used.<br />
<blockquote class="tr_bq">
<span style="color: #0b5394;">private static Map &lt;Integer, String&gt; empMap = new HashMap &lt;Integer, String&gt; ();<br /> private static Object lock = new Object();<br /> private boolean generateRow;<br />private boolean isRoot;</span></blockquote>
<div style="text-align: justify;">
<b>Step 4 :-</b> In the '<span style="color: #cc0000;"><i>On Input Row</i></span>' tab, define the ETL logic, which will be executed for every input record.</div>
<a href="http://lh5.ggpht.com/-l2C2wEf-uQw/UjfQLyn6JRI/AAAAAAAAIpQ/kGfK2ujKD2M/s1600-h/image%25255B66%25255D.png"><img alt="Informatica Java Transformation" border="0" height="409" src="http://lh6.ggpht.com/-mG_OTscpCHY/UjfQMs7UUaI/AAAAAAAAIpc/PAXQxZKY5Kw/image_thumb%25255B52%25255D.png?imgmax=800" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="640" /></a>
<br />
<div class="p1">
Below is the complete code to place in the '<i><span style="color: #cc0000;">On Input Row</span></i>' tab.<br />
<blockquote class="tr_bq">
<span style="color: #0b5394;">generateRow = true;<br />isRoot = false;<br />if (isNull ("EMP_ID_INP") || isNull ("EMP_NAME_INP"))<br />{<br /> incrementErrorCount(1);<br /> generateRow = false;<br />} else {<br /> EMP_ID_OUT = EMP_ID_INP;<br /> EMP_NAME_OUT = EMP_NAME_INP;<br />}<br />if (isNull ("EMP_DESC_INP"))<br />{<br /> setNull("EMP_DESC_OUT");<br />} else {<br /> EMP_DESC_OUT = EMP_DESC_INP;<br />}<br />boolean isParentEmpIdNull = isNull("EMP_PARENT_EMPID");<br />if(isParentEmpIdNull)<br />{<br /> isRoot = true;<br /> logInfo("This is the root for this hierarchy.");<br /> setNull("EMP_PARENT_EMPNAME");<br />}<br />synchronized(lock)<br />{<br /> if(!isParentEmpIdNull)<br /> EMP_PARENT_EMPNAME = (String) (empMap.get(new Integer (EMP_PARENT_EMPID)));<br /> empMap.put (new Integer(EMP_ID_INP), EMP_NAME_INP);<br />}<br />if(generateRow)<br /> generateRow();</span></blockquote>
<div style="text-align: justify;">
With this we are done with the coding required in the Java Transformation; all that is left is to compile the code. <span style="text-align: justify;">The remaining tabs in this Java transformation do not need any code for our use case.</span></div>
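<div style="text-align: justify;">
One detail of the 'On Input Row' logic is worth illustrating: the manager name is looked up in the map <i>before</i> the current employee is stored, so a manager can only be found if the manager's row has already been processed. The standalone sketch below reproduces just that lookup in plain Java, outside PowerCenter. The <i>Employee</i> class and the sample names are hypothetical and exist only for this illustration.</div>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Standalone illustration of the lookup-then-store logic; not PowerCenter code.
class ManagerLookupSketch {

    // Hypothetical row type: managerId is null for the root of the hierarchy.
    static final class Employee {
        final int id;
        final String name;
        final Integer managerId;

        Employee(int id, String name, Integer managerId) {
            this.id = id;
            this.name = name;
            this.managerId = managerId;
        }
    }

    // Processes rows in arrival order, like one call per input row.
    static List<String> resolve(List<Employee> rows) {
        Map<Integer, String> empMap = new HashMap<>();
        List<String> output = new ArrayList<>();
        for (Employee e : rows) {
            // Look up the manager first; null if not seen yet or no manager.
            String managerName = (e.managerId == null) ? null : empMap.get(e.managerId);
            // Then remember this employee for later rows.
            empMap.put(e.id, e.name);
            output.add(e.name + " -> " + managerName);
        }
        return output;
    }

    public static void main(String[] args) {
        List<Employee> managerFirst = Arrays.asList(
            new Employee(1, "Ann", null),
            new Employee(2, "Bob", 1));
        System.out.println(resolve(managerFirst));
    }
}
```

<div style="text-align: justify;">
If the rows arrive in the opposite order, the report is processed before its manager and the manager name comes out empty. For this approach to work, the source data presumably needs to be ordered so that managers arrive before the employees who report to them.</div>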
<h2>
Compile the Java Code</h2>
<div>
<div class="p1" style="text-align: justify;">
To compile the full code for the Java transformation, click Compile on the Java Code tab. The Output window displays the status of the compilation. If the Java code does not compile successfully, correct the errors in the code entry tabs and recompile the Java code. After you successfully compile the transformation, save the transformation to the repository.</div>
</div>
<h2>
Completed Mapping </h2>
<div style="text-align: justify;">
All the ports of the Java transformation can now be connected from the source qualifier and to the target. Shown below is the completed structure of the mapping.</div>
</div>
<a href="http://lh6.ggpht.com/-ZdbImY1mHQg/UjfQNTq5VqI/AAAAAAAAIpk/mpR22V9OFE0/s1600-h/image%25255B65%25255D.png"><img alt="Informatica Java Transformation" border="0" height="169" src="http://lh4.ggpht.com/-NBRLgbt9c04/UjfQODIAYfI/AAAAAAAAIps/nJqzw1TYrNI/image_thumb%25255B51%25255D.png?imgmax=800" style="background-image: none; border: 0px; display: block; float: none; margin: 10px auto 5px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="" width="640" /></a>
<br />
<div style="text-align: justify;">
Hope you enjoyed this tutorial. Please let us know if you have any difficulty trying out this Java code and Java transformation, or share any other use cases you want to implement using the Java transformation.</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.comtag:blogger.com,1999:blog-6593582717363578994.post-75848541471797211412013-09-08T22:26:00.002-07:002013-12-05T13:35:41.937-08:00Informatica Performance Tuning Guide, Identify Performance Bottlenecks - Part 2<img align="left" alt="Informatica Performance Tuning Guide, Identify Performance Bottlenecks - Part 2" border="0" height="100" src="http://1.bp.blogspot.com/-ylWH3ezt_r4/UilMMgOaEII/AAAAAAAAIlk/Jg3JCHFflQ4/s1600/activity.jpeg" style="border-width: 0px; display: inline; margin: 0px 10px 0px 0px; text-align: justify;" title="" width="100" /><br />
<div style="text-align: justify;">
In our previous article in the <a href="http://www.disoln.org/2013/08/Informatica-PowerCenter-Performance-Turning-A-to-Z-Guide.html" target="_blank">performance tuning</a> series, we covered the basics of the Informatica performance tuning process and the session anatomy. In this article we will cover the methods to identify different performance bottlenecks. We will use session thread statistics, session performance counters and Workflow Monitor properties to help us understand the bottlenecks.<br />
<a name='more'></a></div>
<a href="http://www.blogger.com/blogger.g?blogID=6593582717363578994" name="mapping"></a>
<h2>
Source, Target & Mapping Bottlenecks Using Thread Statistics</h2>
<div class="sticky taped" style="background-position: initial initial; background-repeat: initial initial; float: right;">
<b>Performance Tuning Tutorial Series</b><br />
Part I : <a href="http://www.disoln.org/2013/08/Informatica-PowerCenter-Performance-Turning-A-to-Z-Guide.html" target="_blank">Performance Tuning Introduction.</a> <br />
Part II : <a href="http://www.disoln.org/2013/09/Informatica-Performance-Tuning-Guide-Identify-Performance-Bottlenecks.html">Identify Performance Bottlenecks.</a><br />
Part III : <a href="http://www.disoln.org/2013/10/Informatica-Performance-Tuning-Guide-Resolve-Performance-Bottlenecks-Part-3.html">Remove Performance Bottlenecks</a>.<br />
Part IV : <a href="http://www.disoln.org/2013/11/Informatica-Performance-Tuning-Guide-Performance-Enhancement-Features-Part-4.html">Performance Enhancements</a>.</div>
<div style="text-align: justify;">
Thread statistics give run time information for all three threads: the reader, transformation and writer threads. The session log provides enough run time thread statistics to help us understand and pinpoint the performance bottleneck.</div>
<h3>
Gathering Thread Statistics</h3>
<div style="text-align: justify;">
You can get thread statistics from the session log file. When you run a session, the session log file lists run time information and thread statistics with the details below.</div>
<div>
<ul>
<li><b><span style="color: #cc0000;">Run Time </span></b>: Amount of time the thread runs. </li>
<li><span style="color: #cc0000;"><b>Idle Time</b></span> : Amount of time the thread is idle. Includes the time the thread waits for other thread processing.</li>
<li><span style="color: #cc0000;"><b>Busy Time</b></span> : Percentage of the run time during which the thread is busy, calculated as (run time - idle time) / run time x 100. </li>
<li><span style="color: #cc0000;"><b>Thread Work Time</b></span> : The percentage of time taken to process each transformation in a thread.</li>
</ul>
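<div style="text-align: justify;">
As a quick worked example of the busy time formula above, the following small Java sketch computes the busy percentage from a run time and an idle time. The idle time of 1.5 seconds used in <i>main</i> is a hypothetical value chosen for illustration; it is not taken from an actual session log.</div>

```java
// Worked example of: busy % = (run time - idle time) / run time x 100
class BusyTimeSketch {

    static double busyPercentage(double runTimeSeconds, double idleTimeSeconds) {
        if (runTimeSeconds <= 0) {
            throw new IllegalArgumentException("run time must be positive");
        }
        return (runTimeSeconds - idleTimeSeconds) / runTimeSeconds * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical figures: a thread that ran 506 s and was idle for 1.5 s.
        System.out.printf("Busy: %.1f%%%n", busyPercentage(506.0, 1.5));
    }
}
```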
<div>
<b><span style="color: #cc0000;">Note</span></b> : A session log file with normal tracing level is required to get the thread statistics.</div>
<div>
<h3>
Understanding Thread Statistics</h3>
<div style="text-align: justify;">
When you run a session, the session log lists run information and thread statistics similar to the following text. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
If you read it closely, you will see the reader, transformation and writer threads, how much time is spent in each thread and how busy each thread is. In addition, the transformation thread shows how busy each transformation in the mapping is.</div>
<div>
<a href="http://lh5.ggpht.com/-TEnAvKZV56U/UjAGfm4qG3I/AAAAAAAAInc/eUYUmOesc9U/s1600-h/Untitled1%25255B15%25255D.png"><span style="color: #3366cc;"></span><img alt="Informatica Performance Tuning Guide, Identify Performance Bottlenecks" border="0" height="338" src="http://lh6.ggpht.com/-M4x5Xt7Kx8w/UjAGgCnx2CI/AAAAAAAAInk/0SaEVJy6yIo/Untitled1_thumb%25255B17%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto 5px;" title="" width="500" /></a></div>
</div>
</div>
<div style="text-align: justify;">
The total run time for the transformation thread is 506 seconds and the busy percentage is 99.7%, which means the transformation thread was almost never idle during those 506 seconds. The reader and writer busy percentages were significantly smaller, about 9.6% and 24%. In this session, the <i><span style="color: #cc0000;">transformation thread is the bottleneck</span></i> in the mapping.
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
To determine which transformation in the transformation thread is the bottleneck, view the busy percentage of each transformation in the thread work time breakdown. The transformation RTR_ZIP_CODE had a busy percentage of 53%.<br />
<br />
<b><span style="color: #cc0000;">Hint</span></b> : The thread with the highest busy percentage is the bottleneck.</div>
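<div style="text-align: justify;">
The hint above can be expressed as a few lines of Java. This sketch assumes you have already copied the busy percentages out of the session log by hand; the thread labels used as map keys are our own, and the numbers are the ones quoted in the example above.</div>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Picks the thread with the highest busy percentage as the likely bottleneck.
class BottleneckSketch {

    static String busiestThread(Map<String, Double> busyPercentByThread) {
        String busiest = null;
        double highest = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : busyPercentByThread.entrySet()) {
            if (entry.getValue() > highest) {
                highest = entry.getValue();
                busiest = entry.getKey();
            }
        }
        return busiest; // null when the map is empty
    }

    public static void main(String[] args) {
        // Busy percentages quoted in the example above; the labels are ours.
        Map<String, Double> stats = new LinkedHashMap<>();
        stats.put("reader", 9.6);
        stats.put("transformation", 99.7);
        stats.put("writer", 24.0);
        System.out.println("Likely bottleneck: " + busiestThread(stats));
    }
}
```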
<a href="http://www.blogger.com/blogger.g?blogID=6593582717363578994" name="session"></a>
<h2>
Session Bottleneck Using Session Performance Counters</h2>
<div style="text-align: justify;">
All transformations have counters to help measure and improve performance of the transformations. Analyzing these performance details can help you <i><b><span style="color: #cc0000;">identify session bottlenecks</span></b></i>. The Integration Service tracks the number of input rows, output rows, and error rows for each transformation.</div>
<div>
<ul>
</ul>
<h3>
Gathering Performance Counters</h3>
<div style="text-align: justify;">
You can set up the session to gather performance counters in the Workflow Manager. The image below shows the configuration required for a session to collect transformation performance counters.</div>
<a href="http://lh5.ggpht.com/-A8tlUr2revE/Ui01Dl4B-0I/AAAAAAAAIl0/2ZrvYtXrfYo/s1600-h/image%25255B18%25255D.png"><img alt="Informatica Performance Tuning Guide, Identify Performance Bottlenecks" border="0" height="480" src="http://lh5.ggpht.com/-7kzWEOkMgqQ/Ui01EsrtHEI/AAAAAAAAIl8/VU53S8IHQVU/image_thumb%25255B14%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto -15px;" title="" width="590" /></a>
<br />
<h3>
Understanding Performance Counters</h3>
<div style="text-align: justify;">
The image below shows the performance counters for a session, which you can see in the Workflow Monitor session run properties. You can see the transformations in the mapping and the corresponding performance counters.<br />
<br />
<i><span style="color: #cc0000;">Non-zero counts for readfromdisk and writetodisk indicate sub-optimal settings</span></i> for transformation index or data caches. This may indicate the need to tune session transformation caches manually.<br />
<br />
A <span style="color: #cc0000;"><i>non-zero count for </i></span><span style="color: #cc0000;"><i>Errorrows</i><b> </b></span>indicates you should eliminate the transformation errors to improve performance.</div>
<a href="http://lh4.ggpht.com/-2YzD8NvLIOE/UjACiMB7nsI/AAAAAAAAInI/AHrjL_El4E0/s1600-h/image_thumb%25255B11%25255D.png"><img alt="Informatica Performance Tuning Guide, Identify Performance Bottlenecks" border="0" height="433" src="http://lh6.ggpht.com/-y9aKSp3RD5k/UjACiofvi1I/AAAAAAAAInQ/tZ7MFx9V0sY/image_thumb_thumb%25255B9%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto -15px;" title="" width="450" /></a><br />
<div style="text-align: justify;">
<ul>
<li><b><span style="color: #cc0000;">Errorrows</span></b> : Transformation errors impact session performance. If a transformation has large numbers of error rows in any of the Transformation_errorrows counters, you should eliminate the errors to improve performance.</li>
<li><b><span style="color: #cc0000;">Readfromdisk and Writetodisk </span></b>: If these counters display any number other than zero, you can increase the cache sizes to improve session performance.</li>
<li><b><span style="color: #cc0000;">Readfromcache and Writetocache</span></b> : Use these counters to analyze how the Integration Service reads from or writes to cache.</li>
<li><span style="color: #cc0000;"><b>Rowsinlookupcache </b></span>: Gives the number of rows in the lookup cache. To improve session performance, tune the lookup expressions for the larger lookup tables.</li>
</ul>
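<div style="text-align: justify;">
The rules above can be turned into a simple checklist. The sketch below assumes the counter values have been exported or typed in as name/value pairs; the counter names such as <i>AGG_SALES_readfromdisk</i> are hypothetical examples that follow a transformation-name-plus-counter pattern, so adjust the suffix checks to match the names you actually see in the Workflow Monitor.</div>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Flags counters that the guidelines above treat as warning signs.
class CounterCheckSketch {

    static List<String> findings(Map<String, Long> counters) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : counters.entrySet()) {
            String name = entry.getKey().toLowerCase(Locale.ROOT);
            long value = entry.getValue();
            if (value == 0) {
                continue; // zero counts need no action
            }
            if (name.endsWith("_readfromdisk") || name.endsWith("_writetodisk")) {
                result.add(entry.getKey() + ": increase cache size");
            } else if (name.endsWith("_errorrows")) {
                result.add(entry.getKey() + ": eliminate transformation errors");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical counter names and values, for illustration only.
        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put("AGG_SALES_readfromdisk", 120L);
        counters.put("AGG_SALES_writetodisk", 0L);
        counters.put("EXP_CLEAN_errorrows", 3L);
        for (String finding : findings(counters)) {
            System.out.println(finding);
        }
    }
}
```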
<div>
<a href="http://www.blogger.com/blogger.g?blogID=6593582717363578994" name="sesslog"></a>
<h2 style="text-align: start;">
Session Bottleneck Using Session Log File</h2>
</div>
<div>
<div>
When the Integration Service initializes a session, it allocates blocks of memory to hold source and target data. Not having enough <a href="http://www.disoln.org/2013/08/Informatica-PowerCenter-Performance-Turning-A-to-Z-Guide.html#Anatomy" target="_blank">buffer memory</a> for the <a href="http://www.disoln.org/2013/08/Informatica-PowerCenter-Performance-Turning-A-to-Z-Guide.html#Anatomy" target="_blank">DTM process</a> can slow down reading, transforming or writing and cause large fluctuations in performance.<br />
<br /></div>
</div>
<div>
If the session cannot allocate enough memory for the DTM process, the Integration Service writes a warning message to the session log file along with the recommended buffer size. Below is a sample message seen in the session log.</div>
<div>
<br /></div>
<div>
<div class="p1">
<i><span style="color: #cc0000;">Message: WARNING: Insufficient number of data blocks for adequate performance. Increase DTM buffer size of the session. The recommended value is xxxx.</span></i></div>
</div>
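<div>
If you want to scan session logs for this warning automatically, a small regular expression is enough. The sketch below assumes the warning text matches the sample message above and that the recommended value appears as a plain integer; the number used in <i>main</i> is a made-up example, since the sample message does not show a real one.</div>

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts the recommended DTM buffer size from a session log line, if present.
class BufferWarningSketch {

    // Assumes the wording of the sample warning shown above.
    private static final Pattern WARNING = Pattern.compile(
        "Insufficient number of data blocks.*?recommended value is (\\d+)");

    static OptionalLong recommendedBufferSize(String logLine) {
        Matcher matcher = WARNING.matcher(logLine);
        if (matcher.find()) {
            return OptionalLong.of(Long.parseLong(matcher.group(1)));
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        // Made-up value for illustration; real logs will show their own number.
        String line = "WARNING: Insufficient number of data blocks for adequate performance. "
            + "Increase DTM buffer size of the session. The recommended value is 24000000.";
        System.out.println(recommendedBufferSize(line));
    }
}
```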
</div>
<a href="http://www.blogger.com/blogger.g?blogID=6593582717363578994" name="system"></a>
<h2>
System Bottleneck Using the Workflow Monitor</h2>
<div style="text-align: justify;">
You can view the Integration Service properties in the Workflow Monitor to see CPU, memory, and swap usage of the system when you are running task processes on the Integration Service. Use the following Integration Service properties to identify performance issues: </div>
<a href="http://lh3.ggpht.com/-9QvcUUqtxRU/Ui1MUEg5w_I/AAAAAAAAImc/bky-nItnqf4/s1600-h/image%25255B46%25255D.png"><img alt="Informatica Performance Tuning Guide, Identify Performance Bottlenecks " border="0" height="106" src="http://lh4.ggpht.com/-yptfIh3lnRo/Ui1MVX-dL4I/AAAAAAAAImk/AO89HnW4t-8/image_thumb%25255B40%25255D.png?imgmax=800" style="border: 0px; display: block; float: none; margin: 10px auto -15px;" title="" width="700" /></a>
<br />
<ul>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>CPU% </b></span>: The percentage of CPU usage includes other external tasks running on the system. High CPU usage indicates that the server needs additional processing power.</li>
<li style="text-align: justify;"><span style="color: #cc0000;"><b>Memory Usage</b></span> : The percentage of memory usage includes other external tasks running on the system. If the memory usage is close to 95%, check if the tasks running on the system are using the amount indicated in the Workflow Monitor or if there is a memory leak. To troubleshoot, use system tools to check the memory usage before and after running the session and then compare the results to the memory usage while running the session. </li>
<li style="text-align: justify;"><b><span style="color: #cc0000;">Swap Usage </span></b>: Swap usage is a result of paging due to possible memory leaks or a high number of concurrent tasks.</li>
</ul>
<h2>
<span style="text-align: justify;">What is Next in the Series</span></h2>
<div style="text-align: justify;">
The <a href="http://www.disoln.org/2013/10/Informatica-Performance-Tuning-Guide-Resolve-Performance-Bottlenecks-Part-3.html" target="_blank">next article</a> in this series will cover how to remove bottlenecks and <a href="http://www.disoln.org/2013/10/Informatica-Performance-Tuning-Guide-Resolve-Performance-Bottlenecks-Part-3.html" target="_blank">improve session performance</a>. Hope you enjoyed this article. Please leave us a comment or feedback if you have any; we are happy to hear from you.</div>
</div>
<div style="clear: both;">
</div>
<div class="blogger-post-footer"><a href="http://www.disoln.org">Informatica Training & Tutorials</a></div>Johnson Cyriachttp://www.blogger.com/profile/05007089766900105427noreply@blogger.com