An ETL Parameter Framework to Deal with all sorts of Parametrization Needs

Johnson Cyriac Oct 5, 2014
|

Informatica Cloud Mapping Tutorial for Beginners
We spoke about different etl frameworks in our prior articles. Here in this article lets talk about an ETL framework to deal with parameters we normally use in different ETL jobs and different use cases. Using parametrization in the ETL code increases code reusability, code maintainability and is critical to the quality of the code and reduces the development cycle time.

Framework Components

Our ETL parameter framework will include primarily two components.
    1. A Relational Table :- To store the parameter details and parameter values.
    2. Reusable Mapplet :- Mapplet to log the parameter details and values into the relational table.

1. Relational Table

A relation table will be used to store the parameter details with the below structure. This will store the parameter name, value and the other information relevant to identify the context of the parameter, like folder name, workflow name and session name. 
    • ETL_PARM_ID : A unique sequence number.
    • FOLDER_NAME : Folder name, in which the parameter is used.
    • WRKFLW_NAME : Workflow name, in which the parameter is used.
    • SESSN_NAME : Session name, in which the parameter is used.
    • PARM_NAME : Name of the parameter
    • PARM_VAL : Value of the parameter.
    • ETL_CRT_DATE : Record create timestamp.
    • ETL_UPD_DATE : Record update timestamp.
Note : You can add repository name to the the table, if the framework is planned to use for workflows running in multiple repositories.
Note : All parameter should be stored into the parameter table with its initial value to start with.

2. Reusable Mapplet

mapplet to capture and load the parameter values into the database table. This mapplet takes two input values and gives all the data elements required in the parameter table mentioned above.

Mapplet Input : Parameter name, parameter value.
Mapplet Output : All the data elements required to be stored in the parameter table mentioned above. This output can be connected to the target table to store the information into the relational table.

Framework Implementation in a Workflow

This framework can be implemented for both dynamically changing parameters as well as rarely changing or static parameters.

Dynamically Changing Parameters

Typical example of dynamically changing parameter is "ETL Run Timestamp" which is used for incremental data extraction logic. Lets see how incremental data extraction is implemented using this parameter framework.

Create a mapping variable with MAX aggregation. This variable will hold the parameter value.
An ETL Framework for Parameterization
Note : Reset the mapping variable in the workflow using the pre-session variable assignment.

Set the mapping variable using the SETVARIABLE function in an expression as shown in below image. This will update the mapping variable to the greatest ETL_UPD_DATE value, which will finally be stored into the parameter table using the mapplet.

An ETL Framework for Parameterization
Adjust the source filter to pull incremental data. Incremental data is pulled from the source based on ETL_UPD_DATE as shown in below image.
An ETL Framework for Parameterization
Above mapping configuration will make sure the correct parameter is used and will set the correct parameter value, which is to be stored into the parameter table.

Add an additional mapping pipeline as shown in below image to store the parameter value into the parameter table. This pipeline will update the current value in the parameter table to the latest value. The mapplet used will make sure the correct parameter and parameter value is updated in the parameter table.
image Note : Set the target load order of the new pipeline to the last one in the mapping. Source qualifier of this pipeline will generate one record using "select 'x' from dual" SQL.

Below shown is the complete mapping design.
An ETL Framework for Parameterization

Static or Rarely Changing Parameters

Parameters, which might need occasional changes or static parameter can be stored in the parameter table and can be retrieved in the Informatica mapping using a LookUp transformation. Any changes require for the parameter value should be one time updated done outside of the ETL process.

Below shown is the lookup transformation, which can be used to retrieve parameter value. You just need to pass in the input parameters to the lookup and get the parameter value from the parameter table.
An ETL Framework for Parameterization
Note : The static parameter value should already be saved into the parameter table with its static value, before it can be used in a mapping.

How Parameter Data is Stored in the Parameter table

As discussed, the parameter framework support both static and dynamic parameters. Lets consider a sample data for the explanation.

ETL_PARM_IDFOLDER_NAMEWRKFLW_NAMESESSN_NAMEPARM_NAMEPARM_VAL
1ALLALLALLYR_BEGIN01-JAN-2014
2DW_SALESALLALLREGION_NAMEUSA
3DW_SALESwf_LOAD_CUST_DIMs_LOAD_CUST_DIMLST_RUN_TS10-OCT-2014

Parameter IDs 1 and 2 are static parameters. First parameter is defined to used across all folders, workflow and sessions. Second parameter is still a static one, but specific to all workflows and sessions in the folder DW_SALES. Third parameter is dynamic parameter specific to the session s_LOAD_CUST_DIM, which is running in DW_SALES folder.

Better than Informatica Parameters and Variables

Since the parameter framework stores the values outside Informatica environment, you get much more flexibility with it.
  • Prevents any accidentally parameter value changes, which might happen for mapping variables during code migration.
  • Centralized storage for all parameter values rather than the storing it in different parameter files or mapping variables.
  • Easy to update or change the parameter value, unlike it is with mapping variables. When using it with incremental data extraction logic, it is to update the parameter value to reprocess same data set and enable restartability.
  • Dynamic changing parameters can be handled in the framework. Mapping variables can have only MAX or MIN operations to handle dynamically changing parameters.
  • Parameter framework can handle both static and dynamic parameters.
  • More secure than storing the parameters in a parameter file.
Please leave us a comment below, if you have any other thoughts or scenarios to be covered. We will be more than happy to help you.




About US Contact US Advertise Guest Post Terms and Conditions Privacy Policy Disclaimer

© 2012-2013 Data Intelligence Solution, All Rights Reserved
The contents in this site is copyrighted to Data intelligence Solution and may not be reproduced on other websites.