Data Security Using Informatica PowerCenter Data Masking Transformation

Johnson Cyriac Feb 28, 2014

You may have run into a scenario where your Development and QA regions lack enough good data for testing, and data security rules do not allow you to copy data over from the production environment. The Informatica PowerCenter Data Masking transformation helps you overcome such scenarios. In this article, let's look at how to use the Data Masking transformation.

What is Data Masking Transformation

Using the Data Masking transformation, you can change sensitive production data into realistic test data for non-production environments. The Data Masking transformation modifies source data based on the masking rules that you configure for each column.

You can apply the following types of masking with the Data Masking transformation.
  • Key masking :- Produces deterministic results for the same source data.
  • Random masking :- Produces random, non-repeatable results for the same source data.
  • Expression masking :- Applies an expression to a port to change the data or create data.
  • Substitution :- Replaces a column of data with similar but unrelated data from a dictionary.
  • Special mask formats :- Applies special mask formats to change SSN, credit card number, phone number, URL, email address, or IP addresses.
Let's look at each masking rule in detail.

Key Masking

A column configured for key masking returns deterministic masked data: whenever the source value and the seed value are the same, the masked output is the same. Use the same seed value to generate the same masked value for the same input value across transformations.

Key Masking Properties

You can configure the following masking rules and properties for key masking string values:
  • Seed :- Apply a seed value to generate the same masked data for the same input across sessions. Select one of the following options:
    • Value :- Accept the default seed value or enter a number between 1 and 1,000. 
    • Mapping Parameter :- Use a mapping parameter to define the seed value.
  • Mask Format :- Define the type of character to substitute for each character in the input data. Use this property to keep the input and masked data in the same format.
  • Source String Characters :- Source string characters are source characters that you choose to mask or not mask. 
  • Result String Characters :- Substitute the characters in the target string with the characters you define in Result String Characters.
Hint :- Use the same seed value to mask a primary key in a table and the foreign key value in another table.
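
To make the seed behaviour concrete, here is a minimal Python sketch of deterministic masking. It is not the algorithm PowerCenter uses internally; the function name key_mask and the HMAC-based approach are assumptions chosen only to illustrate that the same input and the same seed always give the same masked value, which is what keeps a primary key and its foreign key consistent.

```python
import hashlib
import hmac
import string


def key_mask(value: str, seed: int) -> str:
    """Illustrative deterministic mask (hypothetical helper, not Informatica's algorithm).

    Digits are replaced with digits, letters with uppercase letters,
    and any other character (for example '-') is left unchanged.
    """
    # Derive a keyed digest from the seed and the input value.
    digest = hmac.new(str(seed).encode(), value.encode(), hashlib.sha256).digest()
    out = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]
        if ch.isdigit():
            out.append(string.digits[b % 10])
        elif ch.isalpha():
            out.append(string.ascii_uppercase[b % 26])
        else:
            out.append(ch)  # separators stay as they are
    return "".join(out)


# The same seed masks the key identically in the parent and the child table.
parent_key = key_mask("DPT-128923", seed=190)
child_key = key_mask("DPT-128923", seed=190)
print(parent_key == child_key)  # True
```

Note that this sketch does not guarantee that two different inputs get different masked values; it only demonstrates repeatability.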

Example :- The screenshot below shows the masking properties for key masking. This transformation masks the DEPT_ID column using key masking. The masked DEPT_ID will have the format DDD+AAAAAA.
[Screenshot: Key Masking properties]

Substitution Masking

Substitution masking replaces a column of data with similar but unrelated data. When you configure substitution masking, define the relational or flat file dictionary that contains the substitute values. The Data Masking transformation performs a lookup on the dictionary that you configure and replaces source data with data from the dictionary. It is an effective way to replace production data with realistic test data.

Substitution Source Dictionaries

To use substitution masking, you need a flat file or relational table that contains the substitute data and a serial number for each row. The serial numbers must start at one and must not have any gaps.

Below is the structure of the substitution file, which has a serial number column, the department ID, and the corresponding masked department ID.

SNO,DEPT_ID,MASKED_DEPT_ID
1,DPT-128923,ABC-999999
2,DPT-234265,LMN-888888
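
The serial number rule is easy to check before a session runs. The following Python sketch assumes the dictionary is a comma-delimited file with the three columns shown above; load_dictionary is a hypothetical helper written for this article, not part of PowerCenter.

```python
import csv
import io

# Sample dictionary content, using the column names from the article.
DICTIONARY_CSV = """SNO,DEPT_ID,MASKED_DEPT_ID
1,DPT-128923,ABC-999999
2,DPT-234265,LMN-888888
"""


def load_dictionary(text: str) -> list:
    """Parse the dictionary and verify the serial numbers run 1..N with no gaps."""
    rows = list(csv.DictReader(io.StringIO(text)))
    serials = sorted(int(row["SNO"]) for row in rows)
    if serials != list(range(1, len(rows) + 1)):
        raise ValueError("serial numbers must start at 1 and have no gaps")
    return rows


rows = load_dictionary(DICTIONARY_CSV)
print(len(rows))  # 2
```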

Substitution Masking Properties

You can configure the following masking rules for substitution masking.
  • Repeatable Output :- Returns the same results between sessions for the same input.
  • Seed :- Apply a seed value to generate the same masked data for the same input across sessions. Select one of the following options:
    • Value :- Accept the default seed value or enter a number between 1 and 1,000.
    • Mapping Parameter :- Use a mapping parameter to define the seed value.
  • Unique Output :- Force the PowerCenter Integration Service to create unique Data Masking output values for unique input values. No two input values are masked to the same output value.
  • Dictionary Information :- Configure the flat file or relational table that contains the substitute data values. 
    • Relational Table :- Select Relational Table if the dictionary is in a database table. 
    • Flat File :- Select Flat File if the dictionary is in a flat file delimited by commas.
    • Dictionary Name :- Displays the flat file or relational table name that you selected. 
    • Serial Number Column :- Select the column in the dictionary that contains the serial number. 
    • Output Column :- Choose the column to return to the Data Masking transformation. 
  • Lookup condition :- When you configure a lookup condition you compare the value of a column in the source with a column in the dictionary to pick the masked value.
    • Input port :- Source data column to use in the lookup. 
    • Dictionary column :- Dictionary column to compare the input port to.
Example :- The screenshot below shows the masking properties for substitution masking. In this example, SNO is the serial number column and MASKED_DEPT_ID is the substitution value from the file for each DEPT_ID. The lookup condition used to search the flat file is defined on DEPT_ID.
[Screenshot: Substitution Masking properties]
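
Conceptually, the lookup condition in this example behaves like a keyed lookup from DEPT_ID to MASKED_DEPT_ID. The Python sketch below illustrates that idea only; build_lookup and substitute are hypothetical names, and returning None for a value that is missing from the dictionary is an assumption of this sketch, not documented PowerCenter behaviour.

```python
# Dictionary rows, using the sample values from the substitution file.
DICTIONARY = [
    {"SNO": 1, "DEPT_ID": "DPT-128923", "MASKED_DEPT_ID": "ABC-999999"},
    {"SNO": 2, "DEPT_ID": "DPT-234265", "MASKED_DEPT_ID": "LMN-888888"},
]


def build_lookup(rows, dictionary_column, output_column):
    """Index the dictionary by the column used in the lookup condition."""
    return {row[dictionary_column]: row[output_column] for row in rows}


def substitute(value, lookup):
    """Return the substitute value, or None when no dictionary row matches."""
    return lookup.get(value)


lookup = build_lookup(DICTIONARY, "DEPT_ID", "MASKED_DEPT_ID")
print(substitute("DPT-128923", lookup))  # ABC-999999
```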

Random Masking

Random masking generates random masked data. The Data Masking transformation returns different values when the same source value occurs in different rows. You can mask numeric, string or date values with random masking.

Random Masking Properties

You can configure the following masking rules for random masking.
  • Range :- Configure the minimum and maximum string length. The Data Masking transformation returns a string of random characters between the minimum and maximum string length.
  • Mask Format :- Define the type of character to substitute for each character in the input data. Use this property to keep the input and masked data in the same format.
  • Source String Characters :- Source string characters are source characters that you choose to mask or not mask. 
  • Result String Characters :- Substitute the characters in the target string with the characters you define in Result String Characters.
Example :- The screenshot below shows the masking properties for random masking. In this example, the masked DEPT_ID will have the format DDD+AAAAAA and the character '-' will not be masked.
[Screenshot: Random Masking properties]
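
The Range property for strings can be pictured with a few lines of Python. This is only a sketch: random_mask_string is a hypothetical helper, and the character pool (ASCII letters and digits) is an assumption. Unlike key masking, there is no seed, so the same source value can produce a different result on every call.

```python
import random
import string


def random_mask_string(min_len: int, max_len: int, rng=None) -> str:
    """Return random characters with a length between min_len and max_len (inclusive)."""
    if not 0 <= min_len <= max_len:
        raise ValueError("expected 0 <= min_len <= max_len")
    rng = rng or random.Random()
    length = rng.randint(min_len, max_len)
    pool = string.ascii_letters + string.digits  # assumed character pool
    return "".join(rng.choice(pool) for _ in range(length))


# Two calls for the same source row are not expected to match.
print(random_mask_string(5, 10))
print(random_mask_string(5, 10))
```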

Expression Masking

Expression masking applies an expression to a port to change the data or create new data. When you configure expression masking, create an expression in the Expression Editor. You can select input and output ports, functions, variables, and operators to build expressions.

Example :- The screenshot below shows the masking properties for expression masking.
[Screenshot: Expression Masking properties]
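
As an illustration of creating new data from other ports, the Python function below mirrors what a simple masking expression might do. The port names FIRST_NAME and LAST_NAME and the expression in the comment are hypothetical examples, not taken from the screenshot.

```python
def mask_login(first_name: str, last_name: str) -> str:
    """Build a login value from the initials of two input ports.

    Roughly what an expression such as
    SUBSTR(FIRST_NAME, 1, 1) || SUBSTR(LAST_NAME, 1, 1)
    would return (port names are assumptions).
    """
    return first_name[:1] + last_name[:1]


print(mask_login("John", "Smith"))  # JS
```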

Special Masking Formats

Special mask formats change SSNs, credit card numbers, phone numbers, URLs, email addresses, or IP addresses. The Data Masking transformation returns a masked value that has a realistic format but is not a valid value. For example, when you mask an SSN, the Data Masking transformation returns an SSN that is in the correct format but is not valid. You can configure repeatable masking for Social Security numbers.

Example :- The screenshot below shows the masking properties for special masking.
[Screenshot: Special Mask Format properties]
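
The idea of "correct format but not valid" can be sketched in Python as follows. This is a swapped-in technique, not the method PowerCenter uses: it relies on the fact that area numbers 900 to 999 are not issued for Social Security numbers, and mask_ssn and its seed handling are assumptions made for illustration. Because the result depends only on the input and the seed, the output is repeatable.

```python
import hashlib
import hmac


def mask_ssn(ssn: str, seed: int) -> str:
    """Return a repeatable SSN-shaped value that is not a valid SSN."""
    digest = hmac.new(str(seed).encode(), ssn.encode(), hashlib.sha256).digest()
    number = int.from_bytes(digest[:8], "big")
    area = 900 + number % 100             # 900-999 is never issued
    group = 1 + (number // 100) % 99      # 01-99
    serial = 1 + (number // 9900) % 9999  # 0001-9999
    return f"{area:03d}-{group:02d}-{serial:04d}"


print(mask_ssn("123-45-6789", seed=190) == mask_ssn("123-45-6789", seed=190))  # True
```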

Masking Properties in Detail

Let's look at a few masking properties in detail.

1. Mask Format

Configure a mask format to limit each character in the output column to an alphabetic, numeric, or alphanumeric character. This property is used by random and key masking. Use the following characters to define a mask format: 
  1. A :- Alphabetical characters. For example, ASCII characters a to z and A to Z.
  2. D :- Digits. 0 to 9.
  3. N :- Alphanumeric characters. For example, ASCII characters a to z, A to Z, and 0-9.
  4. X :- Any character. For example, alphanumeric or symbol.
  5. + :- No masking.
  6. R :- Specifies that the remaining characters in the string can be any character type.
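
The mask format characters above can be read as a per-position rule. The Python sketch below applies such a format with random replacement characters. It is an illustration only: apply_mask_format is a hypothetical helper, the symbol pool for X is an assumption, and leaving characters unchanged when the source is longer than the format is also an assumption of this sketch.

```python
import random
import string

# Replacement pools for each mask format character (X pool is an assumption).
POOLS = {
    "A": string.ascii_letters,
    "D": string.digits,
    "N": string.ascii_letters + string.digits,
    "X": string.ascii_letters + string.digits + string.punctuation,
}


def apply_mask_format(value: str, mask_format: str, rng=None) -> str:
    """Mask each character of value according to the format character at its position."""
    rng = rng or random.Random()
    out = []
    rest_any = False  # becomes True once an 'R' is reached
    for i, ch in enumerate(value):
        if not rest_any and i < len(mask_format) and mask_format[i] == "R":
            rest_any = True
        if rest_any:
            out.append(rng.choice(POOLS["X"]))  # remaining characters: any type
        elif i >= len(mask_format) or mask_format[i] == "+":
            out.append(ch)  # '+' means no masking; past the format is left as-is (assumption)
        else:
            out.append(rng.choice(POOLS[mask_format[i]]))  # KeyError for unknown codes
    return "".join(out)


# DEPT_ID with the format used in the examples: three digits, '-', six letters.
print(apply_mask_format("DPT-128923", "DDD+AAAAAA"))
```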

2. Source String Characters

Source string characters are source characters that you choose to mask or not mask. The position of the characters in the source string does not matter, but the characters are case sensitive. This property is used by random and key masking.

Mask Only :- The Data Masking transformation masks characters in the source that you configure as source string characters. For example, if you enter the characters A, B, and c, the Data Masking transformation replaces A, B, or c with a different character when the character occurs in source data. A source character that is not an A, B, or c does not change. The mask is case sensitive.

Mask All Except :- Masks all characters except the source string characters that occur in the source string.
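
The two options can be compared side by side in a short Python sketch. The helper mask_source_chars and the replacement pool of ASCII letters and digits are assumptions; the point is only to show which characters are touched in each mode and that matching is case sensitive.

```python
import random
import string


def mask_source_chars(value: str, source_chars: str, mode: str = "mask_only", rng=None) -> str:
    """Mask characters based on whether they appear in source_chars (case sensitive)."""
    if mode not in ("mask_only", "mask_all_except"):
        raise ValueError("mode must be 'mask_only' or 'mask_all_except'")
    rng = rng or random.Random()
    pool = string.ascii_letters + string.digits  # assumed replacement pool
    out = []
    for ch in value:
        listed = ch in source_chars
        should_mask = listed if mode == "mask_only" else not listed
        if should_mask:
            # Pick a replacement that differs from the original character.
            out.append(rng.choice([c for c in pool if c != ch]))
        else:
            out.append(ch)
    return "".join(out)


# Mask Only: A, B and c change; a, b and C are left alone.
print(mask_source_chars("ABcabC", "ABc", mode="mask_only"))
# Mask All Except: everything changes apart from '-'.
print(mask_source_chars("DPT-128923", "-", mode="mask_all_except"))
```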

3. Result String Replacement Characters

Result string replacement characters are characters you choose as substitute characters in the masked data. When you configure result string replacement characters, the Data Masking transformation replaces characters in the source string with the result string replacement characters. This property is used by random and key masking.

Use Only :- Mask the source with only the characters you define as result string replacement characters. For example, if you enter the characters A, B, and c, the Data Masking transformation replaces every character in the source column with an A, B, or c. The word “horse” might be replaced with “BAcBA.” 

Use All Except :- Mask the source with any characters except the characters you define as result string replacement characters. For example, if you enter A, B, and c as result string replacement characters, the masked data never has the characters A, B, or c.
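
A minimal Python sketch of the two options follows. The helper mask_with_result_chars is hypothetical, and the default pool used for Use All Except (ASCII letters and digits) is an assumption. In both modes every source character is replaced; only the set of allowed replacement characters changes.

```python
import random
import string


def mask_with_result_chars(value: str, result_chars: str, mode: str = "use_only", rng=None) -> str:
    """Replace every character of value using a pool derived from result_chars."""
    rng = rng or random.Random()
    default_pool = string.ascii_letters + string.digits  # assumed default pool
    if mode == "use_only":
        pool = list(result_chars)
    elif mode == "use_all_except":
        pool = [c for c in default_pool if c not in result_chars]
    else:
        raise ValueError("mode must be 'use_only' or 'use_all_except'")
    if not pool:
        raise ValueError("no replacement characters available")
    return "".join(rng.choice(pool) for _ in value)


# Use Only: the result contains nothing but A, B and c, for example "BAcBA".
print(mask_with_result_chars("horse", "ABc", mode="use_only"))
# Use All Except: the result never contains A, B or c.
print(mask_with_result_chars("horse", "ABc", mode="use_all_except"))
```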

Hope you enjoyed this article. Feel free to ask any questions or request clarification in the comment section below. We are happy to help.





© 2012-2013 Data Intelligence Solution, All Rights Reserved
The contents in this site is copyrighted to Data intelligence Solution and may not be reproduced on other websites.