Wednesday, August 20, 2014

Informatica Grid options

One of the most frequent questions I receive as an Informatica Architect is: when should one go for a grid? Will it not be sufficient to add more CPU and RAM on the same server instead? Isn't High Availability through automatic failover an inherent property of the grid?

In this post I would like to offer a simplified understanding of what it is we are trying to achieve and which Informatica option is required for each specific need.

Before we get into the grid, let's go back to basics on session/CPU-core computation. Going by many Informatica articles and actual usage, I have observed that a session requires roughly 1.2 dedicated CPU cores to run; that is, a server with 4 CPU cores can in theory support up to 3 sessions in parallel. What happens when more sessions are invoked on the same CPU configuration? Time-based CPU slicing starts to happen, forcing some sessions to wait or slow down. Most ETL sessions are more memory-intensive than CPU-intensive, given the nature of their data-movement requirements, so the session-per-CPU requirement comes mainly from PowerCenter's multithreaded architecture.
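As a back-of-the-envelope check, the 1.2-cores-per-session estimate above can be sketched in a few lines. Note that the ratio is an observed figure from experience, not an Informatica-published constant:

```python
# Rough capacity check based on the ~1.2 CPU cores per session
# rule of thumb. The ratio is an observed estimate and will vary
# with transformation complexity and partitioning.
import math

CORES_PER_SESSION = 1.2  # observed average, not a published constant

def max_parallel_sessions(cpu_cores: int) -> int:
    """Sessions a node can run in parallel before CPU slicing kicks in."""
    return math.floor(cpu_cores / CORES_PER_SESSION)

def cores_needed(sessions: int) -> int:
    """Cores required to run the given number of sessions comfortably."""
    return math.ceil(sessions * CORES_PER_SESSION)

print(max_parallel_sessions(4))   # 4 cores -> 3 parallel sessions
print(cores_needed(10))           # 10 parallel sessions -> 12 cores
```

Running more sessions than `max_parallel_sessions` returns is when the time-slicing described above begins to bite.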

So if one goes by this computation, running more sessions in parallel requires more CPU cores. A large enterprise may well opt for high-end multi-core monsters; however, scale that with cost in mind and one hits the barrier soon. Being able to add commodity servers on demand and expand with growing demand becomes the smarter choice. Informatica Grid is such an option: you start with a minimum of two servers as individual nodes and grow the farm as demand increases.
Some benefits become apparent as soon as you adopt this configuration:
1. Smaller servers
2. Elasticity
3. Data colocation
4. Specialized zones
5. Almost zero downtime during patch upgrades
6. Workflow-level distributed computing
The following diagram shows the bare-minimum architecture of a grid:

Gateway node
Worker node (Backup node)
Shared storage

Although the grid provides the above benefits, it does not automatically perform failover recovery for sessions; it only enables node-level failover. This is a common misconception: the Grid option alone does not provide the High Availability feature. To have HA on the grid, the HA option needs to be procured separately, and all related components, viz. the repository and application databases, storage systems, and networks, need to be HA-compliant.

Finally, another option that enables session-level distributed computing on the grid is the "Session on Grid" option. With this advanced option, Informatica distributes individual transformation-level tasks across the grid, which is useful for CPU-intensive sessions.



Back to blogging

It's been a long time since I blogged here, and I think it's the right time to get back. A lot has been happening in the Data Integration world while I was away from writing: Informatica introduced the Virtual Data Machine, Vibe; data integration is moving towards the sources with streams; data security is being taken more seriously than ever; and vendors are opening up their platforms to cope with the exponentially growing variety of data sources.

It's going to be an exciting path ahead. Keep reading...

Thursday, April 12, 2012

Dynamic Expression evaluation in Informatica

Informatica has been one of the leading Data Integration (ETL) tools in the market for several years now.
One objective all the big ETL tool companies strive to accomplish is fast data transfer combined with complex business-rule transformations. Informatica provides various transformations in a typical source-DB-to-target-DB mapping. One of the recent advances is making expression evaluation dynamic, i.e., the expression string itself can now be supplied as a parameter in the parameter file.

This feature opens avenues for materializing several ideas around dynamic rule changes. Let us take an example.

In this simple example, Name is an output port with the following expression: Name = First Name || Last Name

Normally, if the expression needs to change so that the target Name field contains only the initial character of the Last Name, the mapping has to be modified, tested, and moved to production. But if a mapping parameter such as $$Name is created with its IsExprVar property set to TRUE, you can simply assign the required expression to the parameter in the parameter file and use the parameter in the Name port instead of the hard-coded concatenation string.
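For illustration, a parameter file entry for this might look like the following sketch. The folder, workflow, and session names here are hypothetical; the expression implements the "initial of last name" variant described above:

```
[MyFolder.WF:wf_load_customers.ST:s_m_load_customers]
$$Name=FIRST_NAME || ' ' || SUBSTR(LAST_NAME, 1, 1)
```

A later rule change then means editing only this one line in the parameter file, with no mapping redeployment.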

This feature becomes more practical in situations like bonus calculations, price formulas, scoring, etc., where expressions change frequently.

Kiran Padiyar

Thursday, May 19, 2011

Shopper step analysis

With every square foot of retail space becoming expensive these days, retailers are continuously on the lookout for new ways to optimize floor space and enhance the shopper experience.
How many steps does a shopper take before finding an item? How many times does he go back and forth? Which spot gets the most foot traffic? These are some of the key questions analysts need to answer to optimize the placement of items and promotions.
Large retailers already do market-basket analysis and placement predictions using data collected at the POS. Tracking cart movement on the retail floor adds another dimension. Following is a grid showing sample shopper cart movement data.

On the left side are the Cart ID and the shopper's moving steps (the stops a shopper makes; every stop is recorded by aisle RFID sensors, assuming the carts are fitted with RFID tags). Columns in the grid show sections of the retail shop, marked with point numbers; in the ideal case a shopper navigates point 1 --> point 2 --> ... --> point 10. Grid cells show the sum of sales made (recorded at the POS).
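The grid above can be sketched as a simple aggregation over the raw RFID stop records. The cart IDs, sections, and sale amounts below are made-up sample data, purely for illustration:

```python
# Minimal sketch: aggregate cart-movement records into the grid
# described above. Rows are cart IDs, columns are shop sections
# (point numbers), cells hold the sum of sales. Sample data only.
from collections import defaultdict

# (cart_id, section_point, sale_amount) -- one record per RFID stop;
# a 0.0 sale means the cart stopped but nothing was bought there.
stops = [
    ("C001", 1, 12.50), ("C001", 2, 0.0),  ("C001", 5, 30.00),
    ("C002", 1, 0.0),   ("C002", 3, 8.75), ("C002", 5, 15.25),
]

grid = defaultdict(float)   # (cart_id, section) -> total sales in cell
visits = defaultdict(int)   # section -> foot-traffic count
for cart, section, sale in stops:
    grid[(cart, section)] += sale
    visits[section] += 1

# Most foot-printed section (first insertion wins a tie).
busiest = max(visits, key=visits.get)
print(busiest, visits[busiest])   # -> 1 2 (sections 1 and 5 tie at 2 stops)
print(grid[("C001", 5)])          # -> 30.0
```

The `visits` counts answer the "most foot-printed place" question, while the zero-sale cells in `grid` are exactly the grey "visited but no purchase" cells in the heatmap below.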
Following is a visual showing a heatmap of sales against the customers' navigation behaviour.
Blank grey cells are sections where shoppers visited but did not make any purchase. The same visualization can be overlaid on a shop-floor graphic to show navigation behaviours.

Hope this post helped spark some new ideas for retail analysis using your BI solution.

Saturday, May 7, 2011

You see exactly what is shown to you! But how information should be shown to you?!

As we navigate through the clutter of information, we have less and less time at our disposal to understand it. We tend to rely on what is easily seen and use that information for decision making. So the quality of what is shown to you is of utmost importance. Since most organizations rely on technical consultants for dashboard/report development, the following equations may hold:
Quality and experience of the BI consultant == Quality of information shown to you
Quality of information shown to you == Quality of information seen by you
Every leader sees the same thing but interprets it differently. The job of a dashboard is to open avenues to new explorations through quick insights, and BI practitioners should keep the user focused on business objectives. Too often, the people who implement these BI visuals don't understand the deeper objective of the work being done.

Let's take a look at part of a BI dashboard snapshot of a travel company, represented in two formats.
The first is a classic pie chart; the second, horizontal bars.

Which one helped you compare best?!

If you can get hold of Stephen Few's book on information dashboards, do read it; you will benefit a lot: Information Dashboard Design: The Effective Visual Communication of Data.