Thursday, June 26, 2008

Looking beyond to ETL from SSIS, next door can be EII - Enterprise Information Integration.

Below paragraph is pre-face of the article, and it's mentioned about inspiration of the article, if you wish to skip it please go to next paragraph. :-).. But I would recommend reading it.

Recently many things cooking around Microsoft Data Services, I hope it wills surely include more services which can be much helpful to implement difficult EII scenarios. New term Database as a Service (DaaS, name cloning from SaaS). Frankly speaking these days war between terminologies is far higher then technologies, aaahh forget about it, it's very confusing which do what!!!! nowadays, terminologies are becoming face of product and patterns are became abstract :-). Anyways let's come to core part of article, actually I was working on very nice requirement where system need to provide a merged data view from multiple system (heterogeneous sources and also multi-cultured also :(..). Previously we used to do complete data integration (As EAI Implementation) with moderate latency across whole system into one web application which provides grid view and reports; it was very frequently used by users and very important too. But this integration consumes 200GB of database for just viewing and reporting purpose (we keep only 15 days data and for that we have to invest minimum of 250 GB data storage on SAN and other resources for implementation of EAI like BizTalk server, message box database, traffic, bandwidth etc...). It was clearly an requirement of EII implementation, but previously it hadn't been identified as EII requirement and individual implementation, too. But after massive discussion (fights) we were implement that system as EII with use of SSIS (SQL Server Integration Services 2005) only. From that time I had the inspiration of this article. Thanks for reading it.

Back to Business:

Microsoft.SQLServer.Dts.DtsClient name space, I am SSIS developer too, but was not much familiar about its functionalities earlier. As a problem I described in above paragraph we design few complex SSIS packages with heterogeneous data sources and data views, merge joins, lots of shorting, querying, cleansing, etc. Here for demonstration I am unable to saw original design but will follow one sample design I have developed specially for article.

In below design, I am trying to querying to two different data sources which have entirely different databases and connected through one common key and I am trying to generate single dataset for viewing and reporting purposes. You can see below data table snapshots where first figure shows first names of people and second table shows last name of people. Both tables are in different database. Here data sources could be any kind of supportive sources inside SSIS for the concept.


Fig 1. Database Tree (two different databases - test_1, test_2).



Fig 2. Table 1 (Db1- table contain first name information).



Fig 3. Table 2 (Db2 - table contain last name information).



Now have a look to sample SSIS package, just I am doing merge operation and storing into data reader as a destination.


Fig 4. Sample SSIS package.



I have also developed one console .Net application for testing and demonstration purpose to call SSIS package as single point data connection which can provide me processed and integrated data within one single record set. Below is sample snapshot of coding, in real-time solution we used WCF services to provide external domain and other application as an exposer of the EII implementation. But here I am trying to demonstrate through C#.Net Console Application.


Fig 5. Console Application Code (C# Console Application).



Fig 6. Receiving data into one data reader.


Fig 7. Output.



You can see above snapshot of output that I received integrated query's result in single set.

Isn't it good? There are many benefit of such implementation I can say. The major benefit is shrink EAI landscape which will decrease support team size (Cost... Managers are very happy to listen, additional benefit) and we can free resources to better performance of it. Another important benefit is, we can avoid non-require expose of database and our precious data to other system which only need for viewing purpose, we can archive it by providing and implementing parametrise/on demand EII Data Services (SaaD :-D). I don't want to go inside other benefits we can archive from such implementation but defiantly it could turn into beneficial implementation for every organization which has such requirement.

Please share your views and comments as well as if you have any suggesting please free to suggest me on nilayparikh@gmail.com

Thanks for visiting my blog.


No comments: