I have recently been curious to implement parallelism of different flows in Integration Services Packages in my current project.
Where project requirement is to process Data as fast as we can with all parallel, Multithreading or by any other way in very less span of time.
After digging in to a lot of things I realize there are n number of ways and n number of post available
- I can use Multicasting of My Task Components
- I can use Multithreading in my similar Tasks
- I can use Balance Data Load Distributer where ever required
- Data flow pipeline engine can divide a Path (which was called Execution Tree in SSIS 2005) into several Sub Paths – each one can be executed on its own processor/thread. This threading optimization promises to have many more enhancements on performance in other parts of SSIS such as packages with deep parallel execution paths. With increasing performance on most systems, data flow pipeline engine reduces the need for manual configuration of SSIS packages to increase parallelism, and therefore increased developer productivity.The pipeline allocates four threads to work on these four sub paths in parallel and utilizes the hardware resources and improves the performance.
How to use SSIS MaxConcurrentExecutables property ?
Let say I have a master package with 10 execute package tasks that I need to execute in parallel. But only 6 of the 10 packages execute at one time. I won’t go into why all 10 need to execute in parallel.
The server has 4 processors and I had the MaxConcurrentExecutables property of the master package set to -1. (This equates to 4 + 2 concurrent exe’s). I also have each execute package tasks ExecuteOutOfProcess property set to true.
So, I tried to up the number from -1 to 10. Well this worked great when running the master package within BIDS.
Nice example is given on following following link with lots of explanation
- MaxConcurrentExecutables Parameter : Parallel Processing of Packages
- Package.MaxConcurrentExecutables Property (Microsoft.SqlServer.Dts.Runtime)
- Implement Parallel Execution in SSIS :MSDN Blogs
- MS SQL ::Multicasting Concept and Implementation
- SQL Server Integration Services(SSIS): SSIS-Parallel Processing
- SSIS Junkie : Investigation: Can different combinations affect Dataflow performance?
- MSDN Blogs
- Todd McDermid’s Blog: Parallelism in SSIS – Multiple Lookups
Some More Tips and Link
- MSDN Whitepaper: Integration Services: Performance Tuning Techniques
- The “godfather” of SSIS MVPs Jamie Thomson: SSIS Lookup component tuning tips
- From the SSIS Team blog: Getting Optimal Performance with Integration Services Lookups
- In fact, all of the SSIS Team blog posts on Lookups
- Fellow MVP Phil Brammer: SSIS – Lookup Cache Modes – Full, Partial, None
- From the Data Storage & SQL Server Performance team blog: Getting Optimal Performance with Integration Services Lookups
- SQLCAT Videos: Designing and Tuning for Performance your SSIS packages in the Enterprise (SQL Video Series)
I have already written on complete post for the explanation of balance data distributor click here to see that post :
At Last what I have done after all reading and googling lot about implement following strategies
- Use multicasting where Different independent task for parallel execution
- Use For Each loop container with MaxConcurrentExecutables where is common Set of module to run parallel module may contain multiple task
- Balance Data Distributer to speed up data transformations, so it won’t have an effect on single-processor configurations
Hope this Helps !!
Quote of Day:
A joyful heart is the inevitable result of a heart burning with love. ~ Mother Teresa