Monthly Archives: March 2014

Precedence Constraints (Day 2 of 30)

Precedence constraints link tasks so that they execute in a given order. In fact, precedence constraints not only link tasks together but also define the workflow of the SSIS package. A constraint controls the execution of the two tasks it links, so that the execution of Task 2 depends on the status of Task 1 and, optionally, on an expression (business rule) used to decide whether it runs. With precedence constraints and expressions, we can design the workflow of our SSIS package around our business rules and conditions.

To set up a precedence constraint between two tasks, we must set the constraint value. There are two ways to define the workflow:

Workflow defined by Success, Completion and Failure: Three constraint values are possible:
Success: The task will execute only when the prior task completes successfully. This is the default option and is shown in green.
Completion: The task will execute only when the prior task completes, whether it succeeds or fails. This precedence constraint is shown in blue.
Failure: The task will execute only when the prior task fails. This precedence constraint is shown in red.

Workflow defined by Expressions, or by Logical AND or Logical OR: The workflow logic is not limited to success, failure and completion. In some cases it is also defined by expressions. In this mixed case, the outcome (success, failure or completion) also depends on whether a valid expression evaluates to true or false. Below is a sample of the Precedence Constraint Editor:

 Precedence Constraint

To set the expression, double-click the precedence constraint and select one of the following evaluation operations:
Expression
Expression and Constraint
Expression or Constraint

Expression: The precedence constraint is evaluated based on the expression defined in the Expression text box. If the expression evaluates to true, the workflow continues and the constrained executable runs. If the expression evaluates to false, the constrained executable does not run.
Expression and Constraint: The precedence constraint is evaluated based on both the Value property and the expression. Both must evaluate to true for the constrained executable to run.
Expression or Constraint: The precedence constraint is evaluated based on either the Value property or the expression. At least one of these properties must evaluate to true for the constrained executable to run.

Logical AND or Logical OR: These options only come into the picture when at least two precedence constraints point to the same task.
Logical AND: All precedence constraints that point to the constrained executable must evaluate to true in order for that executable to run. This is the default option. If it is selected, the arrow is solid.
Logical OR: Only one precedence constraint that points to the constrained executable must evaluate to true in order for that executable to run. If this option is selected, the arrow is dotted.
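
Coming back to expressions, here is a small sketch: suppose you choose Expression and Constraint, keep the constraint value as Success, and type a Boolean expression like the one below into the Expression box (User::RowCount is a hypothetical package variable used only for illustration, not something SSIS creates for you):

@[User::RowCount] > 0

The constrained task then runs only when the prior task succeeds and the RowCount variable is greater than zero.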

Control the flow (Day 1 of 30)

Any package in SSIS starts with the Control Flow. The Control Flow gives structure to the SSIS package. We can divide Control Flow elements into three types:
1. Containers - Provide Structure
2. Tasks - Provide Functionality
3. Precedence Constraints - Connect the tasks and containers in a sequential manner.
The diagrams below illustrate the usage of Control Flow:

Fig 1: Structure 1
Fig 2: Structure 2
Fig 3: Structure 3

Fig 1 has only one task; this is the simplest example of Control Flow.
Fig 2 has a combination of a task, a container and a precedence constraint.
Fig 3 is an extension of Fig 2.
Integration Services supports the nesting of containers, and a control flow can include multiple levels of nested containers. For example, a package could contain a container, which in turn could contain another container and so on.

Control Flow Implementation
You can build the control flow of your package using the Control Flow tab in the designer. The following diagram shows the default screen of an Integration Services package.
Day-1

 

1. This is the SSIS Toolbox. Everything in this toolbox is a task, except for the containers. Tasks can run in either a parallel or a sequential manner. Adding tasks is part of creating the workflow. When you add a task to the control flow designer, a connector is automatically added to it. This connector joins two tasks, or a task and a container, and is called a Precedence Constraint. Precedence constraints specify the order in which tasks will run: for example, Task 2 will run on the success of Task 1 or of a container, and so on.
2. This is the area where all connections go. Many tasks require a connection to a data source, and when you add a connection it appears in the Connection Managers area.
3. The default properties of the package.
4. You can zoom in or zoom out of the Control Flow here (so you do not need to press Ctrl + scroll up or scroll down to do the same).
5. This is very important: Variables. Variables help us a lot in changing task properties dynamically, as shown in the example just below.
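
As a quick illustration, a property expression like the one below can be attached to a task or connection property so that its value is built from variables at run time (the variable names User::FolderPath and User::FileName are hypothetical; you would create them in the Variables window first):

@[User::FolderPath] + "\\" + @[User::FileName]

We will look at variables and expressions in more detail on Day 10.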

That is an overview of the control flow. In our next article we will learn about precedence constraints in detail.

A Month with SSIS-Introduction

SQL Server Integration Services (SSIS) was first released with SQL Server 2005. I have been working with SSIS for the last 7 years and wish to share my learning and experience with you. During this tenure, I have learned a few tips and tricks that can be really helpful at times.
This blog is not just meant to give you an idea of how to work with SSIS, but also to give you solutions to the most common problems we face while working with the tool.
Below are the topics that we will cover during this series:

  1. Control the flow (Day 1 of 30)
  2. Precedence Constraints (Day 2 of 30)
  3. Use Container for tasks (Day 3 of 30)
  4. FTP Task (Day 4 of 30)
  5. Script Task (Day 5 of 30)
  6. File System Task (Day 6 of 30)
  7. CDC Control Task (Day 7 of 30)
  8. Executing SQL (Day 8 of 30)
  9. How to use Breakpoints  (Day 9 of 30)
  10. Work with Variables and Expressions (Day 10 of 30)
  11. Log your SSIS (Day 11 of 30)
  12. Check Point Usage (Day 12 of 30)
  13. Merge Join (Day 13 of 30)
  14. Pivot Transformation (Day 14 of 30)
  15. UnPivot Transformation (Day 15 of 30)
  16. Lookup your records (Day 16 of 30)
  17. Cache data (Day 17 of 30)
  18. Conditional Split (Day 18 of 30)
  19. Row Count (Day 19 of 30)
  20. Import and Export Column (Day 20 of 30)
  21. Script Component is our friend (Day 21 of 30)
  22. Import Column (Day 22 of 30)
  23. Work with Parameters (Day 23 of 30)
  24. Error Handling  (Day 24 of 30)
  25. Handle the event  (Day 25 of 30)
  26. Optimization Basics (Day 26 of 30)
  27. Configuration in SSIS (Day 27 of 30)
  28. Deploy SSIS (Day 28 of 30)
  29. Run SSIS through SQL Server Agent Job (Day 29 of 30)
  30. Wrap up (Day 30 of 30)

SQL Server Undocumented Stored Procedures

All versions of SQL Server include some stored procedures that are not documented on MSDN. Microsoft does not document them because they are meant for internal use only. Microsoft also strongly recommends not using these stored procedures in a production environment, because their definitions can be changed at any time and they can be removed in a future release.
Still, in some cases these undocumented stored procedures help us a lot. In this post we will look at just two of them:
sp_MSforeachtable and sp_MSforeachdb.

sp_MSforeachdb: This loops through each database in a SQL Server instance. Instead of writing a cursor, it helps us a lot when we want to execute something against every database in the instance.
Below are some examples of its use:
1. Print all the database names in a SQL Server instance:
EXEC sp_MSforeachdb 'USE ?; PRINT DB_NAME()'
2. Print all the tables in all the databases of a SQL Server instance:
EXEC sp_MSforeachdb 'USE ?; SELECT DB_NAME() + ''.'' + OBJECT_NAME(object_id) FROM sys.tables'
3. Display the size of all databases in a SQL Server instance:
EXEC sp_MSforeachdb 'USE ?; EXEC sp_spaceused'

You can also use this stored procedure to take a backup of all databases in a single go.
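Below is a minimal sketch of such a backup command (the folder C:\Backup is only an assumption; adjust the path and the list of excluded system databases to your environment):
EXEC sp_MSforeachdb 'IF ''?'' NOT IN (''master'',''model'',''msdb'',''tempdb'') BACKUP DATABASE [?] TO DISK = ''C:\Backup\?.bak'' WITH INIT'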

sp_MSforeachtable: Similar to the above, this loops through all the tables in a database.
Some common uses are below:
1. Display the size of all tables in a database:
USE AdventureWorks2012
EXEC sp_MSforeachtable @command1 = 'EXEC sp_spaceused ''?'''
2. Display the number of rows in all tables in a database:
USE AdventureWorks2012
EXEC sp_MSforeachtable 'SELECT ''?'' AS TableName, COUNT(*) AS NumberOfRows FROM ?'
Again, you can also use this stored procedure for other tasks, such as rebuilding the indexes of every table in a database.
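Below is a minimal sketch that rebuilds all indexes on every table in the current database (it uses ALTER INDEX ALL ... REBUILD; review options such as fill factor and ONLINE before running it on a real system):
USE AdventureWorks2012
EXEC sp_MSforeachtable 'ALTER INDEX ALL ON ? REBUILD'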

To Centralize Analytics or Not, That is the Question

The structure of analytics in large organizations can take many forms—from having a gazillion analytics micro-teams embedded in each function or BU, to completely centralized analytics at the corporate level. What is the right strategy? What should your organization do?

Well, in that respect, the title of this post is misleading. To centralize or not to centralize is actually NOT the question. If you think of centralization on a scale going from ‘not at all’ to ‘fully centralized’, the real question is: what is the right level for you?

To answer that question you must be aware of the pros and cons of moving one way or the other on that scale. Having been a part of multiple “re-orgs” that have gone up and down on the scale, and having influenced some of those movements some of the time, I have some first-hand insight into this.

So here are the top 5 trade-offs to weigh when deciding on the organizational structure of analytics.

1. Consultant Mindset vs. Deep Personal Investment: God bless consultants, they often save the day! But one thing they cannot claim is deep emotional investment in the organization they are working for. This is what a high degree of centralization does to analysts. Analysts are assigned to BUs or functions based on project prioritization and resource constraints. Their mindset is like that of a consultant: you work on a project, crunch the numbers, deliver the insights and your job is done… time to move on to the next one. With analytics embedded within the function, there can be full integration of analytics with the project right from its conception. The alignment of purpose this creates produces very non-linear, synergistic effects with respect to the value derived from analytics. This alignment/ownership could, of course, be a problem by itself, which brings us to the next point.

2. Objectivity (or at least the perception of it): If the analytics team reports into the owner of the domain, and their rewards are aligned with the success of the projects being analyzed, the objectivity of the analysis could be in question. The analyst could potentially introduce a bias to make the project/initiative look better than it actually is. With analytics, credibility is everything. The perception of lack of objectivity could be devastating for the entire group/organization. If you believe that numbers cannot lie, you are either not in the field of analytics or are deluded. Read How To Lie With Statistics for starters.

3. Bureaucracy vs. Efficiency: Centralization brings bureaucracy; sometimes copious amounts of it, depending on who is heading analytics. Everything needs to get into the pipeline, get prioritized, and get resources allocated against it. There are protocols for communication to ensure the Business Units are not sidestepping the process (this seems like paranoia, but I have experienced it first hand). It can suck the excitement out of a very creative job (I am talking about analytics, of course) and turn analysts into full-time project managers (God bless project managers, I have nothing against them either).

4. Redundancy vs. Effectiveness: With the “embedded” model, it is easy for different analytics teams to get redundant in their analyses and continually reinvent the proverbial wheel. Centralization dramatically reduces redundancy, thus making the analytics team more effective. There is more knowledge sharing, a better sense of community of like-minded people, and more flexibility in leveraging a wide range of skill sets among analysts. This improves the throughput by improving the utilization of resources, thus also making the team lean.

5. Silos vs. Big Picture: Small teams of analysts embedded within the BU end up working in silos. While they become experts in their own domain, they run the risk of losing the big picture. This can be detrimental not only to the quality and relevance of the insights generated, but also to the career growth prospects and job satisfaction of the members of analytics team.

So that brings us to the decision point: what is the right level of centralization? Business Units or functional teams will always resist centralization of analytics because they would no longer get dedicated capacity. Analysts, on the other hand, would likely (but not always) resist decentralization. The holy grail is to find the level at which both stakeholders are equally happy (or equally unhappy!), such that analysts get some opportunity to move around, cross-train and gain breadth of domain, and at the same time have the chance to develop deep domain knowledge in a specific part of the organization and to influence and drive the strategy for the Business Unit, as opposed to just reporting out data. Finding that sweet spot is not easy, but this hopefully gives you a sense of what you are looking for in the first place.

If you are a marketer, product manager or operations professional ready to equip yourself with the “Data to Decisions” framework, start by taking our “Data to Decisions” intro analytics course online today! Once you have taken the level-1 course, you will get access to the level-2 “Hands-on Analytics” course and the level-3 “Hands-on A/B Testing” course online to complete your “Data to Decisions” skill upgrade.