Monthly Archives: January 2014

Matrix Report in SSRS 2012

 

In our first post we will discuss on Matrix report in SQL Server Reporting Services 2012. Open SQL Server Data Tools and create a Report with name “MatrixDemo” and then use your local system for creating Data Source and connect it with AdventureWorks database.

DataSource
For Data set use below query:
USE AdventureWorks
GO
SELECT Sales.SalesTerritory.Name as Region, Sales.SalesTerritory.CountryRegionCode, Sales.SalesTerritory.[Group] as Territory, Year(Sales.SalesOrderHeader.DueDate) as Year, Month(Sales.SalesOrderHeader.DueDate) as Month, Sales.SalesOrderHeader.TotalDue
FROM Sales.SalesTerritory INNER
JOIN Sales.SalesOrderHeader ON
Sales.SalesTerritory.TerritoryID =Sales.SalesOrderHeader.TerritoryID

 DataSet
Now, drag and drop and Matrix on the report surface, which will look like:
Report
Now, do the below to fill the Matrix:
FillMatrix
Now run this report and it will looks like:

Preview1
Now at this point of time our report do not have any child group, so in our next step lets create child group on our report, For creating Group go to Row Groups and do right click and click on Child Group:

RowGroup
This will open another window and choose Region from Group By and click on OK.
RegionGroup
Now below is the preview of Report:

Preview2

Now play some more with this newly created Group. For this go to Region group and right click and properties and Visibility. Select Hide when report is initially run and select toggling as Territory and click on OK button

. ToggleRowGroup
Again, check the preview of report, which looks like:

Preview3
Now we will create Column Group. For this we need to do the similar work but we have to create a Child Column Group. Right Click on Column Group and Add Group and Then Child Group and select Month and OK. Again do right click on group Month and properties and then Visibility and select initially hide and toggle it with Year. Click on OK and go to Preview of the report.

ColumnGroup

ToggleColumnGroup

Preview4

Now, you are done with Matrix report. If you have any query related to this topic please comment on this post.

 The SAS Windows                                                                                           

There are five basic SAS windows: the Results and Explorer windows, and three programming windows: Editor, Log, and Output. It is possible to bring up SAS without all these windows, and sometimes the windows are not immediately visible (for example, in the Windows operating environment, the Output window comes up behind the Editor and Log windows), but all these windows do exist in your SAS session. There are also many other SAS windows that you may use for tasks such as getting help, changing SAS system options, and customizing your SAS session. The following figure shows the default view for a Microsoft Windows SAS session, with pointers to the five main SAS windows. 

SAS Window

 

Editor This window is a text editor. You can use it to type in, edit, and submit SAS programs as well as edit other text files such as raw data files. In Windows operating environments, the default editor is the Enhanced Editor. The Enhanced Editor is syntax sensitive and color codes your programs  making it easier to read and find mistakes. The Enhanced Editor also allows you to collapse and expand the various steps in your program. For other operating environments, the default editor is the Program Editor whose features vary with the version of SAS and operating environment.

Log The Log window contains notes about your SAS session, and after you submit a SAS program, any notes, errors, or warnings associated with your program as well as the program statements themselves will appear in the Log window.

Output If your program generates any printable results, then they will appear in the Output window.

Results The Results window is like a table of contents for your Output window; the results tree lists each part of your results in an outline form.

Explorer The Explorer window gives you easy access to your SAS files and libraries.

Engineers who implement process control can use analytics to think outside the of box. Better yet, they can use analytics to help solve the issues and risks associated with being inside the box or outside the box in the first place. Read on to learn what box I’m referring to exactly.

Understanding advanced process control

Basic or advanced process control (APC) are terms typically associated with process industries, such as chemicals, petrochemicals, oil and gas, and power generation. These industries deal with many continuous processes and fluid processing. It is interesting to me that even though oil and gas and power generation industries embrace and implement APC that, in general, these same industries don’t necessarily embrace using advanced analytics along with APC.

First let’s see how APC may force an engineer inside a rectangle or box. APC tends to use known values or safe ranges — plus or minus some percentage off these known values, such as an average — as inputs that impact a specific process or target within a process.

Real world processes are best represented by some type of ellipse which can be modeled using a more real world distribution of input values instead of these “known” ranges.  By choosing to use “known safe ranges” as inputs you are now limiting yourself and your process to fitting inside a rectangle or box.

A picture is worth a thousand words, especially in this case the graph below makes it much easier to understand or explain what all this means.

In this simple example, the points represent the real world values, APC represents this process as the black box and only takes into account the values that fall into that box, while advanced analytics represents this process by the red ellipse.  If you use APC alone then you have issues or risks that fall into these two categories:

  1. Points that fall within the box, but outside the ellipse.
  2. Points that fall outside the box, but inside the ellipse.

You may be wondering, who cares? Let’s add some background information to our example. The process is being monitored and there are consequences when the process is stopped or restarted, or if the process fails to stop when it gets too high or low out of range. More specifically:

  • Whenever the process is stopped and restarted it costs our company some number of dollars (lost money!) .
  • If the process gets too high out of range and isn’t stopped, it can cause an explosion (major safety issue!).
  • If the process gets too low out of range the resulting output will not meet the required specifications and will need to be disposed of instead of sold (even more money losses!).

 Now who cares? Everyone at this company should, especially with the safety issue.

Safety issue: The points in the upper right hand corner of the box, but outside the ellipse. In this case the process being monitored appears to be within the proper range, but in the real world the values are too high out of range which eventually results in an explosion.

Money issue: The points above and below the box, but inside the ellipse shows situations where the real world process is actually fine, but our monitoring process has us stop the process and restart it which results in money being lost.

Money issue: The points in the lower left hand corner of the box, but outside the ellipse results in our process being monitored appears to be within the proper range, but the end product will be out of specification and therefore result in money being lost.

Where can this lesson be applied?

I’ve seen advanced analytics used to enhance process controlled systems  for improved safety and overall production in refining oil, generating power, producing beer, chemicals, pharmaceuticals, food products, as well as other processes across a variety of industries. For example, in generating power and monitoring a turbine one can easily identify speed, cooling, and heating as three processes impacted by a variety of measures that can be monitored and improved this way.

What type of processes do you have in your business today that could be improved by applying advanced analytics in this way?

One other issue for the engineer or data scientist to potentially argue over is whether the X and Y variables in this example are correlated or not?  You might think these two would agree, however if they both calculate Pearson’s r value (a common statistical measure to determine correlation), they may come to opposite conclusions.  Once again you may wonder why?  It goes back to being inside the box.  Many times when using APC someone only looks at the points that fall inside the box and as a result they are not using all the data available to make the proper decision.

SAS Interview Questions and Answers

  1. What has been your most common programming mistake?
  2. What is your favorite programming language and why?
  3. What is your favorite operating system? Why?
  4. Do you observe any coding standards? What is your opinion of them?
  5. What percent of your program code is usually original and what percent copied and modified?
  6. Have you ever had to follow SOPs or programming guidelines?
  7. Which is worse: not testing your programs or not commenting your programs?
  8. Name several ways to achieve efficiency in your program. Explain trade-offs.
  9. What other SAS products have you used and consider yourself proficient in using?
  10. How do you make use of functions?
  11. When looking for contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?
  12. What is the significance of the ‘OF’ in X=SUM(OF a1-a4, a6, a9);?
  13. What do the PUT and INPUT functions do?
  14. Which date function advances a date, time or date/time value by a given interval?
  15. What do the MOD and INT function do?
  16. How might you use MOD and INT on numerics to mimic SUBSTR on character strings?
  17. In ARRAY processing, what does the DIM function do?
  18. How would you determine the number of missing or nonmissing values in computations?
  19. What is the difference between: x=a+b+c+d; and x=SUM(a,b,c,d);?
  20. There is a field containing a date. It needs to be displayed in the format “ddmonyy” if it’s before 1975, “dd mon ccyy” if it’s after 1985, and as ‘Disco Years’ if it’s between 1975 and 1985. How would you accomplish this in data step code? Using only PROC FORMAT.
  21. In the following DATA step, what is needed for ‘fraction’ to print to the log? data _null_; x=1/3; if x=.3333 then put ‘fraction’; run;
  22. What is the difference between calculating the ‘mean’ using the mean function and PROC MEANS?
  23. Have you ever used “Proc Merge”? (be prepared for surprising answers..)
  24. If you were given several SAS data sets you were unfamiliar with, how would you find out the variable names and formats of each dataset?
  25. What SAS PROCs have you used and consider yourself proficient in using?
  26. How would you keep SAS from overlaying the a SAS set with its sorted version?
  27. In PROC PRINT, can you print only variables that begin with the letter “A”?
  28. What are some differences between PROC SUMMARY and PROC MEANS?
  29. Code the tables statement for a single-level (most common) frequency.
  30. Code the tables statement to produce a multi-level frequency.
  31. Name the option to produce a frequency line items rather that a table.
  32. Produce output from a frequency. Restrict the printing of the table.
  33. Code a PROC MEANS that shows both summed and averaged output of the data.
  34. Code the option that will allow MEANS to include missing numeric data to be included in the report.
  35. Code the MEANS to produce output to be used later.
  36. Do you use PROC REPORT or PROC TABULATE? Which do you prefer? Explain.
  37. What happens in a one-on-one merge? When would you use one?
  38. How would you combine 3 or more tables with different structures?
  39. What is a problem with merging two data sets that have variables with the same name but different data?
  40. When would you choose to MERGE two data sets together and when would you SET two data sets?
  41. Which data set is the controlling data set in the MERGE statement?
  42. How do the IN= variables improve the capability of a MERGE?
  43. Explain the message ‘MERGE HAS ONE OR MORE DATASETS WITH REPEATS OF BY VARIABLES”.
  44. How would you generate 1000 observations from a normal distribution with a mean of 50 and standard deviation of 20. How would you use PROC CHART to look at the distribution? Describe the shape of the distribution.
  45. How do you generate random samples?
  46. What is the purpose of the statement DATA _NULL_ ;?
  47. What is the pound sign used for in the DATA _NULL_?
  48. What would you use the trailing @ sign for?
  49. For what purpose(s) would you use the RETURN statement?
  50. How would you determine how far down on a page you have printed in order to print out footnotes?
  51. What is the purpose of using the N=PS option?
  52. What system options would you use to help debug a macro?
  53. Describe how you would create a macro variable.
  54. How do you identify a macro variable?
  55. How do you define the end of a macro?
  56. How do you assign a macro variable to a SAS variable?
  57. For what purposes have you used SAS macros?
  58. What is the difference between %LOCAL and %GLOBAL?
  59. How long can a macro variable be? A token?
  60. If you use a SYMPUT in a DATA step, when and where can you use the macro variable?
  61. What do you code to create a macro? End one?
  62. Describe how you would pass data to a macro.
  63. You have five data sets that need to be processed identically; how would you simplify that processing with a macro?
  64. How would you code a macro statement to produce information on the SAS log? This statement can be coded anywhere.
  65. How do you add a number to a macro variable?
  66. If you need the value of a variable rather than the variable itself, what would you use to load the value to a macro variable?
  67. Can you execute a macro within a macro? Describe.
  68. Can you a macro within another macro? If so, how would SAS know where the current macro ended and the new one began?
  69. How are parameters passed to a macro?

 

Career in Business Intelligence and Analytics

The economy might still be wobbly. But for Job seekers, there are pockets of promises. Out of the so many options available, Analytics and Business Intelligence is on top of the charts as per one of Editorial of Economic Times.  As per economic times , there are few points that every student and youth should consider before taking decision about their career .

Reasons

Recent developments in hardware and networking technologies have made it cheap to not only gather large volumes of data, but also to store it and retrieve it with ease. The ability to analyze the data and make business sense out of it is a skill that is fast gaining prominence,” says Ajit Isaac, MD and CEO at Ikya Human Capital Solutions, who projects 12,000-15,000 openings next year 2014.

Sectors

Banks, consumer goods, retail, IT & IT consulting, business consulting, & e-commerce/online.

Skills

Training/experience in statistics or financial analysis; familiarity with statistical techniques, and software such as SAS and SPSS.

Pay

Rs 4.5-8 lakh (p.a) Entry level (Graduate/PG)

Rs 8-12 lakh (Five years of experience)

Rs 15 lakh for IITs & Premier Schools