Category Archives: SAS

Big Data Analytics and Hadoop Opportunities


As organizations expand their operations and the volume of information grows with every passing minute, there is an increasing need to manage this vast store of information and extract precise, useful knowledge from it. It is this growing need that has given birth to the concept of ‘Big Data’ analytics.

Understanding Big Data

Big Data refers to very large data sets containing valuable information on a variety of subjects. This information may be fully organized, semi-organized or completely unorganized. That makes it no less valuable as a source of the knowledge and inputs that are of the utmost importance to businesses, defence services, markets, financial organizations, analysts, economists, governments, and even researchers and scientists.

Significance of Big Data Analytics


The process of analysing these enormous data sets to uncover hidden trends, patterns, correlations, indications, customer preferences and much more is called Big Data Analytics. Organizations depend on these analytical findings to draw up improved strategies, identify opportunities, build defence mechanisms, enhance customer service, improve their operational efficiency and gain an edge over existing or future competition.

Big Data Analytics is aimed at enabling companies to get the right information, through the right source, at the right time, in order to ensure the right action.

Why Hadoop?


Most organizations today are looking for new tools that will allow them to not only collect big data, but also process it for in-depth analysis and derive new knowledge that can be used to their advantage.

One effective Java-based tool that aids Big Data analytics is Hadoop. This open source programming framework from Apache™ enables analysts to process large data sets spread across distributed commodity servers.

Some reasons that make Hadoop a popular choice are:

  • Hadoop scales easily and quickly from a single server to many machines.
  • By changing the dynamics and economics of mass computing, it has emerged as a cost-effective, flexible and highly fault-tolerant tool.
  • Some of the top companies today use Hadoop for Big Data, creating tremendous opportunities for well-paying Big Data jobs for trained Hadoop professionals.
  • Hadoop architects, Hadoop developers, and Hadoop testers are in great demand in the market today.
  • Industry survey reports claim that the salary of Big Data Hadoop professionals in India is upwards of 6 lakh per annum and is expected to grow at almost 25 percent a year.

Recognizing the growing demand and excellent job prospects for trained Big Data Hadoop professionals, we at Analytic Square have come up with some of the best programs in Big Data Analytic Training in Delhi.

Big Data Analytics Training

Our well-crafted Online Big Data Hadoop Training programs and modules are designed with care for those looking to carve a niche for themselves in the world of Big Data Analytic with Hadoop.


If managing, interpreting and cracking big data drives you, Analytic Square is the perfect place for you.

With high-quality learning content, well-trained faculty, to-the-point doubt clarification, real-time projects, and ample training sessions that help you clear the certificate exam, the Analytic Square Institute offers professional-level Online Big Data Analytic training to help you step forward with confidence and advance your career in this field. In terms of both time and money invested, we offer the most competitive Big Data Hadoop Training in Delhi.

So go ahead and contact us to join today. We look forward to embarking with you on your journey of success!



SAS Project-2

As part of our ongoing effort, we have added another SAS project that gives you a glimpse of analytics and some material for practice.

A study was conducted to determine whether a reduced-cost version of a product is preferred equally to the current product. A sample of 200 people was asked for their preference between a pair of samples coded 27 and 45. One half of the group tasted sample 27 first and 45 second; the other half tasted the samples in the reverse order. Each person was also asked to rate six qualities of each sample on a 0 to 6 scale (0 = no response, 1 = excellent, …, 6 = poor). Each person's preference was recorded as: 1 = prefers first sample tasted, 2 = prefers second sample tasted, 3 = no preference. Please access the data by clicking Assignment2.

The first variable is the person number, the second variable is the number of the sample tasted first, the third variable is the number of the sample tasted second, the fourth variable holds the six ratings for the first sample tasted, the fifth variable holds the six ratings for the second sample tasted, and the last variable is the preference. You need to solve the business problems below and give your comments and suggestions.

1. How many people preferred sample 27? You can use if-then statements to prepare the data set and then count with the n option in proc means, or use proc freq.

2. Use chi-square tests to compare the distribution of the ratings of the two products for each of the six aspects of quality. The chi-square test can be run using proc freq. The information on column-wise input and do loops may be useful. Are the two products interchangeable? Interpret the output and state your conclusion in plain language for non-technical readers.
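
For readers who want to see what the chi-square test in question 2 actually computes, here is a minimal sketch in Python rather than SAS. The counts are made up for illustration and are not taken from the Assignment2 data, and the function name chi_square_statistic is hypothetical. The sketch assumes every row and column total is positive and, for simplicity, leaves out the 0 = no response category.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    Each inner list is one row of the contingency table. Assumes every
    row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rating were independent of the sample.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical counts of ratings 1..6 for ONE quality aspect.
# These numbers are invented; they are not the assignment data.
ratings_by_sample = [
    [40, 55, 50, 30, 15, 10],  # sample 27
    [35, 50, 55, 35, 15, 10],  # sample 45
]

statistic, df = chi_square_statistic(ratings_by_sample)
print(f"chi-square = {statistic:.3f} with {df} degrees of freedom")
```

The statistic is then compared with a chi-square distribution with the returned degrees of freedom; a large value suggests that the two rating distributions differ. The same calculation would be repeated for each of the six quality aspects.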


SAS Projects- Assignment 1

Dear reader, we are going to add some practice assignments on different tools and techniques. We encourage you to practice on these topics.


In an advertisement, a company claimed that taking its product Down-Drowsy reduced the time to fall asleep by 46% compared with the time needed without pills. The company, Able, based this claim on a sleep study. Persons were asked to record how long they took to fall asleep (‘sleep latency’) in minutes, and their average for a week was computed. In the next week these persons received Down-Drowsy and recorded their sleep latency. The following link gives the average sleep latency for each of the 73 persons, first for the week without pills and then for the week with pills.

Click Assignment 1 to access the data.

Problem 1: Put the data above into a SAS data set containing 3 variables and use Patient, Week 1 and Week 2 as labels. Refer to Data Step Basics, SAS Variables, and Input Statement (List) for assistance. (Use the Windows clipboard to transfer the data from the help file to the SAS program window, following the datalines statement.) Use the input statement with @@.

Problem 2: Use proc sort to arrange the data in increasing order by patient number. Print this sorted data set using proc print.

Problem 3: Use proc means to calculate the mean and standard deviation of the sleep latency times for each individual week.

Problem 4: How will you determine statistically whether taking the pills is more effective than not taking them? Use an appropriate statistical procedure and interpret the output.
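
Because each person is measured in both weeks, one common way to approach Problem 4 is a paired comparison of the two weeks. The sketch below illustrates the arithmetic in Python rather than SAS. The latency values are invented for illustration and are not the Assignment 1 data, and the function name paired_t_statistic is hypothetical.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for paired measurements.

    The statistic is the mean of the per-person differences divided by
    its standard error. Assumes at least two pairs and a non-zero
    standard deviation of the differences.
    """
    differences = [b - a for b, a in zip(before, after)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)  # sample standard deviation
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical average sleep latency in minutes for five persons.
week1 = [30.0, 45.0, 25.0, 60.0, 40.0]  # week without pills
week2 = [20.0, 30.0, 22.0, 35.0, 28.0]  # week with pills

# Per-week summaries, as asked for in Problem 3.
for label, values in (("Week 1", week1), ("Week 2", week2)):
    print(label, "mean:", statistics.mean(values),
          "std dev:", round(statistics.stdev(values), 3))

t, df = paired_t_statistic(week1, week2)
reduction = (statistics.mean(week1) - statistics.mean(week2)) / statistics.mean(week1) * 100
print(f"t = {t:.3f} with {df} degrees of freedom")
print(f"reduction in mean latency: {reduction:.1f}%")
```

A large positive t value indicates that latency was lower in the week with pills. The percentage reduction in the mean can then be set against the 46% figure in the advertisement.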

Please submit your answers, or let us know if you need assistance with the problems above.

To Centralize Analytics or Not, That is the Question

The structure of analytics in large organizations can take many forms—from having a gazillion analytics micro-teams embedded in each function or BU, to completely centralized analytics at the corporate level. What is the right strategy? What should your organization do?

Well, in that respect, the title of this post is misleading. To centralize or not to centralize, is actually NOT the question. If you think of centralization on a scale going from ‘not at all’ to ‘fully centralized’, the real question is what is the right level for you?

To answer that question you must be aware of the pros and cons of moving one way or the other on that scale. Having been a part of multiple “re-orgs” that have gone up and down on the scale, and having influenced some of those movements some of the time, I have some first-hand insight into this.

So here are the top 5 trade-offs to weigh when deciding on the organizational structure of analytics.

1. Consultant Mindset vs. Deep Personal Investment: God bless consultants, they often save the day! But one thing they cannot claim is deep emotional investment in the organization they are working for. A high degree of centralization produces the same effect. Analysts are assigned to BUs or functions based on project priority and resource constraints. Their mindset is like that of a consultant: you work on a project, crunch the numbers, deliver the insights and your job is done… time to move on to the next one. With analytics embedded within the function, analytics can be fully integrated with the project right from its conception. The alignment of purpose this creates produces non-linear, synergistic effects on the value derived from analytics. This alignment/ownership, of course, could be a problem in itself, which brings us to the next point.

2. Objectivity (or at least the perception of it): If the analytics team reports into the owner of the domain, and their rewards are aligned with the success of the projects being analyzed, the objectivity of the analysis could be in question. The analyst could potentially introduce a bias to make the project/initiative look better than it actually is. With analytics, credibility is everything. The perception of lack of objectivity could be devastating for the entire group/organization. If you believe that numbers cannot lie, you are either not in the field of analytics or are deluded. Read How To Lie With Statistics for starters.

3. Bureaucracy vs. Efficiency: Centralization brings bureaucracy; sometimes copious amounts of bureaucracy, depending on who is heading analytics. Everything needs to get into the pipeline, get prioritized, and get resources allocated against it. There are protocols for communication, to ensure the Business Units are not sidestepping the process (this seems like paranoia but I have experienced this first hand). It could suck the excitement out of a very creative job (I am talking about analytics of course), and turn analysts into full-time project managers (God bless project managers, I have nothing against them either).

4. Redundancy vs. Effectiveness: With the “embedded” model, it is easy for different analytics teams to get redundant in their analyses and continually reinvent the proverbial wheel. Centralization dramatically reduces redundancy, thus making the analytics team more effective. There is more knowledge sharing, a better sense of community of like-minded people, and more flexibility in leveraging a wide range of skill sets among analysts. This improves the throughput by improving the utilization of resources, thus also making the team lean.

5. Silos vs. Big Picture: Small teams of analysts embedded within the BU end up working in silos. While they become experts in their own domain, they run the risk of losing the big picture. This can be detrimental not only to the quality and relevance of the insights generated, but also to the career growth prospects and job satisfaction of the members of the analytics team.

So that brings us to the decision point: what is the right level of centralization? Business Units or functional teams will always resist centralization of analytics because they would no longer get dedicated capacity. Analysts, on the other hand, would likely (but not always) resist decentralization. The holy grail is to find the level at which both stakeholders are equally happy (or equally unhappy!), such that analysts get some opportunity to move around, cross-train and gain breadth of domain and, at the same time, have the chance to develop deep domain knowledge in a specific part of the organization and to influence/drive the strategy for the Business Unit as opposed to just reporting out data. Finding that sweet spot is not easy, but this hopefully gives you a sense of what you are looking for in the first place.

If you are a marketer, product manager or operations professional ready to equip yourself with the “Data to Decisions” framework, start by taking our “Data to Decisions” intro analytics course online today! Once you have taken the level-1 course, you will get access to the level-2 “Hands on Analytics” course and the level-3 “Hands-on A/B Testing” course online to complete your “Data to Decisions” skill upgrade.

3 Steps to Identify the Analytics Training You Need

You know your keynotes at conferences have a positive impact when they raise awareness. My keynotes raise awareness not just of the science that is analytics and its application, but also of the need to master it. I know, because one of the most commonly asked post-keynote questions I get is: “I’m very interested in furthering my knowledge in Analytics. Given my background, could you suggest what kind of analytics training I should look for?”

The past few years have borne witness to a boom in analytics education, be it an Analytics major in a multi-year Master’s degree, software tool training, multi-day workshops or even concise online tutorials. The multitude of offerings, while all relevant, makes the task of selecting the appropriate program very arduous for professionals. Additionally, there is not enough clarity on pertinence, process and practice to answer the one key question: what is truly needed to succeed in analytics?

If you have been looking to get trained in analytics and have been wondering how to choose, I recommend following these 3 steps to find out what you need, based on your own background and where you want to go.

STEP 1: Identify what you want to do

What current/future role are you going for? Are you, or do you want to be, an analyst/data scientist? Or are you a business professional looking to leverage analytics in your day-to-day workflow?

STEP 2: Identify the skills gap you have based on what you want to do

As you can imagine, the skills that business professionals in functions such as Marketing and Product need to leverage data effectively are somewhat different from those of a data scientist. Data scientists need deeper technical skills, plus the skills to work effectively with business professionals. The 6 key analytics skills used by successful analysts/data scientists are:

  1. DTD framework: Understanding and hands-on experience of the basic “Data to Decisions” framework
  2. SQL skills: Ability to pull data from multiple sources and collate it: experience in writing SQL queries and exposure to tools like Teradata, Oracle etc. Some understanding of Big Data tools using Hadoop is also helpful.
  3. Basic “applied” stat techniques: Hands-on experience with basic statistical techniques: Profiling, Correlation analysis, Trend analysis, Sizing/Estimation, Segmentation (RFM, product migration etc.)
  4. Working effectively with business side: Ability to work effectively with stakeholders by building alignment, effective communication and influencing
  5. Advanced “applied” stat techniques (hands-on): Hands-on comfort with advanced techniques: Time Series, Predictive Analytics – Regression and Decision Tree, Segmentation (K-means clustering) and Text Analytics (optional)
  6. Stat Tools: Experience with one or more statistical tools like SAS, R, SPSS, KNIME or others.

On the other hand, business professionals need easy access to data through a tool such as BusinessObjects or MicroStrategy, basic analysis skills and the ability to work effectively with data scientists and analysts. The 4 key analytics skills needed by business professionals are:

  1. DTD framework: Understanding and hands-on experience of the basic “Data to Decisions” framework
  2. Basic “applied” stat techniques: Hands-on experience with basic statistical techniques: Profiling, Correlation analysis, Trend analysis, Sizing/Estimation, Basic Segmentation
  3. Working effectively with analysts: Ability to work effectively with data scientists/analysts
  4. Advanced “applied” stat techniques (intro): High-level understanding of advanced techniques: Time Series, Predictive Analytics – Regression and Decision Tree, Segmentation

STEP 3: Based on skills gap you identified, choose the most appropriate training option

Given what you want to do, figure out the skills gap you have and fill out the chart below. Depending on the gaps, there are 3 major options to get the analytics training you need:

Analytics Training Skills Chart

  1. Master’s degree in Analytics: Several universities offer a Master’s degree in Analytics, often by combining courses from their Statistics, Computer Science and Management departments. In my experience, this program is most useful for individuals with no quantitative background who are looking for future data scientist/analyst roles. These programs are fairly comprehensive but, as a result, time-consuming and often not appropriate for working professionals. Some universities do offer online options, making them more accessible.
  2. Semester courses at local universities: Most universities offer semester/quarterly courses from their statistics and computer science departments, often as part of a continuing education program. These courses are most appropriate for data scientists, analysts and people with some quantitative background who are looking to pick up incremental skills for their current analytics role. For example, if you are in an analytics role and have never used R, you can take a semester course like “programming in R”.
  3. Professional Workshop: Many consulting companies like Analytic Square and others offer short analytics training most appropriate for working professionals. Depending on their area of focus, these short courses are most appropriate for business professionals looking to leverage data to make better decisions and for analysts looking to pick up incremental skills. The most valuable aspect of these courses is that they are geared towards business and often taught by analytics professionals who have seen analytics in action as applied to business. The downside is that they are not comprehensive and often don’t cover all the statistical concepts. But being short in duration, they are very accessible to most working professionals. Statistical tool companies like SAS, SPSS etc. are good places to get training on their respective tools.

But in the end, do your own due diligence and be sure to match the gaps you have identified with the courses you choose to take.