Monthly Archives: August 2014

SAS Project-2

Continuing this effort, here is another SAS project for practice, which will also give you a glimpse of analytics.

A study was conducted to measure whether a product made at reduced cost is preferred as much as the current product. A sample of 200 people were asked for their preference between a pair of samples coded 27 and 45. One half of the group tasted sample 27 first and 45 second; the other half tasted the samples in the reverse order. Each person was also asked to rate six qualities of the samples on a 0 to 6 scale (0 = no response, 1 = excellent, …, 6 = poor). Each person's preference was recorded as 1 = prefers the first sample tasted, 2 = prefers the second sample tasted, 3 = no preference. Click Assignment2 to access the data.

The first variable is the person number, the second is the number of the sample tasted first, the third is the number of the sample tasted second, the fourth holds the six ratings for the first sample tasted, the fifth holds the six ratings for the second sample tasted, and the last variable is the preference. Solve the business problems below and give your comments and suggestions.

1. How many people preferred sample 27? You can use if-then statements to prepare the data set and use the n option in proc means. Also use proc freq.

2. Use chi-square tests to compare the distribution of the ratings of the two products for each of the six aspects of quality. The chi-square test can be run with proc freq (the chisq option on the tables statement). Information on column-wise input and do loops may be useful. Are the two products interchangeable? Interpret the output and state your conclusion in terms non-technical readers can follow.
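The data preparation behind both questions can be sketched outside SAS to show the logic. The JavaScript below is an illustration, not SAS code: it assumes each record has already been read into an object with hypothetical field names (first, second, ratingsFirst, ratingsSecond, pref), and the three records are made up only to exercise the code. preferredSample is the if-then recode for question 1; ratingTable is the do-loop step that assigns each person's ratings to sample 27 or 45 regardless of tasting order; chiSquare computes the Pearson chi-square statistic that proc freq reports with the chisq option (the p-value is left to SAS).

```javascript
// Question 1: turn the preference code (1 = first, 2 = second, 3 = none)
// into the code of the sample that was actually preferred.
function preferredSample(rec) {
  if (rec.pref === 1) return rec.first;
  if (rec.pref === 2) return rec.second;
  return null; // 3 = no preference
}

function countPreferring(records, sample) {
  return records.filter((r) => preferredSample(r) === sample).length;
}

// Question 2: for one quality (index 0..5), count how often each rating 0..6
// went to sample 27 and to sample 45, whichever order they were tasted in.
// Assumes only samples 27 and 45 appear in the data.
function ratingTable(records, quality) {
  const table = { 27: new Array(7).fill(0), 45: new Array(7).fill(0) };
  for (const r of records) {
    table[r.first][r.ratingsFirst[quality]] += 1;
    table[r.second][r.ratingsSecond[quality]] += 1;
  }
  return [table[27], table[45]];
}

// Pearson chi-square statistic for an r x c table of counts.
// Rating columns that nobody used (total 0) are dropped.
function chiSquare(rows) {
  const nCols = rows[0].length;
  const colTotals = [];
  for (let j = 0; j < nCols; j++) {
    colTotals.push(rows.reduce((s, row) => s + row[j], 0));
  }
  const keep = colTotals.map((t, j) => (t > 0 ? j : -1)).filter((j) => j >= 0);
  const rowTotals = rows.map((row) => keep.reduce((s, j) => s + row[j], 0));
  const n = rowTotals.reduce((a, b) => a + b, 0);
  let stat = 0;
  rows.forEach((row, i) => {
    for (const j of keep) {
      const expected = (rowTotals[i] * colTotals[j]) / n;
      stat += (row[j] - expected) ** 2 / expected;
    }
  });
  return { stat, df: (rows.length - 1) * (keep.length - 1) };
}

// Made-up records (NOT the assignment data).
const people = [
  { first: 27, second: 45, ratingsFirst: [1, 2, 1, 3, 2, 1], ratingsSecond: [2, 2, 3, 3, 4, 2], pref: 1 },
  { first: 45, second: 27, ratingsFirst: [3, 1, 2, 2, 1, 4], ratingsSecond: [1, 1, 2, 2, 2, 3], pref: 2 },
  { first: 27, second: 45, ratingsFirst: [2, 3, 2, 1, 1, 2], ratingsSecond: [2, 2, 2, 1, 1, 2], pref: 3 },
];
console.log(countPreferring(people, 27)); // 2
console.log(chiSquare(ratingTable(people, 0)));
```

With real data the table for each quality would be built the same way and tested six times, once per quality.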


SAS Projects - Assignment 1

Dear reader, we are going to add some practice assignments on different tools and techniques. We encourage you to practice on these topics.


In an advertisement, a company, Able, claimed that taking its product Down-Drowsy reduced the time to fall asleep by 46% compared with the time needed without pills. Able based this claim on a sleep study. Participants were asked to record how long they took to fall asleep ("sleep latency") in minutes, and their average for a week was computed. In the following week, the same people took Down-Drowsy and recorded their sleep latency. The data linked below give the average sleep latency for each of the 73 people, first for the week without pills and then for the week with pills.

Click Assignment 1 to access the data.

Problem:1 Put the data above into a SAS data set containing 3 variables, and use Patient, Week 1 and Week 2 as labels. Refer to Data Step Basics, SAS Variables, and Input Statement (List) for assistance. (Use the Windows clipboard to transfer the data from the help file to the SAS program window, after the datalines statement.) Use an input statement with @@.

Problem:2 Use proc sort to arrange the data in increasing order by patient number. Print this sorted data set using proc print.

Problem:3 Use proc means to calculate the mean and standard deviation of the sleep latency times for each individual week.

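Problem:4 How would you statistically determine whether Down-Drowsy is effective, that is, whether sleep latency with the pill is shorter than without it? Use an appropriate statistical procedure and interpret the results.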
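Because the same person is measured in both weeks, one natural approach is a paired comparison: work with each person's difference between week 1 and week 2. The JavaScript sketch below is only an illustration of the arithmetic (in SAS the equivalent would be proc means followed by proc ttest with a paired statement). It computes the per-week mean and standard deviation, the percentage reduction that the 46% claim refers to, and the paired t statistic. The three-person data set is made up and is not the study data.

```javascript
// Sample mean.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Sample standard deviation with an n - 1 denominator (proc means' default).
function sd(xs) {
  const m = mean(xs);
  const ss = xs.reduce((acc, x) => acc + (x - m) ** 2, 0);
  return Math.sqrt(ss / (xs.length - 1));
}

// Percentage reduction in mean sleep latency from week 1 to week 2.
function percentReduction(week1, week2) {
  return (100 * (mean(week1) - mean(week2))) / mean(week1);
}

// Paired t statistic on the differences d = week1 - week2:
// t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom.
function pairedT(week1, week2) {
  const d = week1.map((x, i) => x - week2[i]);
  const t = mean(d) / (sd(d) / Math.sqrt(d.length));
  return { t, df: d.length - 1 };
}

// Made-up latencies in minutes for three people (NOT the study data).
const week1 = [10, 20, 30];
const week2 = [5, 10, 15];
console.log(mean(week1), sd(week1)); // 20 10
console.log(percentReduction(week1, week2)); // 50
console.log(pairedT(week1, week2)); // t is about 3.464, df 2
```

The t statistic alone is not the conclusion; its p-value from the t distribution with n - 1 degrees of freedom (which proc ttest reports) tells you whether the reduction is statistically significant.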

Please submit your answers, or let us know if you need help solving the problems above.

Find and FindOne in MongoDB

This post continues our last post, where we learned basic commands to run on MongoDB. In that post we saw that the find method is used to search for documents in a collection. In today's post we will see how to extend the find method and how to use findOne.
Before we start, let's create a collection and insert some records (documents) into it:

TechnicalSkill:['SQL Server','MSBI','mongoDB'],
TechnicalSkill:['SQL Server','Sharepoint'],
TechnicalSkill:['SQL Server','MSBI','Informatica'],

1. db.collection.find()
This method selects all the documents that match the condition; if no condition is specified, it returns all documents in the collection.
In the mongo shell, find displays the first 20 documents by default; type it to see the next batch of results.
find takes two optional parameters: the search condition and the fields to be returned by the query.
a. Find all the documents in a collection
This is the simplest form of the find method: without any parameters, it returns all documents in the collection Employee.
b. Find with selection criteria
In this query, we restrict our find method to select all documents where JobLocation is Noida. The point to remember here is that this statement returns all fields.
c. Find with specified fields
This query returns Name and JobLocation for all documents. _id is returned by default with every find unless it is explicitly excluded; here 1 denotes true and 0 denotes false. We cannot mix true and false for fields in one statement; this is allowed only for _id. If we run
db.Employee.find({},{_id:false,Name:true,JobLocation:false}), it will throw the error "You cannot currently mix including and excluding fields. Contact us if this is an issue."
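The selection criteria and projection rules in b and c can be modelled with a small plain-JavaScript sketch. This is an illustration under simplifying assumptions, not MongoDB's implementation: the filter supports top-level equality matches only, the helper names matches, project and find are ours, and the documents are hypothetical (Name is flattened to a string here). It reproduces the rule above: _id is kept unless excluded, and 1 and 0 cannot be mixed except for _id.

```javascript
// Toy model of find(filter, projection) over an in-memory array.
// Only top-level equality filters are supported.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, value]) => doc[field] === value);
}

function project(doc, projection) {
  const keepId = projection._id !== 0 && projection._id !== false;
  const fields = Object.keys(projection).filter((f) => f !== "_id");
  const flags = fields.map((f) => Boolean(projection[f]));
  if (flags.some((v) => v !== flags[0])) {
    throw new Error("cannot mix including and excluding fields (except _id)");
  }
  let out;
  if (flags.length > 0 && flags[0]) {
    // Inclusion projection: copy only the listed fields.
    out = {};
    for (const f of fields) if (f in doc) out[f] = doc[f];
  } else {
    // Exclusion projection (or no projection): start from the whole document.
    out = { ...doc };
    for (const f of fields) delete out[f];
  }
  if (keepId && "_id" in doc) out._id = doc._id;
  else delete out._id;
  return out;
}

function find(docs, filter = {}, projection = {}) {
  return docs.filter((d) => matches(d, filter)).map((d) => project(d, projection));
}

// Hypothetical documents.
const employees = [
  { _id: 1, Name: "Deepak Sharma", JobLocation: "Noida" },
  { _id: 2, Name: "Employee 2", JobLocation: "Delhi" },
];
console.log(find(employees, { JobLocation: "Noida" })); // only the Noida document
console.log(find(employees, {}, { Name: 1, JobLocation: 1 })); // _id kept by default
```

Calling find(employees, {}, { _id: false, Name: true, JobLocation: false }) throws, mirroring the shell error quoted above.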
d. Find with forEach
This query returns all the documents, with all fields, in a formatted layout.


e. Find with Limit
This query will return only 2 documents with all the fields.
f. More options with find method
This query returns the first document with all its fields (the first document in natural order, which for a simple collection like this one is usually the document inserted first).
This query returns the ObjectId of the first document.
This returns the time at which that ObjectId was generated; the time is encoded in the ObjectId itself.
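The creation time can be recovered because the first 4 bytes of a 12-byte ObjectId are a timestamp in seconds since the Unix epoch; the mongo shell's ObjectId getTimestamp() method returns it as a Date. Here is a minimal plain-JavaScript sketch of that decoding, working on the 24-character hex form. The function name objectIdTimestamp is hypothetical, and the sample id is built by hand rather than taken from the Employee collection.

```javascript
// Decode the creation time embedded in an ObjectId given as a 24-char hex string.
// The first 8 hex digits (4 bytes, big-endian) are seconds since 1970-01-01T00:00:00Z.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Hand-made id whose first 4 bytes are 0x5403b700 = 1409529600 seconds.
const id = "5403b700" + "0000000000000000";
console.log(objectIdTimestamp(id).toISOString()); // 2014-09-01T00:00:00.000Z
```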

2. db.collection.findOne()
This method selects documents that satisfy the optional search criteria and returns only one document. If multiple documents qualify, it returns the first one in natural order, which usually follows insertion order. Like find, findOne takes two optional parameters: the search condition and the fields to be returned by the query.
db.Employee.findOne() returns only the first document; in this case it returns all the fields of the document where Name is Deepak Sharma, because that is the very first document we inserted.

The difference between find and findOne shows up when we work with embedded documents, such as the Name field in the example above, which holds FName and LName.
db.Employee.find().Name returns nothing, because find returns a cursor rather than a document, but db.Employee.findOne().Name returns the FName and LName of the very first record, because findOne returns the document itself.
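The same behavior can be reproduced in plain JavaScript if a cursor is modelled as an array, which is a simplification (a real cursor is lazy and is fetched in batches). In this sketch find and findOne are our own stand-ins, not the MongoDB API, and the documents are hypothetical, shaped like the embedded Name: {FName, LName} field described above. Reading .Name from find's result gives undefined, while reading it from findOne's result gives the embedded document.

```javascript
// Hypothetical documents with an embedded Name document.
const employees = [
  { Name: { FName: "Deepak", LName: "Sharma" }, JobLocation: "Noida" },
  { Name: { FName: "Second", LName: "Employee" }, JobLocation: "Noida" },
];

// find: every matching document (modelled as an array, not a lazy cursor).
function find(docs, predicate = () => true) {
  return docs.filter(predicate);
}

// findOne: the first matching document in collection order, or null.
function findOne(docs, predicate = () => true) {
  const hits = find(docs, predicate);
  return hits.length > 0 ? hits[0] : null;
}

console.log(find(employees).Name); // undefined: the result set has no Name property
console.log(findOne(employees).Name); // { FName: 'Deepak', LName: 'Sharma' }
```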

Finally, we will see how to use JavaScript in the MongoDB shell. The mongo shell supports JavaScript directly: you can write your code and run it straight in the shell. The simplest example of using JavaScript in MongoDB is:
var json=db.Employee.findOne()
We declare a variable named json and assign it the result of db.Employee.findOne(). The query runs when this statement is executed; afterwards, typing json in the mongo shell and pressing Enter displays the stored document.


MongoDB- MapReduce Example

In our last post we learned some basics of Map Reduce in MongoDB. In today's post we will discuss it in detail, with an example. As we have already discussed, Map Reduce is a two-step process: Map and Reduce.
Step 1 – Map
The Map step groups the data as key-value pairs. The basic structure of a Map function is:
var map=function(){
emit(key, value);
}
emit is a special function that the Map function uses to output pairs. It takes two arguments: key, the value to group by, and value, the value to be reduced. A Map function can call emit zero or "n" times, depending on the condition given in the Map function; for example, a Map function can call emit only when the status of the customer is active.

Inside the Map function, we reference the current document with the keyword this.
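How emit and this interact can be imitated in plain JavaScript: the runner below binds each document to this with Function.prototype.call and collects whatever the map function emits. This is a sketch, not MongoDB's engine; runMap and the emit stand-in are hypothetical, and the field names status and Customer_Name are assumptions chosen to match the "active customer" case described above.

```javascript
// Collected [key, value] pairs; emit is a stand-in for MongoDB's emit().
let emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// MongoDB-style map: reads the current document through `this` and emits
// only for active customers, so it runs emit zero or one time per document.
// It must be a regular function, not an arrow function, so `this` can be bound.
const mapActive = function () {
  if (this.status === "active") {
    emit(this.Customer_Name, 1);
  }
};

// Run a map function over every document, binding `this` to the document.
function runMap(docs, mapFn) {
  emitted = [];
  for (const doc of docs) {
    mapFn.call(doc);
  }
  return emitted;
}

// Hypothetical customer documents.
const customers = [
  { Customer_Name: "Deepak", status: "active" },
  { Customer_Name: "Sachin", status: "inactive" },
  { Customer_Name: "Abhishek", status: "active" },
];
console.log(runMap(customers, mapActive)); // [ [ 'Deepak', 1 ], [ 'Abhishek', 1 ] ]
```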

Step 2 – Reduce
The Reduce step takes the output of Map as input, aggregates the values, and returns the result. The basic structure of a Reduce function is:
var reduce=function(key, values){
// aggregate the values into result
return result;
}

In MongoDB, the Reduce step runs only for keys that have an array of several values; it is not called for a key that has only a single value. The Reduce function can be invoked multiple times for the same key; in that case the output of one reduce call becomes part of the input for the next.
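Both rules in the paragraph above, that reduce is skipped for a key with a single value and that its output may be fed back into another reduce call, can be checked with a small plain-JavaScript sketch. groupByKey and runReduce are hypothetical helpers, not MongoDB functions. Because of re-reduce, the reduce function should return the same kind of value that map emits, and its result should not depend on how the values are split across calls; a sum satisfies both.

```javascript
// Group emitted [key, value] pairs into key -> array of values.
function groupByKey(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Call reduce only for keys with more than one value; single values pass through.
function runReduce(groups, reduceFn) {
  const result = {};
  for (const [key, values] of groups) {
    result[key] = values.length > 1 ? reduceFn(key, values) : values[0];
  }
  return result;
}

// Safe to re-reduce: it returns a number, the same type that map emits.
const sumReduce = function (key, values) {
  return values.reduce((a, b) => a + b, 0);
};

const groups = groupByKey([["A", 2], ["B", 4], ["A", 6], ["A", 1]]);
console.log(runReduce(groups, sumReduce)); // { A: 9, B: 4 }

// Re-reduce: a partial result combined with the remaining values
// gives the same answer as reducing all values at once.
const partial = sumReduce("A", [2, 6]);
console.log(sumReduce("A", [partial, 1]) === sumReduce("A", [2, 6, 1])); // true
```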

The next and final step is to call these map and reduce functions through the mapReduce function.

Step 3 – mapReduce
The last step is to call the mapReduce function with three arguments: Map, Reduce and out. out specifies how the result is returned: stored in a collection or returned inline.
When we want the result of mapReduce in a collection, we specify the collection name; if the collection does not exist, mapReduce creates it, and if it already exists, mapReduce replaces its contents.

When we want the result returned inline, we specify inline in out.
mapReduce can also take more arguments; we will discuss them later.

To demonstrate Map Reduce, first create a collection "Orders" and insert some values into it:

Order_Date:new Date("Sept 11, 2014"),

Order_Date:new Date("Sept 28, 2014"),

Order_Date:new Date("Sept 12, 2014"),

Order_Date:new Date("Aug 12, 2014"),

Order_Date:new Date("Aug 1, 2014"),
Step 1 – Map

var map1=function(){
emit(this.Customer_Name, this.Order_Quantity);
}

In the map function we pass Customer_Name and Order_Quantity: emit takes Customer_Name as the key, groups on it, and returns an array of values (Order_Quantity) for each customer.

Step 2 – Reduce

var reduce1=function(Customer_Name,arrOrder_Quantity){
return Array.sum(arrOrder_Quantity);
}

In the reduce function we pass the key, Customer_Name, and arrOrder_Quantity, the array of Order_Quantity values returned by the map function for that customer; Array.sum applies the SUM aggregate to that array and returns the total.

Step 3 – mapReduce

a. mapReduce with a collection as out

db.Orders.mapReduce(map1,reduce1,{out:"Dropthis"})

It takes map1 and reduce1 as parameters and stores the result of mapReduce in a new collection, "Dropthis". To see the output, run db.Dropthis.find().
b. mapReduce with inline as out

db.Orders.mapReduce(map1,reduce1,{out:{inline:1}})

This returns the aggregated result inline instead of storing it in a collection.

Conclusion: This is an introductory post on MapReduce in MongoDB. The examples in this post use very simple documents, without any embedded documents or arrays of values. We will look at more complex examples of MapReduce later.


Map Reduce in MongoDB

Map Reduce is a data processing approach that takes a large volume of data as input and produces a useful aggregated result. It is comparable to "Group By" with aggregate functions in an RDBMS.

Map Reduce works with two functions: Map and Reduce. The Map function turns each input document (that meets the query condition) into key-value pairs; some keys end up with multiple values, and all the values for a key are collected into an array.

The Reduce function takes the output of the Map function as input, applies an aggregate function to it, and returns the final result in a collection.

In MongoDB, the Map and Reduce functions are written in JavaScript and run within the mongod process. Before writing Map Reduce in JavaScript, let's understand it with an example.

Suppose we have a collection like this:

Customer_Name Order_Date Order_Quantity
Deepak 12/07/2014 2
Sachin 13/07/2014 4
Deepak 29/07/2014 6
Abhishek 02/08/2014 3
Sachin 08/08/2014 4


Now, if we want to know the total quantity ordered by each customer, the answer would be:

Customer_Name Order_Quantity
Deepak 8
Sachin 8
Abhishek 3


In SQL, we can write the same as:

SELECT Customer_Name, SUM(Order_Quantity) AS Order_Quantity FROM Customer_Orders
GROUP BY Customer_Name

In MongoDB, the same is done with Map Reduce as follows:

Step 1: Map data: In this step the data is arranged as key-value pairs, with the values for each key collected into an array. The output looks like:

Deepak [2, 6]
Sachin [4, 4]
Abhishek [3]

Step 2: Reduce data: In this step the output of the Map function is used as input and an aggregate function is applied to it. Since we need the sum of the order quantities, SUM is used as the aggregate function, and the result looks like:

Deepak [8]
Sachin [8]
Abhishek [3]
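The steps above can be reproduced end to end in plain JavaScript with the five orders from the table. This is an in-memory sketch of the idea, not how mongod executes mapReduce; the helper name mapReduce is hypothetical, and each result is shaped like MongoDB's {_id, value} output documents. As described earlier, a key with a single value (Abhishek) skips the reduce step and keeps its value as it is.

```javascript
// Orders from the table above (dates omitted; they are not needed for the sum).
const orders = [
  { Customer_Name: "Deepak", Order_Quantity: 2 },
  { Customer_Name: "Sachin", Order_Quantity: 4 },
  { Customer_Name: "Deepak", Order_Quantity: 6 },
  { Customer_Name: "Abhishek", Order_Quantity: 3 },
  { Customer_Name: "Sachin", Order_Quantity: 4 },
];

// Minimal in-memory map-reduce: map each document, group by key, reduce each group.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn(doc, emit);
  const result = [];
  for (const [key, values] of groups) {
    const value = values.length > 1 ? reduceFn(key, values) : values[0];
    result.push({ _id: key, value });
  }
  return result;
}

// Step 1 (map): Customer_Name is the key, Order_Quantity the value.
const mapOrders = (doc, emit) => emit(doc.Customer_Name, doc.Order_Quantity);

// Step 2 (reduce): sum the quantities collected for one customer.
const sumQuantities = (key, quantities) => quantities.reduce((a, b) => a + b, 0);

console.log(mapReduce(orders, mapOrders, sumQuantities));
// Deepak -> 8, Sachin -> 8, Abhishek -> 3
```

Unlike MongoDB, the map function here receives the document and emit as arguments instead of using this and a global emit; that keeps the sketch self-contained.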

I hope Map Reduce is clear now; in our next post we will discuss its implementation in MongoDB.