Connect R with MongoDB

In this post we will see how to connect R with MongoDB. First, we have to install the package "rmongodb".

Install and load the rmongodb package
We can install the "rmongodb" package in two ways. The first is to open the Packages menu in the R console, choose Install package(s), and select "rmongodb" (R Console -> Packages -> Install package(s) -> rmongodb).

Install Package

We can also install it by running the following command in the R console:
install.packages("rmongodb")

After it is installed, you have to load it before use. To load the installed package, use the command below:

library(rmongodb)

Connect R to MongoDB

First we need to make a connection to MongoDB. If we run the command below without any parameters, it will connect to MongoDB on localhost.

mongo <- mongo.create()
The same command can also take the host, username, password, and database as parameters:
host <- "localhost:27017"
username <- ""

password <- ""

db <- "test"
The command now looks like this:
mongo <- mongo.create(host=host, db=db, username=username, password=password)

We can also pass hard-coded values:
mongo <- mongo.create(host="localhost:27017", db="Test", username="", password="")
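
Whichever form you use, it is worth verifying the connection right away. Below is a minimal sketch using the mongo.is.connected() helper covered later in this post:

# Connect to the local MongoDB instance and fail fast if it is unreachable
mongo <- mongo.create(host = "localhost:27017")
if (!mongo.is.connected(mongo)) {
  stop("Could not connect to MongoDB on localhost:27017")
}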

Connection

When you type "mongo" in the R console, it will show all the values used to connect to MongoDB. In my case it returns:

> mongo
[1] 0
attr(,"mongo")
<pointer: 0x02f89e58>
attr(,"class")
[1] "mongo"
attr(,"host")
[1] "localhost:27017"
attr(,"name")
[1] ""
attr(,"username")
[1] ""
attr(,"password")
[1] ""
attr(,"db")
[1] "Test"
attr(,"timeout")
[1] 0

Below are some basic commands:

To check whether we are connected to MongoDB, we can execute the command below:

> mongo.is.connected(mongo)

Result:

[1] TRUE

To get all databases:

> mongo.get.databases(mongo)
Result:

[1] "R"    "test"

To get all the collections in a specific database:
> db <- "test"
> mongo.get.database.collections(mongo, db)

Result:

[1] "test.user"               "test.Student"
[3] "test.Track"              "test.ExampleMapReduce"
[5] "test.map_reduce_example" "test.Orders"
[7] "test.Dropthis"           "test.Employee"
[9] "test.Testdelete"         "test.categories"
[11] "test.Book1"              "test.Book2"
[13] "test.Publisher"          "test.numbers"
[15] "test.DemoIndex"          "test.ttt"
[17] "test.Items"

To find a document in a specific collection:
> JobLocation <- mongo.find.one(mongo, "test.Employee", '{"JobLocation":"Gurgaon"}')
> JobLocation
Result:

_id : 7          53efb77ea6f48f77e8c8f3d4
Name : 3
FName : 2        Suresh
LName : 2        Chaudhary

TechnicalSkill : 4
0 : 2    SQL Server
1 : 2    MSBI
2 : 2    Informatica

Experience : 2   8yrs
JobLocation : 2          Gurgaon

Note: The above command returns a BSON object, and we cannot use a BSON object directly in R. To convert it to an R object we can write:
> JobLocation <- mongo.find.one(mongo, "test.Employee", '{"JobLocation":"Gurgaon"}')
> mongo.bson.to.list(JobLocation)
Result:

$`_id`
{ $oid : "53efb77ea6f48f77e8c8f3d4" }

$Name
$Name$FName
[1] "Suresh"
$Name$LName
[1] "Chaudhary"
$TechnicalSkill
[1] "SQL Server"  "MSBI"        "Informatica"
$Experience
[1] "8yrs"
$JobLocation
[1] "Gurgaon"
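
Once converted, the result behaves like any other R list, so individual fields can be pulled out with the usual $ operator. A small sketch based on the document above:

# Convert the BSON document once, then access fields as list elements
emp <- mongo.bson.to.list(JobLocation)
emp$Name$FName      # "Suresh"
emp$Experience      # "8yrs"
emp$TechnicalSkill  # "SQL Server" "MSBI" "Informatica"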

To get all the matching documents, we should use the mongo.find.all command.

> JobLocation <- mongo.find.all(mongo, "test.Employee", '{"JobLocation":"Noida"}')
> JobLocation
Result:
[[1]]
[[1]]$`_id`
[1] "53efb77ea6f48f77e8c8f3d0"
[[1]]$Name
[[1]]$Name$FName
[1] "Deepak"
[[1]]$Name$LName
[1] "Sharma"
[[1]]$TechnicalSkill
[1] "SQL Server" "MSBI"       "mongoDB"
[[1]]$Experience
[1] "8yrs"
[[1]]$JobLocation
[1] "Noida"

[[2]]
[[2]]$`_id`
[1] "53efb77ea6f48f77e8c8f3d2"
[[2]]$Name
[[2]]$Name$FName
[1] "Abhishek"
[[2]]$TechnicalSkill
[1] "Perl"    "C++"     "Testing"
[[2]]$Experience
[1] "8yrs"
[[2]]$JobLocation
[1] "Noida"
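
Because mongo.find.all returns a plain R list with one document per element, it is easy to flatten the results into a data frame for further analysis. A minimal sketch, using the field names of the Employee documents shown above:

# Flatten selected fields from each returned document into one data frame
noida <- mongo.find.all(mongo, "test.Employee", '{"JobLocation":"Noida"}')
noida_df <- do.call(rbind, lapply(noida, function(doc) {
  data.frame(FName       = doc$Name$FName,
             Experience  = doc$Experience,
             JobLocation = doc$JobLocation,
             stringsAsFactors = FALSE)
}))
noida_df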

Big Data Analytics and Hadoop Opportunities

With organizations extending their operations manifold and the information revolution growing in volume with every passing minute, there is an increasing need to manage and extract precise, useful knowledge from this vast wellspring of information. It is this growing need that has given birth to the concept of 'Big Data' analytics.

Understanding Big Data

Big Data refers to huge chunks of data containing valuable information on a variety of subjects. This information may be structured, semi-structured, or completely unstructured; that, however, makes it no less valuable a source for extracting the most sought-after knowledge and inputs that are of utmost importance to businesses, defence services, markets, financial organizations, analysts, economists, governments, and even researchers and scientists.

Significance of Big Data Analytics

The process of analysing these humongous data sets to unravel hidden trends, patterns, correlations, indications, customer preferences and much more is called Big Data Analytics. Organizations depend upon these analytical findings to draw out improved strategies, opportunities, defence mechanisms, and enhanced customer service, as well as ways to improve their operational efficiency and gain an edge over existing or future competition.

Big Data Analytics is aimed at enabling companies to get the right information, through the right source, at the right time, in order to ensure the right action.

Why Hadoop?

Most organizations today are looking for new tools that will allow them to not only collect big data, but also process it for in-depth analysis and derive new knowledge that can be used to their advantage.

One such effective, new, Java-based tool that aids Big Data analytics is Hadoop. This programming framework from Apache™ is open source software that enables analysts to process large data sets spread across distributed commodity servers.

Some reasons that make Hadoop a popular choice are:

  • Hadoop allows the flexibility of easy and quick scaling from a single server to multiple machines.
  • Changing the very dynamics and economics of mass computing, it emerges as a cost-effective, flexible, and highly fault-tolerant tool.
  • Some of the top companies today are using Hadoop for Big Data, creating tremendous opportunities for well-paying Big Data jobs for trained Hadoop professionals.
  • Hadoop architects, Hadoop developers, and Hadoop testers are in great demand in the market today.
  • Industry survey reports claim that the salary of Big Data Hadoop professionals in India is anywhere above 6 lakh per annum and is expected to grow at almost 25 percent yearly.

Recognizing the growing demand and excellent job prospects for trained Big Data Hadoop professionals, we at Analytic Square have come up with some of the best programs in Big Data Analytics training in Delhi.

Big Data Analytics Training

Our well-crafted Online Big Data Hadoop Training programs and modules are designed with care for those looking to carve a niche for themselves in the world of Big Data Analytics with Hadoop.

If managing, interpreting, and cracking big data drives you, Analytic Square is the perfect place for you.

With high-quality learning content, well-trained faculty, to-the-point doubt clarification, real-time projects, and ample training sessions that help you clear the certification exam, the Analytic Square Institute offers professional-level Online Big Data Analytics training to help you step forward with confidence and advance your career in this field. In terms of both time and money invested, we offer the most competitive Big Data Hadoop Training in Delhi.

So go ahead and contact us to join today. We look forward to embarking with you on your journey to success!

Introduction to Indexes in MongoDB

In this introductory article on indexes in MongoDB we will learn how indexes work. As in other databases, indexes play a very important role in MongoDB: used properly they optimize query performance, but used improperly they can hamper it. Consider the traditional example of an index: in any book, the index helps us find a topic very easily. The same happens in MongoDB, where indexes help the query engine by reducing the time needed to fetch documents. In MongoDB we can create indexes of two types:
1. Single Key Index
2. Compound Key Index

1. Single Key Index: With this type of index, each index entry corresponds to a single value from the documents. The best example is the default index on the _id field of each document.
2. Compound Key Index: With this type of index we create an index on a combination of keys. That gives us a benefit over a single key index when we search our documents using a mixed condition, as in the sketch below.
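
For example, a compound key index over two fields could be created like this in the MongoDB shell (a hypothetical sketch; the Employee fields JobLocation and Experience are borrowed from the first article, not from the collection used below):

// Hypothetical: index JobLocation ascending and Experience descending
db.Employee.ensureIndex({JobLocation: 1, Experience: -1})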

To demonstrate this, let us create a collection by executing the JavaScript below in the MongoDB shell:
for(i=0; i<500; i++) { db.DemoIndex.save({num: i}); }
To check the result of the above script, execute find():
db.DemoIndex.find()

Indexes-1

If you want to see more results you can type "it" and press Enter.

Let's try to find some more records:
db.DemoIndex.find({num:30})
This will return only one document: { "_id" : ObjectId("5415f44f47919b61db429589"), "num" : 30 }

db.DemoIndex.find({num:{"$gt":30,"$lt":35}})
This will return 4 documents, with num values 31, 32, 33, and 34.

{ "_id" : ObjectId("5415f44f47919b61db42958a"), "num" : 31 }
{ "_id" : ObjectId("5415f44f47919b61db42958b"), "num" : 32 }
{ "_id" : ObjectId("5415f44f47919b61db42958c"), "num" : 33 }
{ "_id" : ObjectId("5415f44f47919b61db42958d"), "num" : 34 }

Now, at this point let's dig deeper into indexes and see how they work. In relational databases we have system commands and stored procedures that give us the execution plan of a query, where we can see how our index is being used by the query engine to find the result. Similarly, in MongoDB we have the explain() command, which gives us an idea of how an index is used. Let's modify the above query and check the output of explain():

db.DemoIndex.find({num:{"$gt":30,"$lt":35}}).explain()
The output of the above query is as below:

{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 4,
"nscannedObjects" : 500,
"nscanned" : 500,
"nscannedObjectsAllPlans" : 500,
"nscannedAllPlans" : 500,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
},
"server" : "Deepak-PC:27017"
}

The important fields in that result are cursor, n, and nscanned (highlighted in the screenshot below).

Indexes-2

Anyone may be surprised by that result: to find only 4 results (n), the query engine scanned (nscanned) all 500 documents. The cursor type BasicCursor means that this query did not use any index while executing. In a real scenario the number of documents in a collection is much larger than in our example, and in that case the query engine would take a lot of time to execute even a very simple query.

Okay, we have a solution to this problem: we can create an index on our collection. The ensureIndex() method is used to create an index on a collection in MongoDB. In our example we have only one field, num. The command below will create an index on the num field:

db.DemoIndex.ensureIndex({num:1})
That command ensures that an ascending index is built on the num field for all documents in the DemoIndex collection. Has the index been created? The command below will give us the answer:

db.DemoIndex.getIndexes()

Indexes-3

The above command will show all the indexes that have been created on the collection. We have the default index on _id and the one we have just created on num. Now run the command below again and see the difference:
db.DemoIndex.find({num:{"$gt":30,"$lt":35}}).explain()

Indexes-4

Now it is very clear that the query engine gets the result by using the index created on num and scans only 4 documents. With a larger number of documents we would see a significant difference in "millis" (the query execution time in milliseconds) between the two approaches.
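
For readers following along from R, the rmongodb package from the first article exposes a similar call, mongo.index.create(). A sketch, assuming the mongo connection object created earlier with mongo.create():

# Build an ascending index on num from R, in the spirit of ensureIndex above
# (see rmongodb's mongo.index.create help for the full set of options)
mongo.index.create(mongo, "test.DemoIndex", "num")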

In this article we have covered the single key index only; in coming posts we will see how to deal with the compound key index and dig some more into indexes in MongoDB.

SAS Project-2

Continuing our ongoing effort, we have added another SAS project, which will also give you a glimpse of analytics for practice purposes.

A study was conducted to measure whether a reduced-cost product is preferred equally to the current product. A sample of 200 people was asked their preference on a pair of samples coded 27 and 45. One half of the group tasted sample 27 first and 45 second; the other half tasted the samples in the reverse order. Each person was also asked to rate six qualities of these samples on a 0 to 6 scale (0 = no response, 1 = excellent, ..., 6 = poor). The preference of each person was recorded as: 1, prefers the first sample tasted; 2, prefers the second sample tasted; 3, no preference. Please access the data by clicking Assignment2.

The first variable is the person number, the second variable is the number of the sample tasted first, the third variable is the number of the sample tasted second, the fourth variable is the six ratings for the first sample tasted, the fifth variable is the six ratings for the second sample tasted, and the last variable is the preference. You need to solve the business problems below and give your comments and suggestions.

1. How many people preferred sample 27? You can use if-then statements to prepare the data set and use the n option in proc means, or use proc freq.

2. Use chi-square tests to compare the distribution of the rankings of the two products for each of the six aspects of quality. The chi-square test can be run using proc freq; the information on column-wise input and do loops may be useful, and a sketch of the mechanics follows below. Are the two products interchangeable? Interpret the output and express it in statements for non-technical readers.
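
As a minimal sketch of those mechanics (the data set name taste and the variables sample and rating are hypothetical, not part of the assignment data):

/* Hypothetical sketch: cross-tabulate sample code against quality rating
   and request a chi-square test of association */
proc freq data=taste;
  tables sample*rating / chisq;
run;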

 

SAS Projects- Assignment 1

Dear reader, we are going to add some practice assignments on different tools and techniques. We encourage you to practice on these topics.

Assignment-1

In an advertisement, a company claimed that taking its product Down-Drowsy reduced the time to fall asleep by 46% compared with the time necessary without pills. Able based this claim on a sleep study. Persons were asked to record how long they took to fall asleep (`sleep latency') in minutes, and their average for a week was computed. In the next week these persons received Down-Drowsy and recorded their sleep latency. The following link gives the average sleep latency for each of the 73 persons, first for the week without pills and then for the week with pills.

Click Assignment 1 to access the data.

Problem 1: Put the data above into a SAS data set containing 3 variables, and use Patient, Week 1, and Week 2 as labels. Refer to Data Step Basics, SAS Variables, and Input Statement (List) for assistance. (Use the Windows clipboard to transfer the data from the help file to the SAS program window after the datalines statement.) Use the input statement with @@; a skeleton is sketched below.
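
A minimal skeleton of that data step (the variable names patient, week1, and week2 are illustrative assumptions, and the two records shown are placeholders to be replaced with the actual 73 observations from Assignment 1):

data sleep;
  /* @@ holds the current input line so several observations can be read from it */
  input patient week1 week2 @@;
  label patient = 'Patient' week1 = 'Week 1' week2 = 'Week 2';
datalines;
1 62.1 33.8 2 44.5 30.2
;
run;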

Problem 2: Use proc sort to arrange the data in increasing order by patient number. Print the sorted data set using proc print.

Problem 3: Use proc means to calculate the mean and standard deviation of the sleep latency times for each week.

Problem 4: How will you statistically determine whether the drug is effective? Use appropriate statistical tests and interpret them.

Please submit your answers, or let us know at analyticsquare@gmail.com if you need assistance solving the above problems.