Category Archives: Big Data

Connect R with mongoDB

In this post we will see how to connect MongoDB with R. For this first we have to install package “rmongodb “.

Install and load rmongodb package:-
We can install “rmongodb” package by two ways. In first way we can go to Packages from R console and Install packages and choose “rmongodb”. (R Console–>Packages–>Install Package (s)–>rmongodb)

Install Package












We can also install it  by writing command on R console-

After it installed, you have to load it for use. To load the install package use below command-


Connect R to MongoDB

First we need to make a connection with MongoDB. If we run the below command without any parameter, it will connect MongoDB on localhost.

mongo <- mongo.create()
The same command can be used by taking host, username, password and database as parameter.
host <- “localhost:27017
username <- “”

password <- “”

db <- “test
Above command now looks like-
mongo <- mongo.create(host=host , db=db, username=username, password=password)

We can also pass the hardcode values-
mongo <- mongo.create(host=”localhost:27017” , db=”Test”, username=””, password=””)


When you type “mongo” in R console, it will give you all the values which are used to connect mongoDB. In my case it returns:

> mongo
[1] 0
<pointer: 0x02f89e58>
[1] “mongo”
[1] “localhost:27017″
[1] “”
[1] “”
[1] “”
[1] “Test”
[1] 0

Below are some basic commands:

To check whether we are connected with MongoDB or not we can execute the below command:



[1] TRUE

To get all databases:

> mongo.get.databases(mongo)

[1] “R”    “test”

To get all the collections in a specific database:
> db <- “test”
> mongo.get.database.collections(mongo, db)


[1] “test.user”               “test.Student”
[3] “test.Track”              “test.ExampleMapReduce”
[5] “test.map_reduce_example” “test.Orders”
[7] “test.Dropthis”           “test.Employee”
[9] “test.Testdelete”         “test.categories”
[11] “test.Book1″              “test.Book2″
[13] “test.Publisher”          “test.numbers”
[15] “test.DemoIndex”          “test.ttt”
[17] “test.Items”

To find something in specific collection:
> JobLocation<-,”test.Employee”,’{“JobLocation”:”Gurgaon”}’)
> JobLocation

_id : 7          53efb77ea6f48f77e8c8f3d4
Name : 3
FName : 2        Suresh
LName : 2        Chaudhary

TechnicalSkill : 4
0 : 2    SQL Server
1 : 2    MSBI
2 : 2    Informatica

Experience : 2   8yrs
JobLocation : 2          Gurgaon

Note: The above command will return BSON object. And we can not use BSON object directly in R. To convert it in R object we can write:
> JobLocation<-,”test.Employee”,’{“JobLocation”:”Gurgaon”}’)

{ $oid : “53efb77ea6f48f77e8c8f3d4″ }

[1] “Suresh”
[1] “Chaudhary”
[1] “SQL Server”  “MSBI”        “Informatica”
[1] “8yrs”
[1] “Gurgaon”

To get all the data, we should use find.all commad.

> JobLocation<- mongo.find.all(mongo,”test.Employee”,’{“JobLocation”:”Noida”}’)
> JobLocation
[1] “53efb77ea6f48f77e8c8f3d0″
[1] “Deepak”
[1] “Sharma”
[1] “SQL Server” “MSBI”       “mongoDB”
[1] “8yrs”
[1] “Noida”

[1] “53efb77ea6f48f77e8c8f3d2″
[1] “Abhishek”
[1] “Perl”    “C++”     “Testing”
[1] “8yrs”
[1] “Noida”

Introduction of Indexes in MongoDB

In this introductory article on Indexes in MongoDB we will learn how indexes works in MongoDB. Like in other databases in MongoDB also indexes play very important role. By using indexes we can optimize our query performance but on the other side it will hamper the performance if not used properly. Like a traditional example of indexes, in any book index help us to find the content/topic very easily. Same happen with MongoDB-Indexes help query engine by reducing time to fetch documents. In MongoDB we can create indexes by two types:
1. Single Key Index
2. Compound Key Index

1. Single Key Index: With this type of index each value of index corresponds to a single value from documents. Best example of this is default _id field in each document.
2. Compound Key Index: With this type of index we create index on combination of keys. That give us benefit over single key index when we search our documents on the basis of mixed condition.

To demonstrate this, let us create a collection by executing below java script on MongoDB shell:
for(i=0; i<500; i++) {{num: i}); }
To check the above syntax execute find():


If you want to see more results you can type “it” and press enter.

Lets try to find some more records:
Will return only one value: { “_id” : ObjectId(“5415f44f47919b61db429589″), “num” : 30 }

Will return 4 rows with num value: 31,32,33 and 34.

{ “_id” : ObjectId(“5415f44f47919b61db42958a”), “num” : 31 }
{ “_id” : ObjectId(“5415f44f47919b61db42958b”), “num” : 32 }
{ “_id” : ObjectId(“5415f44f47919b61db42958c”), “num” : 33 }
{ “_id” : ObjectId(“5415f44f47919b61db42958d”), “num” : 34 }

Now, at this point lets deep inside in Indexes and see how they works. In other relational databases we have some system commands/stored procedures which give us execution plan of query where we can see how our index is being used by query engine to find the result. Likely in MongoDB we have explain() command which gives us ides how index works. Lets try to modify the above query and check the use of explain():

Output of above query is as below:

“cursor” : “BasicCursor”,
“isMultiKey” : false,
“n” : 4,
“nscannedObjects” : 500,
“nscanned” : 500,
“nscannedObjectsAllPlans” : 500,
“nscannedAllPlans” : 500,
“scanAndOrder” : false,
“indexOnly” : false,
“nYields” : 0,
“nChunkSkips” : 0,
“millis” : 0,
“indexBounds” : {
“server” : “Deepak-PC:27017″

The main things in that result are marked in Bold font.


Any one can be surprised to see that result, to search only 4 results (n) query engine scanned (nscanned) all 500 documents. The cursor type Basic Cursor means that this query has not used any index while executing.  In real scenario number of documents in collection much larger then we are using in our example. So, if that is the case then query engine will take lots of time to execute a very simple query.

Okay, we have a solution of this problem- We can make a index on our collection. ensureIndex() method is used to create an index on collection in MongoDB. In our example we have only one column- num. Below command will create an Index on num column:

That  command ensure that an ascending index should be built on num column for all documents in DemoIndex collection.  Index has created or not? Below command will give us the answer:



Above command will show you all the indexes which have been created on the collection. We have the default index on _id and the one which we have just created on num. Now, again run below command and see the difference:


Now, it is very clear that query engine use to get the result by using the index created on num and scanned only 4 pages now. If we have more number of documents then we can see the significance difference in “millis” (query execution time in milliseconds) between the approaches we have followed.

In this article we have seen single key index only, in coming posts we will see how to deal with compound key index and dig some more into indexes in MongoDB.

Find and Findone in MongoDB

This post is in continuation of our last post where we have learned basic commands to run on MongoDB. In that post we have seen that we have find method to search something in a document. In today’s post we will see how to extend the find method and use of findone.
Before start, lets create a collection and insert some records (documents) into it:

TechnicalSkill:['SQL Server','MSBI','mongoDB'],
TechnicalSkill:['SQL Server','Sharepoint'],
TechnicalSkill:['SQL Server','MSBI','Informatica'],

1. db.collection.find()
This method selects all the documents which matches the condition, if condition is not specified it returns all documents within the collection.
The find method returns 20 documents default, you need to type it to get more results.
find takes two optional parameters- search condition and fields which would be returned by query.
a. Find all the documents in a collection
This is the simplest form of find method-without any parameters, it will return all documents in collection Employee.
b. Find with select criteria
In this query, we bound our find method to select all documents where JobLocation is Noida. The point to remember here, the above statement return all fields.
c. Find with specify fields
This query return Name and JobLocation all documents. _id is the default field which return with every find method, here 1 denotes to True and 0 denotes to False. We can not mix True and False of fields in one statement, this is possible only with _id, if we run
db.Employee.find({},{_id:false,Name:true,JobLocation:false}) , it will throw an error “You cannot currently mix including and excluding fields. Contact us if this is an issue.”
d. Find with forEach
This query returns all the documents will all fields in arranged format.


e. Find with Limit
This query will return only 2 documents with all the fields.
f. More options with find method
This query returns first document with all the fields (first document means the document which inserted very first time)
This query will return ObjectId of very first document
This will return the time when the objectID was generated by MongoDB.

This method selects all the fields which satisfy the optional search criteria and return only one document. If multiple documents qualify the search criteria then it will return the one document according to the insertion order. Again, similar to find, findone also takes two optional parameters- search condition and fields which would be returned by query.
will return only first row, in this case it returns all the fields of document where name is Deepak Sharma, because that is the very first document which we insert.

The difference between find and findOne comes when we works with embedded documents, like in the above example, if we filter our search on field Name. 
db.Employee.find().Name will not return anything but db.Employee.findOne().Name will return FName and LName of very first record.

Now, in the last of this post we will see how to use java script in mongoDB shell. MongoDB shell support Java script directly- means, write your code and run it direct on mongoDB shell. The easiest example of Java Script use in mongoDB is:
var json=db.Employee.findOne()
We declare a variable name json and assigned the value of it as db.Employee.findOne(), when we write json on mongoDB shell and hit enter then it will execute db.Employee.findOne() and give us the result.


MongoDB- MapReduce Example

In our last post we have learned some basics of Map Reduce in MongoDB. In today’s post we will discuss the same in detail and with an example. As we have already discussed that Map Reduce is two step function- Map and Reduce.
Step 1 – Map
Map step is used to Group the data based on Key-Value. The structure of Map function is:

emit is a special method which must be invoked by every map. It takes two arguments – key: to group by and value: values to be reduced. Map function can call emit 0 or “n” number of times, which depends on the condition given in Map function. Like in below example, emit will run only when status of customer is active:

We have to reference of current document in Map function by using keyword this.

Step 2 – Reduce
Reduce step takes the output of Map as input and aggregate the values and return the result. The basic structure of Reduce function is:

return result;

Reduce step in MongoDB will work only those keys who has array of values, it will not work for a key which has only single value.  Reduce function can invoke multiple times for the same key, in that case the output of one reduce function works as an Input for next reduce.

The next and final step is to call these map and reduce functions in mapReduce function.

Step 3 – mapReduce
The last step is to call mapReduce function with three arguments- Map, Reduce and out. Out specify how the result is return- in form of document or inline.
When we want the result of mapReduce in document then we have to specify the document name, if the document does not exist then mapReduce will create a new document and if document already exists then it will overwrite the values.

When we want to return the result inline, then we can use inline in out.
However mapReduce can take more arguments, we will discuss about them later.

To demonstrate Map Reduce first create a document “Orders” and insert some values into it:

Order_Date:new Date(“Sept 11, 2014″),

Order_Date:new Date(“Sept28, 2014″),

Order_Date:new Date(“Sept 12, 2014″),

Order_Date:new Date(“Aug 12, 2014″),

Order_Date:new Date(“Aug 1, 2014″),
Create Collection









Step 1 – Map

var map1=function(){

MapIn map function we have passed Customer_Name and Order_Quantity, emit takes Customer_Name as key and grouped it on and return array of values-Order_Quantity.

Step 2 – Reduce

var reduce1=function(Customer_Name,arrOrder_Quantity){
return Array.sum(arrOrder_Quantity)

ReduceIn reduce function we have passed key value Customer_Name and it apply SUM aggregate function on arrays returned by map function. In this example we store the aggregated result as arrOrder_Quantity.

Step 3 – mapReduce

a.  mapReduce with Document as Out



It takes map1 and reduce1 as parameters and stores the result of mapReduce in a new document “Dropthis”. The following will be the output when we run :










b. mapReduce with inline as Out

MapReduce_InlineIt gives the aggregated result as inline.

Conclusion: This is introductory post on MapReduce in MongoDB. In examples of this post we have used very simple document which do not have any embedded document or do not have any array of values. We will see later some complex examples of MapReduce.


Map Reduce in MongoDB

Map Reduce is data processing approach which takes high or large volume of data as input and gives useful aggregated result. We can compare this by “Group By” and “Aggregated Functions” in RDBMS.

Map Reduce works on two functions: Map and Reduce. In Map function, each input document (which meets the query condition) arranges as Key-Value pairs- Some Keys have multiple Values. In Map function all these entries are clubbed in an array.

Reduce function takes the output of Map function as input and applies the aggregate functions on it and gives the final result in collection.

All Map Reduce function in MongoDB is Java Script code and run within the MongoD process. Before doing Map Reduce by Java Script, let’s understand this by an example.

Suppose we have a Collection like:

Customer_Name Order_Date Order_Quantity
Deepak 12/07/2014 2
Sachin 13/07/2014 4
Deepak 29/07/2014 6
Abhishek 02/08/2014 3
Sachin 08/08/2014 4


Now, if we want to know how many orders are requested by Customers, then our answer would be:

Customer_Name Order_Quantity
Deepak 8
Sachin 8
Abhishek 3


In SQL, we can write the same as:

SELECT Customer_Name, SUM (Order_Quantity) AS Order_Quantity FROM Customer_Orders
GROUP BY Customer_Names

Now, the same is done by Map Reduce in MongoDB as:

Step 1: Map data: In this step data is arranged in key-values pair. The output looks like:


Step 2: Reduce data: In this step the output of Map function is used as input an aggregated function applies on it. As per our requirement we need sum of orders so, SUM function will be used an aggregated function and result look like:

Deepak [8]
Sachin [8]

I hope you are clear now on Map Reduce, in our next post we will discuss the implementation of it in MongoDB.