online learning, blog, anil singh


Apr 09

MongoDB Basics Primer Series - Step 4 (Indexes)

Objective: Make your queries run faster on MongoDB by introducing indexes.

Prerequisite: MongoDB Basics Primer Series - Step 1, Step 2, Step 3.

 

As promised in Step 3 of this series, this step takes you through indexing in MongoDB. Index concepts in MongoDB are very similar to those in any conventional RDBMS you might have used. Indexing is the single biggest tunable performance factor in a database, and MongoDB is no exception.

An index enables the query to focus on the right set of documents instead of scanning all the documents in a collection.
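To make the idea concrete, here is a minimal sketch in plain JavaScript (not MongoDB code) that contrasts scanning every document with looking values up in a prebuilt table. The four documents and the helper names (findByScan, buildOccasionIndex, findByIndex) are made up for illustration, and a Map is a simplification of the ordered tree structures MongoDB actually uses for its indexes.

```javascript
// Illustrative sketch only: plain JavaScript, not MongoDB internals.
// The documents below are made up for the example.
const sketchDocs = [
  { _id: 1, Occasion: ["New Year"] },
  { _id: 2, Occasion: ["Holi"] },
  { _id: 3, Occasion: ["Diwali", "Deepavali"] },
  { _id: 4, Occasion: ["Christmas"] },
];

// Without an index: every document has to be examined.
function findByScan(docs, occasion) {
  let examined = 0;
  const matches = [];
  for (const doc of docs) {
    examined += 1;
    if (doc.Occasion.includes(occasion)) matches.push(doc);
  }
  return { matches, examined };
}

// With an index: a lookup table from each Occasion value to its documents.
function buildOccasionIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc.Occasion) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc);
    }
  }
  return index;
}

// Only the documents listed under the requested value are touched.
function findByIndex(index, occasion) {
  const matches = index.get(occasion) || [];
  return { matches, examined: matches.length };
}

const scanResult = findByScan(sketchDocs, "Diwali");
const indexResult = findByIndex(buildOccasionIndex(sketchDocs), "Diwali");
console.log(scanResult.examined, indexResult.examined); // 4 1
```

Both lookups return the same document; the difference is how many documents had to be examined to find it.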

 

Upload sample data

To illustrate indexes in action, let us load some sample data. Start mongod, the MongoDB data server process. You are hopefully adept at running the process by now! If needed, refer to the earlier posts in this series, or run the following in the Terminal window.

$ mongod --dbpath Data/ --logpath mongod.log --storageEngine wiredTiger --fork

Create the Data directory before running the above command. Once the process has started, download the JSON file, IndianFestivalsHolidays.json, and import the data into MongoDB as follows:

mongoimport -d MyDB -c IndianFestivalsHolidays --drop < IndianFestivalsHolidays.json

The data from IndianFestivalsHolidays.json is loaded into the collection IndianFestivalsHolidays in the database MyDB. Run a count on the collection; it should return 61 documents.

 

The Data Model

The data contains the festival holidays in India for the calendar year 2016. The loaded documents have the following structure:

Document structure

Fig. 1. Loaded Sample Data

Here I am using a GUI-based MongoDB client, MongoChef Professional, which I discussed in Step 2 of this series. If you haven't tried it yet, you can try it now.

The documents have the following fields:

  1. "_id" - If not provided by the user, as is the case here, it is generated by the system. The values are unique.
  2. "On" - It contains the date.
  3. "Day" - It contains the day of the week.
  4. "Month" - It contains the month name corresponding to the date.
  5. "Occasion" - It is an array containing one or more events/ festivals.

 

In the Thick of Indexes

Now that you have the data loaded, figure out the currently available indexes in the collection.

In the Mongo shell or MongoChef IntelliShell, run:

> db.IndianFestivalsHolidays.getIndexes()

MongoDB primary index

Fig. 2. Existing Indexes

As you can see, there is already an index on the "_id" field. This is the primary index, which MongoDB creates automatically. It enforces uniqueness, which is why you must provide unique values if you populate the field yourself. Any other index on the collection is called a secondary index.
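The uniqueness rule on "_id" can be pictured with a toy model in plain JavaScript. This is only a sketch: TinyCollection is a hypothetical class, and the generated identifiers are simple strings, whereas MongoDB generates ObjectId values.

```javascript
// Toy model of a collection's primary index on "_id". Illustration only:
// real MongoDB generates ObjectId values, not the strings used here.
class TinyCollection {
  constructor() {
    this.primaryIndex = new Map(); // _id -> document
    this.generated = 0;
  }

  insert(doc) {
    // Generate an _id when the caller did not supply one.
    const _id = "_id" in doc ? doc._id : `generated-${++this.generated}`;
    // The primary index refuses a second document with the same _id.
    if (this.primaryIndex.has(_id)) {
      throw new Error(`duplicate _id: ${_id}`);
    }
    const stored = { ...doc, _id };
    this.primaryIndex.set(_id, stored);
    return stored;
  }
}

const tiny = new TinyCollection();
tiny.insert({ Occasion: ["Holi"] });            // _id is generated
tiny.insert({ _id: 42, Occasion: ["Diwali"] }); // caller-supplied _id
let rejected = false;
try {
  tiny.insert({ _id: 42, Occasion: ["Pongal"] }); // same _id again
} catch (err) {
  rejected = true;
}
console.log(rejected, tiny.primaryIndex.size); // true 2
```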

Let's look for Occasion "Diwali" in the collection:

db.IndianFestivalsHolidays.find({Occasion:"Diwali"})

Now run explain on the above query to see how the query reached the required data. Explain provides the execution plan of the query. In its default mode it does so without running the query; in the "executionStats" mode used here, it also executes the winning plan and reports statistics from that run. The output helps identify the stages where collection scans are being done, whether indexes are being used, and so on. A well-designed query should have minimal collection scans and make good use of the available indexes.

> db.IndianFestivalsHolidays.find({Occasion:"Diwali"}).explain("executionStats")

On running the above explain, you will get the following output:

Query execution plan in MongoDB

Fig. 3. Query Execution Plan

Look at the fields marked with the red dots. "stage":"COLLSCAN" indicates that the query scans the entire collection to get the required data. "nReturned": NumberInt(1) indicates that the query returns 1 document. Finally, "totalDocsExamined": NumberInt(61) indicates that the query examines 61 documents, which is every document in the collection. The gap between "nReturned" and "totalDocsExamined" suggests that an index will make the query more efficient: the narrower the difference between the two, the better. So let's build one!
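If you want to compare these numbers across several queries, a small helper can pull them out of the explain result. The sketch below is plain JavaScript; summarizeExecutionStats is a hypothetical name, and the sample object is hand-written from the values reported in this post for the query before the index (1 returned, 61 examined, 209 ms), trimmed to just the fields it reads.

```javascript
// Hypothetical helper: `explainOutput` is assumed to have the shape returned by
// .explain("executionStats"). Only three fields of executionStats are read.
function summarizeExecutionStats(explainOutput) {
  const stats = explainOutput.executionStats;
  // Documents examined per document returned; 1 is ideal, larger is worse.
  const examinedPerReturned =
    stats.nReturned === 0 ? null : stats.totalDocsExamined / stats.nReturned;
  return {
    nReturned: stats.nReturned,
    totalDocsExamined: stats.totalDocsExamined,
    executionTimeMillis: stats.executionTimeMillis,
    examinedPerReturned,
  };
}

// Hand-written sample using the numbers reported in this post (no index yet).
const beforeIndex = {
  executionStats: { nReturned: 1, totalDocsExamined: 61, executionTimeMillis: 209 },
};
const summaryBefore = summarizeExecutionStats(beforeIndex);
console.log(summaryBefore.examinedPerReturned); // 61
```

A ratio of 61 means the server looked at 61 documents for every one it returned, which is the signal that an index is worth trying.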

Since the query focuses on the field "Occasion", the index must be built on this field.

> db.IndianFestivalsHolidays.createIndex({"Occasion":1})

Now check the available indexes in the collection again,

> db.IndianFestivalsHolidays.getIndexes()

Secondary index in MongoDB

Fig. 4. New index on field "Occasion"

As you can see, there are now two indexes in the collection. The new index is "Occasion_1" on the field "Occasion". With the new index in place, run explain on the query again to see the difference the index makes to query efficiency.
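The name "Occasion_1" is not arbitrary: when you do not name an index yourself, MongoDB builds the default name by joining each indexed field and its direction with underscores. The plain JavaScript sketch below mimics that rule; defaultIndexName is a hypothetical helper written for this post, not a MongoDB function.

```javascript
// Hypothetical helper that mimics MongoDB's default index naming rule:
// each field and its direction (1 or -1), joined with underscores.
function defaultIndexName(keySpec) {
  return Object.entries(keySpec)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

console.log(defaultIndexName({ Occasion: 1 }));       // Occasion_1
console.log(defaultIndexName({ Month: 1, Day: -1 })); // Month_1_Day_-1
```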

> db.IndianFestivalsHolidays.find({Occasion:"Diwali"}).explain("executionStats")

Since the output is long, I'll present it in two snapshots:

Index scan in MongoDB

Fig. 5. Index scan introduced

Here you can see that the value of "stage" changes from "COLLSCAN" to "IXSCAN". "IXSCAN" stands for Index Scan.
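In a long explain output the stage names are nested, so it can help to list them programmatically. The sketch below is plain JavaScript; collectStages is a hypothetical helper, and the two plan objects are hand-written, trimmed stand-ins for what appears under "winningPlan", not copies of the actual output. It assumes child stages hang off "inputStage" (or "inputStages" for plans with several children).

```javascript
// Hypothetical helper: walks a winningPlan-like object and lists stage names
// from the outermost stage inward.
function collectStages(plan) {
  if (!plan) return [];
  const children = [];
  if (plan.inputStage) children.push(plan.inputStage);
  if (Array.isArray(plan.inputStages)) children.push(...plan.inputStages);
  return [plan.stage, ...children.flatMap((child) => collectStages(child))];
}

// Hand-written, trimmed examples of the two plan shapes.
const planWithoutIndex = { stage: "COLLSCAN" };
const planWithIndex = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", indexName: "Occasion_1" },
};

console.log(collectStages(planWithoutIndex).join(" -> ")); // COLLSCAN
console.log(collectStages(planWithIndex).join(" -> "));    // FETCH -> IXSCAN
```

Checking whether the returned list includes "COLLSCAN" is a quick way to flag queries that are still scanning the whole collection.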

MongoDB query execution plan with index scan

Fig. 6. Query statistics

As you can see here, the total number of documents examined has dropped from 61 to 1 after the introduction of the index. Look at "executionTimeMillis" too: the value has dropped from the earlier 209 ms to 0. This is why it is said that the right index does wonders for a query!

Once done, you can drop the index and try creating indexes that suit the queries you wish to run on the database.

> db.IndianFestivalsHolidays.dropIndex({Occasion:1})

Now check the available indexes in the collection.

In case you wish to drop all the indexes in the collection, run:

> db.IndianFestivalsHolidays.dropIndexes()

All indexes except the one on the field "_id" get dropped.

 

With index basics in place, you are all set to speed up your queries in MongoDB. In the next step of the series, I'll cover index types and properties. Till then, tinker with your MongoDB box and try something new each day.

 


 

 

Recommended for further reading,

  1. Introduction to Indexes in MongoDB
  2. Running explain in MongoDB

 

 
