Objective: Make your queries run faster on MongoDB by introducing indexes.
Prerequisite: MongoDB Basics Primer Series - Step 1, Step 2, Step 3.
As promised in Step 3 of this series, this step takes you through indexing in MongoDB. The index concepts in MongoDB are very similar to those of any conventional RDBMS that you might have used. Indexing is the single biggest tunable performance factor in a database, and MongoDB is no exception.
An index enables the query to focus on the right set of documents instead of scanning all the documents in a collection.
Upload sample data
To illustrate indexes in action, let us load some sample data. Start mongod, the MongoDB data server process. You are probably adept at running the process by now!! If needed, refer to the earlier posts in this series, or run the following in the Terminal window.
$ mongod --dbpath Data/ --logpath mongod.log --storageEngine wiredTiger --fork
Create the Data directory before running the above command. Once the process has started, download the JSON file, IndianFestivalsHolidays.json, and import the data into MongoDB as follows,
$ mongoimport -d MyDB -c IndianFestivalsHolidays --drop < IndianFestivalsHolidays.json
The data from IndianFestivalsHolidays.json is loaded into the collection IndianFestivalsHolidays in the database MyDB. Run a count on the collection; it should return 61 documents.
The Data Model
The data contains the festival holidays in India for the calendar year 2016. The loaded documents have the following structure,
Here I am using MongoChef Professional, a GUI-based MongoDB client that I discussed in Step 2 of this series. If you haven't tried it yet, now is a good time.
The documents have the following fields,
- "_id" - If not provided by the user, as is the case, it is generated by the system. The values are unique.
- "On" - It contains the date.
- "Day" - It contains the day of the week.
- "Month" - It contains the month name corresponding to the date.
- "Occasion" - It is an array containing one or more events/ festivals.
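Because "Occasion" is an array, it helps to know how MongoDB matches it: an equality filter on an array field is satisfied when any element of the array equals the value. The snippet below is a plain JavaScript sketch of that rule, not MongoDB code. The sample document, its values, and the helper name matchesEquality are made up for illustration and are not taken from the data file.

```javascript
// Hypothetical document shaped like the ones in IndianFestivalsHolidays.
// The field values (and the format of "On") are illustrative assumptions.
const sampleDoc = {
  On: "2016-10-30",
  Day: "Sunday",
  Month: "October",
  Occasion: ["Diwali", "Deepavali"]
};

// Mimics equality matching on one field: a scalar must equal the value,
// while an array matches when any of its elements equals the value.
function matchesEquality(doc, field, value) {
  const actual = doc[field];
  return Array.isArray(actual) ? actual.includes(value) : actual === value;
}

console.log(matchesEquality(sampleDoc, "Occasion", "Diwali")); // true
console.log(matchesEquality(sampleDoc, "Occasion", "Holi"));   // false
console.log(matchesEquality(sampleDoc, "Month", "October"));   // true
```

This is why a filter such as {Occasion:"Diwali"} can find a document even when that day carries more than one occasion.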
In the Thick of Indexes
Now that you have the data loaded, figure out the currently available indexes in the collection.
In the Mongo shell or MongoChef IntelliShell run,
> db.IndianFestivalsHolidays.getIndexes()
As you see, there is already an index on the "_id" field. This is the primary index, created automatically by MongoDB, and it is the reason why you must provide unique values if you supply the "_id" values yourself. Any other index on the collection is called a secondary index.
Let's look for Occasion "Diwali" in the collection,
> db.IndianFestivalsHolidays.find({Occasion:"Diwali"})
Now run explain on the above query to see how MongoDB travels through the data to get the result. In its default mode, explain returns the execution plan without running the query; in "executionStats" mode, used here, MongoDB runs the query and also reports statistics about the execution. The output helps to identify the stages where collection scans are being done, whether indexes are being used, and so on. A well-designed query must have minimal collection scans and must make good use of the available indexes.
> db.IndianFestivalsHolidays.find({Occasion:"Diwali"}).explain("executionStats")
On running the above explain, you will get the following output,
Look at the fields marked with the red dots. "stage":"COLLSCAN" indicates that the query is scanning the entire collection to get the required data. "nReturned": NumberInt(1) indicates that 1 document is returned by the query. Finally, "totalDocsExamined": NumberInt(61) indicates that the query examines 61 documents, which is in fact the total number of documents in the collection. The narrower the difference between "nReturned" and "totalDocsExamined", the better. The large difference here suggests that an index will make the query more efficient. SO LET'S BUILD ONE!!
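If you would rather not hunt through the full explain document each time, the fields discussed above can be pulled out with a small helper. This is only a sketch in plain JavaScript: the names planStages and summarizeExplain are hypothetical, the input is a trimmed-down stand-in for the real output, and it assumes the plan is a single chain of "inputStage" links (plans with several inputs, and other server versions, may be shaped differently).

```javascript
// Walk the winning plan from the root stage down through inputStage links
// and collect the stage names, e.g. ["COLLSCAN"] or ["FETCH", "IXSCAN"].
function planStages(winningPlan) {
  const stages = [];
  for (let node = winningPlan; node; node = node.inputStage) {
    stages.push(node.stage);
  }
  return stages;
}

// Hypothetical helper: reduce an explain("executionStats") result to the
// handful of numbers discussed in this post.
function summarizeExplain(result) {
  const stats = result.executionStats;
  return {
    stages: planStages(result.queryPlanner.winningPlan),
    nReturned: stats.nReturned,
    totalDocsExamined: stats.totalDocsExamined,
    executionTimeMillis: stats.executionTimeMillis
  };
}

// Trimmed-down stand-in for the explain output described above.
const collScanResult = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: { nReturned: 1, totalDocsExamined: 61, executionTimeMillis: 209 }
};

console.log(summarizeExplain(collScanResult));
```

In the shell, you could pass the result of the explain("executionStats") call to summarizeExplain in place of the stand-in object.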
Since the query focuses on the field "Occasion", the index must be built on this field.
> db.IndianFestivalsHolidays.createIndex({"Occasion":1})
Now check the available indexes in the collection again,
> db.IndianFestivalsHolidays.getIndexes()
As you see, now there are two indexes in the collection. The new index is "Occasion_1" on the field "Occasion". With the new index in place, run the explain on the query again to check the difference which the index has brought to the query efficiency.
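The name "Occasion_1" is not arbitrary: when you do not name an index yourself, MongoDB builds the name from each indexed field and its direction, joined with underscores. The function below is a plain JavaScript sketch of that naming rule. The name defaultIndexName and the compound key pattern are illustrative only, and the automatic "_id" index is a special case (it is named "_id_"), so it is left out.

```javascript
// Build the default index name from a key pattern: every field name and its
// direction (1 or -1) joined with underscores.
function defaultIndexName(keyPattern) {
  return Object.entries(keyPattern)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

console.log(defaultIndexName({ Occasion: 1 }));        // Occasion_1
console.log(defaultIndexName({ Month: 1, Day: -1 }));  // Month_1_Day_-1
```

Knowing the name is handy because dropIndex accepts either the key pattern or the index name.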
> db.IndianFestivalsHolidays.find({Occasion:"Diwali"}).explain("executionStats")
Since the output is a long one, I'll present it in two snapshots,
Here you see that the value of "stage" changes from "COLLSCAN" to "IXSCAN". "IXSCAN" means Index Scan.
As you see here, the total number of documents examined has dropped from 61 to 1 after the introduction of the index. Look at "executionTimeMillis": the value has dropped from the earlier 209 ms to 0. This is why it is said that the right index does wonders for a query!!
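To see why the number of examined documents falls from 61 to 1, here is a toy model in plain JavaScript. It is a sketch under loud assumptions: the 61 documents are synthetic, the function names are made up, and the "index" is a simple Map, whereas MongoDB stores its indexes in B-tree structures. The idea is the same, though: the index leads straight to the matching documents.

```javascript
// Build a toy collection of 61 synthetic documents; only one has "Diwali".
const docs = [];
for (let i = 0; i < 61; i++) {
  docs.push({ _id: i, Occasion: i === 40 ? ["Diwali"] : [`Festival ${i}`] });
}

// Collection scan: test every document and count how many were examined.
function collectionScan(collection, value) {
  let docsExamined = 0;
  const matches = [];
  for (const doc of collection) {
    docsExamined++;
    if (doc.Occasion.includes(value)) matches.push(doc);
  }
  return { matches, docsExamined };
}

// Toy index: map every Occasion entry to the positions of the documents
// that contain it (one key per array element).
function buildIndex(collection) {
  const index = new Map();
  collection.forEach((doc, pos) => {
    for (const occasion of doc.Occasion) {
      if (!index.has(occasion)) index.set(occasion, []);
      index.get(occasion).push(pos);
    }
  });
  return index;
}

// Index scan: look the value up, then fetch only the documents it points to.
function indexScan(collection, index, value) {
  const positions = index.get(value) || [];
  const matches = positions.map((pos) => collection[pos]);
  return { matches, docsExamined: matches.length };
}

const scan = collectionScan(docs, "Diwali");
const viaIndex = indexScan(docs, buildIndex(docs), "Diwali");
console.log(scan.docsExamined, viaIndex.docsExamined); // 61 1
```

Both approaches return the same single document; only the amount of work differs.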
Once done, you can drop the index and try creating the ones that satisfy the needs of queries that you wish to run on the database.
> db.IndianFestivalsHolidays.dropIndex({Occasion:1})
Now check the available indexes in the collection.
In case you wish to drop all the indexes in the collection, run
> db.IndianFestivalsHolidays.dropIndexes()
All indexes except the one on the field "_id" get dropped.
With Index basics in place, you are all set to speed up your queries in MongoDB. In the next step of the series, I'll cover index types and properties. Till then tinker on your MongoDB box and try to do something new each day.