MongoDB Replication – 2 Major Strategies of Sharding in MongoDB

DataFlair Team

5 years ago

After studying MongoDB Aggregation, it’s time to learn MongoDB Replication and Sharding. Replication means running a group of mongod instances that maintain the same data set, while a MongoDB sharded cluster consists of three components.

Here, we will explore how to set up a replica set, the two sharding strategies, and how sharded and non-sharded collections work.

So, are you ready to explore MongoDB Replication and Sharding?

What is MongoDB Replication?

In MongoDB, replication means running a group of mongod instances, called a replica set, that maintain the same data set. A replica set contains several data-bearing nodes and, optionally, one arbiter node.

Of the data-bearing nodes, exactly one is the primary node, while the others are secondary nodes. The primary node receives all write operations. A replica set can have only one primary capable of confirming writes with { w: "majority" } write concern.

The secondary nodes replicate the primary's operations and apply them to their own data sets, so that the secondaries reflect the primary's data. If the primary node becomes unavailable, an eligible secondary holds an election to become the new primary.

An arbiter does not hold a copy of the data set. Its purpose is to maintain a quorum in the replica set by responding to heartbeat and election requests from other members. If your replica set has an even number of members, add an arbiter to obtain a majority of votes in an election for a primary node.
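
To see why an arbiter helps, it is useful to count votes. The following is a minimal sketch in plain JavaScript (runnable in Node.js, not a MongoDB API). The function names majorityOf and faultTolerance are our own and are used only for illustration.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of voting members that can be down while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majorityOf(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(
    `${n} voting members: majority ${majorityOf(n)}, tolerates ${faultTolerance(n)} down`
  );
}
```

A set with 4 voting members tolerates only one failure, the same as a set with 3. Two data-bearing nodes plus one arbiter give 3 voters, so the set can still elect a primary when one member is down.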

Automatic Failover

When a primary node does not communicate with the other members of the set for longer than the configured electionTimeoutMillis period (10 seconds by default), an eligible secondary node calls for an election to nominate itself as the new primary.

The cluster tries to complete the election as quickly as possible so that it can resume normal operations.

The replica set cannot process write operations until the election is completed.
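
The timeout rule can be pictured as a simple check. This is a simplified illustration in plain JavaScript, not MongoDB's actual election code. The function shouldCallElection and its timestamp parameters are hypothetical, and only the 10 second default comes from the text.

```javascript
// Default election timeout in milliseconds (10 seconds, as stated in the text).
const DEFAULT_ELECTION_TIMEOUT_MILLIS = 10000;

// Returns true when the time since the last heartbeat from the primary
// is longer than the election timeout.
function shouldCallElection(
  lastHeartbeatMs,
  nowMs,
  timeoutMs = DEFAULT_ELECTION_TIMEOUT_MILLIS
) {
  return nowMs - lastHeartbeatMs > timeoutMs;
}

console.log(shouldCallElection(0, 4000));  // false: primary was heard 4 s ago
console.log(shouldCallElection(0, 12000)); // true: primary silent for 12 s
```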

How to Set Up a Replica Set in MongoDB?

Here, we will learn how to convert a standalone MongoDB instance to a replica set. The steps are as follows:

Shut down the running MongoDB server.
Start the server again with the --replSet option, using the following syntax:

mongod --port "PORT" --dbpath "YOUR_DB_DATA_PATH" --replSet "REPLICA_SET_INSTANCE_NAME"

Now, we will take an example to understand it better.

mongod --port 27017 --dbpath "D:\set up\mongodb\data" --replSet rs0

This starts a mongod instance on port 27017 as a member of the replica set named rs0.
Now connect to this instance with the mongo shell.
To initiate the new replica set, run the command rs.initiate().

To add members to the replica set, use the following syntax. Note that the host name and port are passed as a single string:

> rs.add("HOST_NAME:PORT")
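
Instead of adding members one at a time, rs.initiate() also accepts a configuration document that lists all members up front. The sketch below builds such a document in plain JavaScript. The helper buildReplicaSetConfig and the example.net host names are assumptions made for illustration, and every listed mongod must have been started with the same --replSet name.

```javascript
// Builds a configuration document in the shape rs.initiate() accepts:
// { _id: <replica set name>, members: [ { _id: <number>, host: "<host:port>" }, ... ] }
function buildReplicaSetConfig(name, hosts) {
  return {
    _id: name,
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
}

// Placeholder host names; replace them with your own servers.
const config = buildReplicaSetConfig("rs0", [
  "mongodb0.example.net:27017",
  "mongodb1.example.net:27017",
  "mongodb2.example.net:27017",
]);

console.log(JSON.stringify(config, null, 2));
// In the mongo shell, the document would then be passed as: rs.initiate(config)
```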

What is MongoDB Sharding?

Sharding is a method for distributing data across multiple machines. A sharded cluster consists of the following components:

shard: Each shard contains a subset of the sharded data. Each shard can be deployed as a replica set.
mongos: The mongos acts as a query router, providing an interface between client applications and the sharded cluster.
config server: Config servers store metadata and configuration settings for the cluster. From MongoDB 3.4 onwards, config servers must be deployed as a replica set (CSRS).

The following diagram describes the interaction of components within a sharded cluster in MongoDB.

MongoDB Sharding Strategies

MongoDB offers two sharding strategies:

Hashed Sharding
Ranged Sharding

i. Hashed Sharding in MongoDB

Hashed sharding involves computing a hash of the shard key field’s value. Each chunk is then assigned a range of hashed values.

Even when shard key values are close to each other, their hashed values are unlikely to fall in the same chunk. This kind of sharding facilitates an even distribution of data.
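
The effect can be simulated in a few lines of plain JavaScript. This is only a sketch: hash32 is a stand-in 32-bit mixing function (the MurmurHash3 finalizer), not the hash function MongoDB uses, and the equal-width split of the hash space into 4 chunks is an assumption made for the demonstration.

```javascript
// Stand-in 32-bit hash for integer keys (MurmurHash3 finalizer).
// This is NOT MongoDB's real hash function.
function hash32(key) {
  let h = key | 0;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // force an unsigned 32-bit result
}

// Each chunk owns an equal-width slice of the hash space [0, 2^32).
function hashedChunkFor(key, numChunks) {
  return Math.floor((hash32(key) / 2 ** 32) * numChunks);
}

// Counts how many of the keys first .. first + count - 1 land in each chunk.
function hashedDistribution(first, count, numChunks) {
  const counts = new Array(numChunks).fill(0);
  for (let key = first; key < first + count; key++) {
    counts[hashedChunkFor(key, numChunks)]++;
  }
  return counts;
}

// 10,000 consecutive shard key values spread over 4 chunks.
console.log(hashedDistribution(1, 10000, 4));
```

Although the keys are consecutive integers, the counts per chunk come out roughly equal, because neighbouring keys hash to unrelated positions in the hash space.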

ii. Ranged Sharding in MongoDB

Ranged sharding involves dividing data into ranges based on the shard key values. Each chunk is then assigned a range of shard key values.

Shard keys with close values are likely to be placed in the same chunk. The efficiency of ranged sharding depends on the shard key chosen. In the worst case, a poorly chosen shard key results in an uneven distribution of data, which negates some of the benefits of sharding in MongoDB.
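
The following plain JavaScript sketch illustrates both points. The chunk boundaries in rangeBounds and the helper rangedChunkFor are hypothetical values chosen for the demonstration. Each chunk covers the keys from its lower bound (inclusive) to its upper bound (exclusive).

```javascript
// Hypothetical chunk boundaries on the raw shard key value:
// chunk i holds keys in [rangeBounds[i], rangeBounds[i + 1]).
const rangeBounds = [-Infinity, 1000, 2000, 3000, Infinity];

// Returns the index of the chunk whose range contains the key.
function rangedChunkFor(key) {
  for (let i = 0; i < rangeBounds.length - 1; i++) {
    if (key >= rangeBounds[i] && key < rangeBounds[i + 1]) return i;
  }
  throw new RangeError(`No chunk found for key ${key}`);
}

// Close key values share a chunk.
console.log([1500, 1501, 1502].map(rangedChunkFor));

// A steadily increasing key sends every new document to the last chunk.
console.log([5000, 5001, 5002, 5003].map(rangedChunkFor));
```

The second call shows the worst case described above: with a key that only ever increases, all new inserts pile up in one chunk.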

Now we will take an example to study MongoDB Sharding.

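Create a data directory for the config server.

mkdir /data/exampledb

Start a mongod instance in config server mode. Since config servers must run as a replica set, pass a replica set name along with the data directory and port.

mongod --configsvr --replSet "CONFIG_REPL_SET_NAME" --dbpath /data/exampledb --port 27019

Connect to this instance with the mongo shell and run rs.initiate() to initiate the config server replica set.

Start the mongos instance by specifying the config server replica set.

mongos --configdb "CONFIG_REPL_SET_NAME/ExamplesD:27019"

From the mongo shell, connect to the mongos instance.

mongo --host ServerD --port 27017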

If we have two servers named S1 and S2 that are to be added to the cluster as shards, use the following commands.

sh.addShard("S1:27017")
sh.addShard("S2:27017")

Enable sharding for the database.

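sh.enableSharding("Studentdb")

Enable sharding for the collection. The first argument is the full namespace in the form database.collection, and the second is the shard key.

sh.shardCollection("Studentdb.Student", { "Studentid": 1, "StudentName": 1 })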
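
The shard key document is also where the sharding strategy is chosen: a value of 1 selects ranged sharding on that field, while the string "hashed" selects hashed sharding. The helper shardKeySpec below is a hypothetical convenience function written in plain JavaScript to show the two shapes. It is not part of the mongo shell.

```javascript
// Returns the shard key document for a single field:
// { field: 1 } selects ranged sharding, { field: "hashed" } selects hashed sharding.
function shardKeySpec(field, strategy) {
  if (strategy === "ranged") return { [field]: 1 };
  if (strategy === "hashed") return { [field]: "hashed" };
  throw new Error(`Unknown sharding strategy: ${strategy}`);
}

console.log(shardKeySpec("Studentid", "ranged"));
console.log(shardKeySpec("Studentid", "hashed"));
// In the mongo shell, the result would be passed as the second
// argument of sh.shardCollection().
```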

Sharded and Non-Sharded Collections

A database in MongoDB can contain a mixture of sharded and unsharded collections.

Sharded collections are partitioned and distributed across the shards in the cluster.

Unsharded collections are stored on the database's primary shard.

Connecting to a Sharded Cluster:

You must connect to a mongos router to interact with any collection in the sharded cluster. This includes both sharded and unsharded collections.

Summary

Hence, we have learned about MongoDB replication and sharding. We studied automatic failover, setting up a replica set, and the sharding strategies in MongoDB. We hope you liked our explanation. If you have any queries, please post them in the comment section.