Hadoop Schedulers Tutorial – Job Scheduling in Hadoop

1. Objective – Hadoop Schedulers

In this Hadoop Schedulers tutorial, we will first look at what a scheduler is in Hadoop. We will then cover the types of schedulers in Hadoop and when to use each one for simple and efficient scheduling. Finally, we will look at other approaches to job scheduling in Hadoop and at future developments in Hadoop scheduling.
So, let’s start the Hadoop Schedulers Tutorial.


2. What are Hadoop Schedulers?

Hadoop is a general-purpose system that enables high-performance processing of data over a set of distributed nodes. It is also a multitasking system that processes multiple data sets for multiple jobs and multiple users simultaneously.
Earlier, Hadoop supported a single scheduler that was intermixed with the JobTracker logic. This implementation suited the traditional batch jobs of Hadoop (such as log mining and Web indexing), but it was inflexible and could not be tailored.
Early versions of Hadoop had a very simple way of scheduling users' jobs: using the Hadoop FIFO scheduler, they ran in order of submission. Later, the ability to set a job's priority was added, through the mapred.job.priority property or the setJobPriority() method on JobClient. When the job scheduler chooses the next job to run, it selects the one with the highest priority. However, with the FIFO scheduler in Hadoop, priorities do not support preemption. Hence, a high-priority job can still be blocked by a long-running low-priority job that started before the high-priority job was scheduled.
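To make the blocking effect concrete, here is a minimal sketch in Python. It is a toy model, not the JobTracker code: it assumes the cluster runs one job at a time, and the Job class, the job names, and run_fifo_with_priority() are hypothetical names made up for this illustration. Only the priority level names follow Hadoop's job priority values.

```python
from dataclasses import dataclass

# Priority levels, highest first. The names mirror Hadoop's job priority
# values, but the ranking table itself is part of this toy model.
PRIORITY_RANK = {"VERY_HIGH": 0, "HIGH": 1, "NORMAL": 2, "LOW": 3, "VERY_LOW": 4}


@dataclass
class Job:
    name: str
    priority: str
    submit_time: int   # time step at which the job is submitted
    duration: int      # time steps the job needs once it starts


def run_fifo_with_priority(jobs):
    """Simulate a one-job-at-a-time cluster; return (name, start, end) tuples."""
    pending = sorted(jobs, key=lambda j: j.submit_time)
    schedule = []
    clock = 0
    while pending:
        # Only jobs that have already been submitted can be chosen.
        ready = [j for j in pending if j.submit_time <= clock]
        if not ready:
            clock = pending[0].submit_time
            continue
        # Highest priority first; ties are broken by submission order (FIFO).
        job = min(ready, key=lambda j: (PRIORITY_RANK[j.priority], j.submit_time))
        pending.remove(job)
        start, end = clock, clock + job.duration
        schedule.append((job.name, start, end))
        # No preemption: nothing else is considered until this job finishes.
        clock = end
    return schedule


jobs = [
    Job("nightly-report", "LOW", submit_time=0, duration=10),
    Job("adhoc-query", "NORMAL", submit_time=1, duration=2),
    Job("urgent-fix", "VERY_HIGH", submit_time=2, duration=1),
]
schedule = run_fifo_with_priority(jobs)
for name, start, end in schedule:
    print(f"{name}: start={start} end={end}")
```

In this run, urgent-fix overtakes adhoc-query because of its priority, but it still has to wait until time step 10 for the low-priority nightly-report job that was already running.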
Additionally, MapReduce in Hadoop comes with a choice of schedulers: the Hadoop FIFO scheduler and two multiuser schedulers, the Fair Scheduler and the Capacity Scheduler.


3. Types of Hadoop Schedulers

There are several types of schedulers which we use in Hadoop, such as:


a. Hadoop FIFO scheduler

FIFO is the original Hadoop job scheduling algorithm, which was integrated within the JobTracker. With FIFO scheduling, the JobTracker pulls jobs from a work queue, oldest job first. This approach is simple and efficient, but it has no concept of the priority or size of a job.

b. Hadoop Fair Scheduler

The Fair Scheduler in Hadoop aims to give every user a fair share of the cluster capacity over time. If a single job is running, it gets all of the cluster. As more jobs are submitted, free task slots are given to the jobs in such a way as to give each user a fair share of the cluster.
The Hadoop Fair Scheduler also supports preemption. If a pool has not received its fair share for a certain period of time, the scheduler will kill tasks in pools running over capacity in order to give the slots to the pool running under capacity.
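The fair-share and preemption ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Fair Scheduler's actual algorithm: it assumes one pool per user, integer task slots, a full cluster, and a single timeout value. The function names fair_shares() and tasks_to_kill(), the pool names, and the numbers are all hypothetical.

```python
def fair_shares(total_slots, demands):
    """Split total_slots evenly among pools, never giving a pool more than it asks for."""
    shares = {pool: 0 for pool in demands}
    unsatisfied = {p for p in demands if demands[p] > 0}
    remaining = total_slots
    while unsatisfied and remaining > 0:
        # Divide what is left equally among pools that still want more.
        per_pool = remaining // len(unsatisfied)
        if per_pool == 0:
            break  # fewer slots left than pools; leave them unassigned here
        for pool in sorted(unsatisfied):
            grant = min(per_pool, demands[pool] - shares[pool])
            shares[pool] += grant
            remaining -= grant
        unsatisfied = {p for p in unsatisfied if shares[p] < demands[p]}
    return shares


def tasks_to_kill(running, shares, waited_secs, timeout_secs):
    """Return {pool: tasks to kill} once a pool has waited past the timeout."""
    if waited_secs < timeout_secs:
        return {}  # preemption only kicks in after the waiting period
    # Slots that under-served pools are still missing.
    deficit = sum(max(0, shares[p] - running.get(p, 0)) for p in shares)
    kills = {}
    for pool in sorted(running):
        over = running[pool] - shares.get(pool, 0)
        if over > 0 and deficit > 0:
            kills[pool] = min(over, deficit)
            deficit -= kills[pool]
    return kills


# A single user gets the whole (10-slot) cluster.
alone = fair_shares(10, {"alice": 20})
# A second user arrives and wants 4 slots.
shared = fair_shares(10, {"alice": 20, "bob": 4})
running = {"alice": 10, "bob": 0}
too_early = tasks_to_kill(running, shared, waited_secs=5, timeout_secs=30)
kills = tasks_to_kill(running, shared, waited_secs=45, timeout_secs=30)
print(alone, shared, too_early, kills)
```

Here alice holds all 10 slots while alone. Once bob submits work, the shares become 6 and 4, and after the timeout the sketch proposes killing 4 of alice's tasks so that bob's pool can reach its share.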
The Fair Scheduler is a “contrib” module. To enable it, place its JAR file on Hadoop’s classpath by copying it from Hadoop’s contrib/fairscheduler directory to the lib directory.
Then set the mapred.jobtracker.taskScheduler property to:
org.apache.hadoop.mapred.FairScheduler

c. Hadoop Capacity Scheduler

The Capacity Scheduler takes a slightly different approach to multiuser scheduling. It is like the Fair Scheduler, except that within each queue, jobs are scheduled using FIFO scheduling (with priorities). In effect, it lets each user or organization simulate a separate MapReduce cluster with FIFO scheduling.
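As a rough sketch of this idea in Python, the model below gives each queue a fixed percentage of the cluster's slots and then hands those slots to the queue's jobs in priority order, falling back to submission order. Everything here is an assumption made for illustration: the Job tuple, the function names, the queue names, and the percentages. It does not reproduce the real Capacity Scheduler's configuration or its handling of unused capacity.

```python
from collections import namedtuple

# Hypothetical job record: a higher priority number means more urgent.
Job = namedtuple("Job", "name priority submit_order slots_wanted")


def schedule_queue(capacity_slots, jobs):
    """Hand a queue's slots to its jobs by priority, then by submission order."""
    ordered = sorted(jobs, key=lambda j: (-j.priority, j.submit_order))
    allocation = {}
    free = capacity_slots
    for job in ordered:
        granted = min(free, job.slots_wanted)
        allocation[job.name] = granted
        free -= granted
    return allocation


def schedule_cluster(total_slots, queue_percent, queue_jobs):
    """Treat each queue as its own small FIFO cluster with a fixed slot budget."""
    result = {}
    for queue, percent in queue_percent.items():
        capacity = total_slots * percent // 100
        result[queue] = schedule_queue(capacity, queue_jobs.get(queue, []))
    return result


queue_percent = {"production": 70, "research": 30}
queue_jobs = {
    "production": [Job("etl", 0, 1, 50), Job("report", 0, 2, 40)],
    "research": [Job("train", 0, 1, 20), Job("urgent-eval", 1, 2, 20)],
}
allocation = schedule_cluster(100, queue_percent, queue_jobs)
print(allocation)
```

With 100 slots, the production queue serves etl fully and gives report the 20 slots that are left, while in the research queue the higher-priority urgent-eval job is served before train even though it was submitted later.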

4. Hadoop Scheduler – Other Approaches 

Although it is not a scheduler itself, Hadoop also offers the concept of provisioning virtual clusters from within larger physical clusters, which we call Hadoop On Demand (HOD). It uses the Torque resource manager to allocate nodes on the basis of the requirements of the virtual cluster. Once the nodes are allocated, the HOD system automatically prepares configuration files and then initializes the system based on the nodes within the virtual cluster. After the initialization, we can use the HOD virtual cluster in a relatively independent way.
HOD is also an interesting model for deployments of Hadoop clusters within a cloud infrastructure. As an advantage, with less sharing of the nodes, it offers greater security.

5. When to Use Each Scheduler in Hadoop?

The Capacity Scheduler is the right choice when we are running a large Hadoop cluster with multiple clients and want to ensure guaranteed access, with the potential to reuse unused capacity and to prioritize jobs within queues.
The Fair Scheduler, on the other hand, works well when both small and large clusters are used by the same organization with a limited number of workloads. In a simpler and less configurable way, it offers the means to distribute capacity non-uniformly to pools (of jobs). It can also offer fast response times for small jobs mixed with larger jobs (supporting more interactive use models). Hence, it is useful in the presence of diverse jobs.


6. Future Developments in Hadoop Scheduling

Because the Hadoop scheduler is pluggable, we can expect to see new schedulers developed for unique cluster deployments. Two schedulers that are still in progress are the learning scheduler and the adaptive scheduler:

  • The learning scheduler (MapReduce-1349) is designed to maintain a level of utilization when presented with a diverse set of workloads.
  • The adaptive scheduler (MapReduce-1380) adaptively adjusts the resources for a job on the basis of its performance as well as business goals.

So, this was all about Hadoop Schedulers. Hope you liked our explanation.

7. Conclusion: Hadoop Schedulers

Hence, we have learned about Hadoop Schedulers in detail. We discussed the types of schedulers in Hadoop and other approaches to scheduling. We also saw when to use each scheduler and what future developments to expect in Hadoop scheduling. Hope it helps! You can share your experience of reading the blog with us through comments.