Speculative Execution in Hadoop MapReduce

1. Objective

In this Big data Hadoop tutorial, we are going to learn Hadoop speculative execution. Apache Hadoop does not fix or diagnose slow-running tasks. Instead, it tries to detect when a task is running slower than expected and launches another, an equivalent task as a backup (the backup task is called as speculative task). This process is called speculative execution in Hadoop.

In this tutorial we will discuss speculative execution – A key feature of Hadoop that improves job efficiency, what is the need of speculative execution in Hadoop, is Speculative execution always helpful or do we need to turn it off and how can we disable speculative execution in Hadoop.

Speculative Execution in Hadoop MapReduce

Speculative Execution in Hadoop MapReduce

2. What is Speculative Execution in Hadoop?

Let us first understand what is Hadoop speculative execution?

Speculative Execution in Spark

Speculative Execution in Spark

In Hadoop, MapReduce breaks jobs into tasks and these tasks run parallel rather than sequential, thus reduces overall execution time. This model of execution is sensitive to slow tasks (even if they are few in numbers) as they slow down the overall execution of a job.
There may be various reasons for the slowdown of tasks, including hardware degradation or software misconfiguration, but it may be difficult to detect causes since the tasks still complete successfully, although more time is taken than the expected time. Hadoop doesn’t try to diagnose and fix slow running tasks, instead, it tries to detect them and runs backup tasks for them. This is called speculative execution in Hadoop. These backup tasks are called Speculative tasks in Hadoop.

If these professionals can make a switch to Big Data, so can you:
Rahul Doddamani Story - DataFlair
Rahul Doddamani
Java → Big Data Consultant, JDA
Follow on
Mritunjay Singh Success Story - DataFlair
Mritunjay Singh
PeopleSoft → Big Data Architect, Hexaware
Follow on
Rahul Doddamani Success Story - DataFlair
Rahul Doddamani
Big Data Consultant, JDA
Follow on
I got placed, scored 100% hike, and transformed my career with DataFlair
Enroll now
Deepika Khadri Success Story - DataFlair
Deepika Khadri
SQL → Big Data Engineer, IBM
Follow on
DataFlair Web Services
You could be next!
Enroll now

3. How Speculative Execution works in Hadoop?

Let us now see Hadoop speculative execution process.
Firstly all the tasks for the job are launched in Hadoop MapReduce. The speculative tasks are launched for those tasks that have been running for some time (at least one minute) and have not made any much progress, on average, as compared with other tasks from the job. The speculative task is killed if the original task completes before the speculative task, on the other hand, the original task is killed if the speculative task finishes before it.

4. Is Speculative Execution Beneficial?

Hadoop MapReduce Speculative execution is beneficial in some cases because in a Hadoop cluster with 100s of nodes, problems like hardware failure or network congestion are common and running parallel or duplicate task would be better since we won’t be waiting for the task in the problem to complete.
But if two duplicate tasks are launched at about same time, it will be a wastage of cluster resources.

5. How to Enable or Disable Speculative Execution?

Speculative execution is a MapReduce job optimization technique in Hadoop that is enabled by default. You can disable speculative execution for mappers and reducers in mapred-site.xml as shown below:

<property>
<name>mapred.map.tasks.speculative.execution</name>
<value>false</value>
</property>
<property>
<name>mapred.reduce.tasks.speculative.execution</name>
<value>false</value>
</property>

Hadoop Quiz

6. What is the need to turn off Speculative Execution?

The main work of speculative execution is to reduce the job execution time; however, the clustering efficiency is affected due to duplicate tasks. Since in speculative execution redundant tasks are being executed, thus this can reduce overall throughput. For this reason, some cluster administrators prefer to turn off the speculative execution in Hadoop.

7. Conclusion

In conclusion, we can say that Speculative execution is the key feature of Hadoop that improves job efficiency. Hence, it reduces the job execution time. Hope you liked this blog, If you have any question regarding Speculative Execution in Hadoop, so please let us know by leaving a comment in a section given below. We will be glad to solve your queries.
See Also-

Reference for MapReduce

No Responses

  1. Rajesh says:

    It is possible to get what are jobs executed by this speculative?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.