Speculative Execution in Hadoop MapReduce
In this Big Data Hadoop tutorial, we are going to learn about Hadoop speculative execution. Apache Hadoop does not diagnose or fix slow-running tasks. Instead, it tries to detect when a task is running slower than expected and launches an equivalent task as a backup (the backup task is called a speculative task). This process is called speculative execution in Hadoop.
In this tutorial we will discuss speculative execution, a key feature of Hadoop that improves job efficiency: why it is needed, whether it is always helpful or sometimes needs to be turned off, and how to disable it in Hadoop.
2. What is Speculative Execution in Hadoop?
Let us first understand what Hadoop speculative execution is.
In Hadoop, MapReduce breaks a job into tasks that run in parallel rather than sequentially, which reduces the overall execution time. This model of execution is sensitive to slow tasks: a job finishes only when its last task finishes, so even a few slow tasks slow down the whole job.
Tasks may slow down for various reasons, including hardware degradation or software misconfiguration, but the causes can be difficult to detect because the tasks still complete successfully, only later than expected. Hadoop doesn't try to diagnose and fix slow-running tasks. Instead, it tries to detect them and runs backup tasks for them. This is called speculative execution in Hadoop, and the backup tasks are called speculative tasks.
3. How Does Speculative Execution Work in Hadoop?
Let us now see the Hadoop speculative execution process.
First, all the tasks for the job are launched in Hadoop MapReduce. Speculative tasks are then launched for those tasks that have been running for some time (at least one minute) and have not made as much progress, on average, as the other tasks from the job. If the original task completes before the speculative task, the speculative task is killed. Conversely, if the speculative task finishes first, the original task is killed.
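The selection rule described above can be sketched in a few lines of Java. This is only an illustration and not Hadoop's actual scheduler code: the class SpeculationSketch, the TaskStatus type, and the PROGRESS_GAP value of 0.2 are assumptions made for this example. Only the one-minute minimum running time comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

public class SpeculationSketch {

    // Hypothetical snapshot of one running task attempt.
    static final class TaskStatus {
        final String id;
        final double progress;      // fraction complete, 0.0 to 1.0
        final long runningSeconds;  // time since the attempt started

        TaskStatus(String id, double progress, long runningSeconds) {
            this.id = id;
            this.progress = progress;
            this.runningSeconds = runningSeconds;
        }
    }

    // From the text: a task must have been running for at least one minute.
    static final long MIN_RUNNING_SECONDS = 60;

    // Assumption for illustration: how far below the average progress
    // a task must be before it counts as slow.
    static final double PROGRESS_GAP = 0.2;

    // Returns the ids of tasks that would get a speculative (backup) attempt.
    static List<String> pickSpeculationCandidates(List<TaskStatus> tasks) {
        List<String> candidates = new ArrayList<>();
        if (tasks.isEmpty()) {
            return candidates;
        }
        double total = 0.0;
        for (TaskStatus t : tasks) {
            total += t.progress;
        }
        double average = total / tasks.size();
        for (TaskStatus t : tasks) {
            boolean ranLongEnough = t.runningSeconds >= MIN_RUNNING_SECONDS;
            boolean lagging = t.progress < average - PROGRESS_GAP;
            if (ranLongEnough && lagging) {
                candidates.add(t.id);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<TaskStatus> tasks = List.of(
            new TaskStatus("map_0", 0.90, 120),
            new TaskStatus("map_1", 0.85, 120),
            new TaskStatus("map_2", 0.20, 120),  // slow and old enough
            new TaskStatus("map_3", 0.10, 30));  // slow, but under one minute
        System.out.println(pickSpeculationCandidates(tasks)); // prints [map_2]
    }
}
```

Here map_3 is even further behind than map_2, but it has been running for only 30 seconds, so it is not yet considered for a backup task.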
4. Is Speculative Execution Beneficial?
Hadoop MapReduce speculative execution is beneficial in some cases. In a Hadoop cluster with hundreds of nodes, problems like hardware failure or network congestion are common, so running a duplicate task in parallel is often better than waiting for the troubled task to complete.
However, if two duplicate tasks are launched at about the same time, cluster resources are wasted, because both tasks do the same work.
5. How to Enable or Disable Speculative Execution?
Speculative execution is a MapReduce job optimization technique in Hadoop that is enabled by default. You can disable speculative execution for mappers and reducers in mapred-site.xml by setting the properties mapreduce.map.speculative and mapreduce.reduce.speculative to false.
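As a sketch, the corresponding entries in mapred-site.xml might look like the following. The property names shown are the ones used in Hadoop 2 and later. Older releases used mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution instead, so check the documentation for your Hadoop version.

```xml
<configuration>
  <!-- Turn off speculative execution for map tasks -->
  <property>
    <name>mapreduce.map.speculative</name>
    <value>false</value>
  </property>
  <!-- Turn off speculative execution for reduce tasks -->
  <property>
    <name>mapreduce.reduce.speculative</name>
    <value>false</value>
  </property>
</configuration>
```

Setting either value back to true enables speculative execution again for that task type.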
6. What is the need to turn off Speculative Execution?
The main purpose of speculative execution is to reduce job execution time. However, cluster efficiency suffers because of the duplicate tasks: since redundant tasks are being executed, overall throughput can drop. For this reason, some cluster administrators prefer to turn off speculative execution in Hadoop.
In conclusion, speculative execution is a key feature of Hadoop that improves job efficiency by reducing job execution time. We hope you liked this blog. If you have any questions regarding speculative execution in Hadoop, please let us know by leaving a comment in the section below. We will be glad to answer your queries.