What is Speculative execution in Hadoop? Why is it important?


    • #6296
      DataFlair Team
      Spectator

      What is speculative execution in Apache Hadoop?

    • #6298
      DataFlair Team
      Spectator

      If a task runs slowly, Hadoop does not try to diagnose or fix it; instead, it launches another, equivalent task as a backup. This is called speculative execution.
      A duplicate task is launched only for the small fraction of tasks that are running significantly slower than average, e.g. because of hardware degradation or software misconfiguration. If the original task completes first, the speculative task is killed, and vice versa.

      This is an optimization feature.
      If a bug is causing the slowness, it should be fixed rather than left to speculative execution, because the same bug will affect the speculative task as well.

      The goal of speculative execution is to reduce execution time, but not at the cost of cluster efficiency. On a busy cluster, speculative execution can reduce overall throughput.
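
      A rough sketch of the detection idea, in Java (an illustrative toy, not Hadoop’s actual speculator; the class, its fields, and the 0.5 slowness threshold are all assumptions):

      import java.util.List;

      class TaskAttempt {
          double progress;        // fraction complete, 0.0 to 1.0
          long runtimeMillis;     // how long this attempt has been running

          double progressRate() {
              return runtimeMillis == 0 ? 0.0 : progress / runtimeMillis;
          }
      }

      class SpeculatorSketch {
          // Assumed threshold: speculate when an attempt progresses at less
          // than half the average rate of its sibling attempts.
          static final double SLOWNESS_FACTOR = 0.5;

          static boolean shouldSpeculate(TaskAttempt candidate, List<TaskAttempt> all) {
              double avgRate = all.stream()
                      .mapToDouble(TaskAttempt::progressRate)
                      .average()
                      .orElse(0.0);
              // Launch a backup copy only for attempts significantly slower than average.
              return candidate.progressRate() < SLOWNESS_FACTOR * avgRate;
          }
      }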

      Follow the link for more detail: speculative execution in Hadoop

    • #6299
      DataFlair Team
      Spectator

      Speculative execution is a feature in Hadoop whereby the framework clones a “long-running” task onto another node. As a result, the number of task attempts can be greater than the number of input splits.

      Reasons why speculative execution happens:
      1. Unequal-sized hardware – it is recommended that all nodes in the cluster have comparable hardware. If they do not, tasks on the weaker nodes fall behind, and the framework kick-starts a parallel copy of the slow task.
      2. Network congestion
      3. Faulty hardware

      However, the majority of the time it is a false alarm that triggers speculative execution: the original task finishes first, and the framework cancels the cloned task. In those cases speculation is not beneficial, since it unnecessarily consumes cluster resources.
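
      One way to see this effect after a job completes is to compare the number of launched map attempts against the number of input splits. A minimal sketch using the standard MapReduce counter API (the job handle and numSplits value are assumed to come from your driver code):

      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.JobCounter;

      public class SpeculationCheck {
          // 'job' is a handle to a completed MapReduce job; 'numSplits' is the
          // number of input splits it was given.
          static void reportSpeculation(Job job, long numSplits) throws Exception {
              long launchedMaps = job.getCounters()
                      .findCounter(JobCounter.TOTAL_LAUNCHED_MAPS)
                      .getValue();
              // Attempts beyond one per split come from speculative execution
              // or from retries of failed attempts.
              System.out.println("Input splits:   " + numSplits);
              System.out.println("Launched maps:  " + launchedMaps);
              System.out.println("Extra attempts: " + (launchedMaps - numSplits));
          }
      }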

      Follow the link for more detail: speculative execution in Hadoop

    • #6300
      DataFlair Team
      Spectator

      Hadoop divides its tasks across many nodes in the cluster, so it is possible for a few slower nodes (called stragglers) to slow down the rest of the job. The slowness may be due to hardware or software failure, network problems, execution of a non-local task, or simply a busy node. In such cases the Hadoop platform schedules redundant copies of the slow tasks on other nodes in the cluster. This process is known as speculative execution. Whichever copy of a task finishes first becomes the definitive copy and is used for further processing; Hadoop kills the copies that are still executing speculatively.
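
      The “first copy wins, the rest are killed” behaviour can be illustrated with plain Java. The sketch below uses ExecutorService.invokeAny, which returns the result of the first task to finish and cancels the others; this is only an analogy, not Hadoop’s internal mechanism:

      import java.util.Arrays;
      import java.util.concurrent.Callable;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;

      public class FirstCopyWins {
          public static void main(String[] args) throws Exception {
              ExecutorService pool = Executors.newFixedThreadPool(2);
              Callable<String> original = () -> {
                  Thread.sleep(5000);          // the straggler
                  return "original attempt";
              };
              Callable<String> speculative = () -> {
                  Thread.sleep(1000);          // the backup copy on a healthy node
                  return "speculative attempt";
              };
              // invokeAny returns the first successful result and cancels the
              // rest, just as Hadoop keeps the first attempt to finish and
              // kills the remaining copies.
              String winner = pool.invokeAny(Arrays.asList(original, speculative));
              System.out.println("Definitive copy: " + winner);
              pool.shutdown();
          }
      }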

      The goal of speculative execution is to reduce a job’s response time, but on a busy cluster it may lower overall throughput and waste cluster resources. It has also been observed that on a heterogeneous cluster too many speculative tasks may be launched, which is costly in cluster resources.

      Speculative execution is enabled by default. It can be disabled for mappers and reducers in mapred-site.xml by setting
      mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution to false.
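
      The same settings can also be applied per job from the driver. A minimal sketch using the standard Configuration API; note that Hadoop 2.x and later expose the same switches under the newer names mapreduce.map.speculative and mapreduce.reduce.speculative:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.mapreduce.Job;

      public class DisableSpeculation {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Older property names, as used in mapred-site.xml above:
              conf.setBoolean("mapred.map.tasks.speculative.execution", false);
              conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
              // Equivalent property names in Hadoop 2.x+ (MRv2):
              conf.setBoolean("mapreduce.map.speculative", false);
              conf.setBoolean("mapreduce.reduce.speculative", false);

              Job job = Job.getInstance(conf, "job-without-speculation");
              // ... set mapper, reducer, and input/output paths as usual ...
          }
      }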

      Follow the link for more detail: speculative execution in Hadoop
