What is Speculative execution in Hadoop? Why is it important?


    • #6296
      DataFlair Team
      Spectator

      What is speculative execution in Apache Hadoop?

    • #6298
      DataFlair Team
      Spectator

      If a task runs slowly, Hadoop does not try to diagnose or fix it; instead, it launches another, equivalent task as a backup. This is called speculative execution.
      A duplicate task is launched only for the small fraction of tasks that are running significantly slower than average, e.g. because of hardware degradation or software misconfiguration. If the original task completes first, the speculative task is killed, and vice versa.

      This is an optimization feature.
      If a bug is causing the slowness, it should be fixed rather than left to speculative execution, because the same bug will affect the speculative task as well.

      The goal of speculative execution is to reduce execution time, but not at the cost of cluster efficiency. On a busy cluster, speculative execution can reduce overall throughput.
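
      A rough sketch of the detection idea, in Java (an illustrative toy, not Hadoop’s actual speculator; the class, its fields, and the 0.5 slowness threshold are all assumptions):

      import java.util.List;

      class TaskAttempt {
          double progress;        // fraction complete, 0.0 to 1.0
          long runtimeMillis;     // how long this attempt has been running

          double progressRate() {
              return runtimeMillis == 0 ? 0.0 : progress / runtimeMillis;
          }
      }

      class SpeculatorSketch {
          // Assumed threshold: speculate when an attempt progresses at less
          // than half the average rate of its sibling attempts.
          static final double SLOWNESS_FACTOR = 0.5;

          static boolean shouldSpeculate(TaskAttempt candidate, List<TaskAttempt> all) {
              double avgRate = all.stream()
                      .mapToDouble(TaskAttempt::progressRate)
                      .average()
                      .orElse(0.0);
              // Launch a backup copy only for attempts significantly slower than average.
              return candidate.progressRate() < SLOWNESS_FACTOR * avgRate;
          }
      }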

      Follow the link for more detail: speculative execution in Hadoop

    • #6299
      DataFlair Team
      Spectator

      Speculative execution is a feature in Hadoop whereby the framework clones a “long-running” task onto another node. As a result, the number of task attempts can be greater than the number of input splits.

      Reasons why speculative execution happens:
      1. Unequal-sized hardware – it is recommended that all nodes in the cluster have comparable hardware. If they do not, tasks on the weaker nodes fall behind, and the framework kick-starts a parallel copy of the slow task.
      2. Network congestion
      3. Faulty hardware

      However, the majority of the time it is a false alarm that triggers speculative execution: the original task finishes first, and the framework cancels the cloned task. In those cases speculation is not beneficial, since it unnecessarily consumes cluster resources.
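
      One way to see this effect after a job completes is to compare the number of launched map attempts against the number of input splits. A minimal sketch using the standard MapReduce counter API (the job handle and numSplits value are assumed to come from your driver code):

      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.JobCounter;

      public class SpeculationCheck {
          // 'job' is a handle to a completed MapReduce job; 'numSplits' is the
          // number of input splits it was given.
          static void reportSpeculation(Job job, long numSplits) throws Exception {
              long launchedMaps = job.getCounters()
                      .findCounter(JobCounter.TOTAL_LAUNCHED_MAPS)
                      .getValue();
              // Attempts beyond one per split come from speculative execution
              // or from retries of failed attempts.
              System.out.println("Input splits:   " + numSplits);
              System.out.println("Launched maps:  " + launchedMaps);
              System.out.println("Extra attempts: " + (launchedMaps - numSplits));
          }
      }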

      Follow the link for more detail: speculative execution in Hadoop

    • #6300
      DataFlair Team
      Spectator

      Hadoop divides its tasks across many nodes in the cluster, so it is possible for a few slower nodes (called stragglers) to slow down the rest of the job. The slowness may be due to hardware or software failure, network problems, execution of a non-local task, or simply a busy node. In such cases the Hadoop platform schedules redundant copies of the slow tasks on other nodes in the cluster. This process is known as speculative execution. Whichever copy of a task finishes first becomes the definitive copy and is used for further processing; Hadoop kills the copies that are still executing speculatively.
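
      The “first copy wins, the rest are killed” behaviour can be illustrated with plain Java. The sketch below uses ExecutorService.invokeAny, which returns the result of the first task to finish and cancels the others; this is only an analogy, not Hadoop’s internal mechanism:

      import java.util.Arrays;
      import java.util.concurrent.Callable;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;

      public class FirstCopyWins {
          public static void main(String[] args) throws Exception {
              ExecutorService pool = Executors.newFixedThreadPool(2);
              Callable<String> original = () -> {
                  Thread.sleep(5000);          // the straggler
                  return "original attempt";
              };
              Callable<String> speculative = () -> {
                  Thread.sleep(1000);          // the backup copy on a healthy node
                  return "speculative attempt";
              };
              // invokeAny returns the first successful result and cancels the
              // rest, just as Hadoop keeps the first attempt to finish and
              // kills the remaining copies.
              String winner = pool.invokeAny(Arrays.asList(original, speculative));
              System.out.println("Definitive copy: " + winner);
              pool.shutdown();
          }
      }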

      The goal of speculative execution is to reduce a job’s response time, but on a busy cluster it may lower overall throughput and waste cluster resources. It has also been observed that on a heterogeneous cluster too many speculative tasks may be launched, which is costly in cluster resources.

      Speculative execution is enabled by default. It can be disabled for mappers and reducers in mapred-site.xml by setting
      mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution to false.
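
      The same settings can also be applied per job from the driver. A minimal sketch using the standard Configuration API; note that Hadoop 2.x and later expose the same switches under the newer names mapreduce.map.speculative and mapreduce.reduce.speculative:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.mapreduce.Job;

      public class DisableSpeculation {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Older property names, as used in mapred-site.xml above:
              conf.setBoolean("mapred.map.tasks.speculative.execution", false);
              conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
              // Equivalent property names in Hadoop 2.x+ (MRv2):
              conf.setBoolean("mapreduce.map.speculative", false);
              conf.setBoolean("mapreduce.reduce.speculative", false);

              Job job = Job.getInstance(conf, "job-without-speculation");
              // ... set mapper, reducer, and input/output paths as usual ...
          }
      }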

      Follow the link for more detail: speculative execution in Hadoop
