What is speculative execution in Apache Hadoop?

    • #5342
      DataFlair Team

      What is Hadoop speculative task execution?
      How does Speculative Execution Work in Hadoop MapReduce?
      What is Speculative execution in Hadoop? Why is it important?

    • #5344
      DataFlair Team

      In Hadoop, MapReduce breaks a job into tasks, and these tasks run in parallel rather than sequentially, which reduces the overall execution time. But if some tasks run slowly, they drag out the overall execution time. Hadoop doesn't try to diagnose and fix these slow-running tasks; instead, it detects them and runs backup tasks for them. This process is called Speculative Execution.

      How does it work?

      First, all the tasks for the job are launched in Hadoop MapReduce. Speculative tasks are then launched only for those tasks that have been running for some time and are making significantly less progress, on average, than the other tasks of the job (a simplified sketch of this detection idea follows below). If the original task completes before the speculative one, the speculative task is killed; conversely, if the speculative task finishes first, the original task is killed.
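
      The following is a minimal sketch of that detection idea (not Hadoop's actual scheduler code; the class name StragglerCheck, the method findStragglers, and the LAG_THRESHOLD value are invented for illustration). Each task reports a progress fraction between 0 and 1, and a task whose progress lags the average of its peers by more than the threshold would get a speculative attempt:

      import java.util.Map;

      public class StragglerCheck {
          // Hypothetical threshold: flag a task whose progress lags the average
          // progress of its peers by more than 20 percentage points.
          static final double LAG_THRESHOLD = 0.20;

          // Progress values are fractions in [0, 1], keyed by task attempt id.
          static void findStragglers(Map<String, Double> progress) {
              double avg = progress.values().stream()
                      .mapToDouble(Double::doubleValue).average().orElse(0.0);
              progress.forEach((taskId, p) -> {
                  if (avg - p > LAG_THRESHOLD) {
                      System.out.printf("Launch speculative attempt for %s (progress %.2f vs. average %.2f)%n",
                              taskId, p, avg);
                  }
              });
          }

          public static void main(String[] args) {
              // task_003 is far behind its peers, so it would get a speculative copy.
              findStragglers(Map.of("task_001", 0.95, "task_002", 0.90, "task_003", 0.30));
          }
      }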

      Now, let us discuss why it is important.

      It is beneficial on Hadoop clusters with hundreds of nodes, where problems like hardware failure or network congestion are common. Running a parallel, duplicate task is better than waiting on the problematic task to complete, since the job no longer stalls on it.
      Speculative execution is an optimization for both map and reduce tasks and is enabled by default.

      Follow the link to learn more about Speculative Execution in Hadoop.

    • #5346
      DataFlair Team

       

      In Hadoop, work is processed as a MapReduce job. The job is divided into multiple smaller tasks, which are distributed across the cluster and run in parallel.
      In this parallel processing, there is a chance that some of the tasks on some nodes run slowly compared to the other tasks of the same job on other nodes.
      The reason can be anything: a busy node, network congestion, etc. These slow tasks limit the total execution time of the job, because the system has to wait for them to complete.
      Hadoop doesn't try to fix the slow-running tasks; instead, it clones them onto other nodes where the rest of the tasks have already completed, and runs the copies there. This is termed Speculative Execution. As the name suggests, Hadoop speculates about the slow-running tasks and runs the same tasks on other nodes in parallel. Whichever copy completes first, that output is used for further processing, and the remaining slow-running copies are killed (see the small analogy sketch below).
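
      As a rough, self-contained Java analogy (not Hadoop's own code; the class name SpeculativeRace and the sleep times are invented for illustration), ExecutorService.invokeAny shows the same "first copy to finish wins, the rest are cancelled" behaviour:

      import java.util.List;
      import java.util.concurrent.Callable;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;

      public class SpeculativeRace {
          public static void main(String[] args) throws Exception {
              ExecutorService pool = Executors.newFixedThreadPool(2);

              // Two attempts at the same "task": the original (slow) and a speculative copy.
              Callable<String> original = () -> { Thread.sleep(3000); return "output of original attempt"; };
              Callable<String> speculative = () -> { Thread.sleep(500); return "output of speculative attempt"; };

              // invokeAny returns the result of the first attempt that completes
              // and cancels the remaining one, mirroring how the first finished
              // attempt's output is committed and the duplicate attempt is killed.
              String winner = pool.invokeAny(List.of(original, speculative));
              System.out.println("Committed: " + winner);

              pool.shutdownNow();
          }
      }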

      Speculative execution is enabled by default. To disable it, edit the mapred-site.xml configuration file and set mapreduce.map.speculative and mapreduce.reduce.speculative to false.
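
      The same two properties can also be set per job through the job's Configuration. A minimal sketch, assuming the Hadoop 2.x/3.x MapReduce Job API (the class name DisableSpeculation and the job name are placeholders):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.mapreduce.Job;

      public class DisableSpeculation {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Turn off speculative attempts for both map and reduce tasks of this job.
              conf.setBoolean("mapreduce.map.speculative", false);
              conf.setBoolean("mapreduce.reduce.speculative", false);

              Job job = Job.getInstance(conf, "speculation-disabled-job");
              // ... configure mapper, reducer, input and output paths as usual, then submit.
          }
      }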

      To learn in detail, please follow: Speculative Execution in Hadoop.

       
