Is YARN a replacement of Hadoop MapReduce?


Viewing 2 reply threads
    • #6188
      DataFlair Team
      Spectator

      Is YARN a replacement of MapReduce in Hadoop?

    • #6189
      DataFlair Team
      Spectator

      No, YARN is not a replacement for MapReduce (MR).
      In Hadoop 1 there were two components, HDFS and MR. MR had two daemons that managed the job life cycle:
      1. JobTracker: schedules jobs and monitors them for failures, slowness, etc.
      2. TaskTracker: runs the individual tasks on its node and reports their status to the JobTracker.
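      The Hadoop 1 split can be illustrated with a toy model: one central JobTracker holds every task in the cluster, and each TaskTracker has a fixed number of slots that the JobTracker fills when the tracker sends a heartbeat. This is a hypothetical sketch, not Hadoop's Java API; the class and method names are invented for illustration, and failure monitoring and data locality are omitted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TaskTracker:
    """Toy worker (hypothetical model): a fixed number of task slots on one node."""
    name: str
    slots: int
    running: list = field(default_factory=list)

    def free_slots(self) -> int:
        return self.slots - len(self.running)


class JobTracker:
    """Toy master (hypothetical model): one central task queue for the whole cluster."""

    def __init__(self, trackers):
        self.trackers = trackers
        self.pending = deque()

    def submit(self, job_id: str, num_tasks: int) -> None:
        # Split the job into tasks and queue them centrally.
        for i in range(num_tasks):
            self.pending.append(f"{job_id}-task{i}")

    def heartbeat(self, tracker: TaskTracker) -> list:
        # On each heartbeat, give the tracker as many tasks as it has free slots.
        assigned = []
        while self.pending and tracker.free_slots() > 0:
            task = self.pending.popleft()
            tracker.running.append(task)
            assigned.append(task)
        return assigned


trackers = [TaskTracker("node1", slots=2), TaskTracker("node2", slots=2)]
jt = JobTracker(trackers)
jt.submit("job1", num_tasks=5)
for t in trackers:
    print(t.name, jt.heartbeat(t))
print("still pending:", list(jt.pending))
```

      Because every assignment and every status update passes through the single JobTracker, it does both the scheduling and the monitoring work for the entire cluster.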

      Hadoop 2
      There are three components: HDFS, YARN, and MR.
      In Hadoop 2, job scheduling and monitoring were moved out of MR into YARN. YARN splits these duties between two components:
      1. ResourceManager: one per cluster; allocates cluster resources to applications (the scheduling part).
      2. ApplicationMaster: one per application; negotiates resources from the ResourceManager and monitors the application's tasks (the monitoring part).

      MR then does the actual map and reduce processing once YARN has allocated resources for the job.
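      A toy model of the Hadoop 2 split: the ResourceManager only hands out generic containers from per-node NodeManagers and knows nothing about MapReduce, while a per-application master requests containers and then runs its MapReduce tasks in them. This is a hypothetical sketch; the class names echo YARN's roles but none of this is the YARN API, and it ignores the container the ApplicationMaster itself runs in and any retrying of requests.

```python
from dataclasses import dataclass


@dataclass
class NodeManager:
    """Toy per-node agent (hypothetical): tracks container capacity on one node."""
    name: str
    capacity: int
    used: int = 0


class ResourceManager:
    """Toy cluster-wide allocator: deals only in containers, not in MapReduce."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, count: int) -> list:
        # Grant up to `count` containers from nodes that still have spare capacity.
        granted = []
        for node in self.nodes:
            while len(granted) < count and node.used < node.capacity:
                node.used += 1
                granted.append(node.name)
        return granted

    def release(self, node_names) -> None:
        # Return each container to the node it came from.
        for name in node_names:
            for node in self.nodes:
                if node.name == name:
                    node.used -= 1
                    break


class MapReduceAppMaster:
    """Toy per-application master: asks the RM for containers, then runs MR tasks."""

    def __init__(self, rm: ResourceManager, num_tasks: int):
        self.rm = rm
        self.num_tasks = num_tasks

    def run(self) -> list:
        containers = self.rm.allocate(self.num_tasks)
        # Framework-specific work happens only after allocation; this toy version
        # simply runs as many tasks as it was granted containers for.
        log = [f"task{i} on {node}" for i, node in enumerate(containers)]
        self.rm.release(containers)
        return log


nodes = [NodeManager("node1", capacity=2), NodeManager("node2", capacity=2)]
rm = ResourceManager(nodes)
log = MapReduceAppMaster(rm, num_tasks=3).run()
print(log)
```

      Swapping `MapReduceAppMaster` for a master of some other framework would leave `ResourceManager` untouched, which is the point of moving scheduling out of MR.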

    • #6190
      DataFlair Team
      Spectator

      No, YARN is not a replacement for MapReduce.
      MapReduce and YARN are quite different: MapReduce is a programming model for parallel data processing, while YARN is the resource-management layer of a distributed cluster. Hadoop already supported MapReduce before Hadoop 2; Hadoop 2 added YARN for resource management. In short, MapReduce runs on top of the YARN architecture. (Sorry, I haven't covered the straggler problem here.)
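      To show that MapReduce is a programming model independent of any resource manager, here is a minimal single-process word count written as map, shuffle, and reduce functions. It is an illustrative sketch only; the function names are hypothetical, and nothing here runs on Hadoop or YARN.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in the input line.
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> dict:
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> Tuple[str, int]:
    # Combine all counts for one word.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["YARN runs MapReduce", "MapReduce runs on YARN"]))
# → {'yarn': 2, 'runs': 2, 'mapreduce': 2, 'on': 1}
```

      On a real cluster the same map and reduce logic would be spread across many containers, and YARN's job is only to provide those containers.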

      “When does the MR master ask the ResourceManager for resources?” When the user submits a MapReduce job. After the job finishes, its resources are released back to the cluster.
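      The acquire-on-submit, release-on-finish life cycle can be sketched with a Python context manager. `ClusterPool` and `job_resources` are hypothetical names for a toy pool of identical containers, not YARN classes; the `finally` block models resources being freed even when a job fails.

```python
from contextlib import contextmanager


class ClusterPool:
    """Toy pool of identical containers (hypothetical, not a YARN API)."""

    def __init__(self, total: int):
        self.total = total
        self.free = total

    def acquire(self, requested: int) -> int:
        # Grant as many containers as are free, up to the request.
        granted = min(requested, self.free)
        self.free -= granted
        return granted

    def release(self, count: int) -> None:
        self.free += count


@contextmanager
def job_resources(pool: ClusterPool, requested: int):
    # Resources are requested when the job is submitted...
    granted = pool.acquire(requested)
    try:
        yield granted
    finally:
        # ...and returned to the cluster when the job ends, even if it fails.
        pool.release(granted)


pool = ClusterPool(total=10)
with job_resources(pool, requested=4) as containers:
    print("running with", containers, "containers; free:", pool.free)
print("after job; free:", pool.free)
```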

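      “Will the ResourceManager give the MR master all the resources it needs, or does that depend on the cluster's computing capacity?” I'm not sure I follow the question. The ResourceManager does not simply hand over everything at once: its scheduler grants containers as capacity becomes available, subject to its policies (for example, queue capacities). The job eventually gets the containers it needs, but the cluster's capacity determines how many tasks can run at the same time, and therefore the processing time.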
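      A back-of-the-envelope way to see how capacity affects processing time: if only a limited number of containers are available, tasks run in "waves". The sketch below assumes every task takes the same time and ignores scheduling overhead, data locality, and stragglers, so it is a rough illustration rather than a model of any real scheduler; `estimated_runtime` is a hypothetical helper.

```python
import math


def estimated_runtime(num_tasks: int, containers: int, task_seconds: float) -> float:
    """Rough runtime if tasks run in waves of `containers` at a time.

    Simplifying assumptions: equal task durations, no scheduling overhead,
    no data-locality effects, no stragglers.
    """
    waves = math.ceil(num_tasks / containers)
    return waves * task_seconds


print(estimated_runtime(100, containers=20, task_seconds=60))  # 5 waves -> 300 s
print(estimated_runtime(100, containers=50, task_seconds=60))  # 2 waves -> 120 s
```

      The job gets all 100 tasks run either way; only the elapsed time changes with the capacity it is granted.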

      and

      MRv1 uses the JobTracker to create tasks and assign them to data nodes, which can become a scalability bottleneck once the cluster grows large enough (usually around 4,000 nodes).

      In MRv2, MapReduce runs on YARN (“Yet Another Resource Negotiator”), which has one ResourceManager per cluster, while each data node runs a NodeManager. For each job, a container on one of the worker nodes runs the ApplicationMaster, which negotiates resources and monitors the job's tasks.
