In MapReduce, is sorting done on the mapper node or the reducer node?

This topic contains 1 reply, has 1 voice, and was last updated by  dfbdteam3 1 year, 6 months ago.

Viewing 2 posts - 1 through 2 (of 2 total)
  • Author
    Posts
  • #4732

    dfbdteam3
    Moderator

    During the execution of a Hadoop MapReduce job, the data is sorted by key. Is this sorting done on the map node or on the reduce node?

    #4733

    dfbdteam3
    Moderator

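    The role of the Mapper is to execute the business logic and produce the key/value pairs, which are then passed to the Partitioner. The Partitioner does not sort anything itself: it only decides which reducer partition each key/value pair belongs to. The map task then sorts its buffered output by partition, and by key within each partition, before spilling it to local disk.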
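To make the partition-then-sort step concrete, here is a minimal sketch in plain Python. It is a simulation, not Hadoop code: `partition`, `map_side_sort`, and the sample data are hypothetical names made up for illustration, and the CRC32 hash only stands in for whatever hash a real partitioner uses.

```python
from zlib import crc32


def partition(key: str, num_reducers: int) -> int:
    # Hypothetical stand-in for a hash partitioner:
    # a stable hash of the key, modulo the number of reducers.
    return crc32(key.encode("utf-8")) % num_reducers


def map_side_sort(pairs, num_reducers):
    # Tag every mapper output pair with its partition number.
    tagged = [(partition(k, num_reducers), k, v) for k, v in pairs]
    # Sort by partition first, then by key within the partition.
    # Values are not part of the sort key.
    tagged.sort(key=lambda t: (t[0], t[1]))
    return tagged


# Sample mapper output (made-up data), in the order the mapper emitted it.
mapper_output = [("pear", 1), ("apple", 1), ("fig", 1), ("apple", 1), ("kiwi", 1)]

spill = map_side_sort(mapper_output, num_reducers=2)
for part, key, value in spill:
    print(part, key, value)
```

All of this runs inside the simulated "map task", which mirrors the point above: the partitioning and the first sort both happen before any data leaves the mapper node.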

    The Partitioner instance runs in the same JVM as the Mapper, and hence on the same node. The map-side sort also runs inside that map task, so the Mapper phase sorting happens on the Mapper node itself.

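    However, sorting work happens during the Reducer phase as well. Each reduce task fetches the already-sorted map outputs for its partition and merges them on the Reducer node, so that it receives its keys in sorted order. In addition, sometimes we want the values for each key to arrive in sorted order, not just the keys. This is achieved by a technique called Secondary Sorting, which typically moves the value into a composite key and uses a custom partitioner and grouping comparator so that records are still partitioned and grouped by the original key. The grouping takes effect on the Reducer node.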
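The reducer side can be sketched the same way. The example below is a plain Python simulation under stated assumptions, not the Hadoop API: the two sorted "map outputs", the composite key layout `(natural_key, value)`, and the helper names `reduce_side_merge` and `group_by_natural_key` are all hypothetical. It shows a merge of already-sorted runs, followed by grouping on the natural key only, which is roughly the role a grouping comparator plays in secondary sorting.

```python
import heapq
from itertools import groupby

# Two already-sorted map outputs for the same reduce partition (made-up data).
# Each record is ((natural_key, value), value): the value is copied into the
# composite key so that sorting orders the values within each natural key.
map_output_a = [(("apple", 3), 3), (("apple", 9), 9), (("fig", 2), 2)]
map_output_b = [(("apple", 5), 5), (("fig", 1), 1), (("pear", 4), 4)]


def reduce_side_merge(*sorted_runs):
    # Merge the sorted runs by composite key without re-sorting from scratch.
    return list(heapq.merge(*sorted_runs, key=lambda rec: rec[0]))


def group_by_natural_key(merged):
    # Group consecutive records on the natural key only, ignoring the value
    # part of the composite key (a stand-in for a grouping comparator).
    return {
        natural_key: [value for _, value in records]
        for natural_key, records in groupby(merged, key=lambda rec: rec[0][0])
    }


merged = reduce_side_merge(map_output_a, map_output_b)
grouped = group_by_natural_key(merged)
print(grouped)
# prints {'apple': [3, 5, 9], 'fig': [1, 2], 'pear': [4]}
```

Each simulated reduce call sees one natural key with its values already in ascending order, without the reducer having to buffer and sort the values itself.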
