Can we pass output of one reducer as input to another mapper?


  • Author
    Posts
    • #4626
      DataFlair Team
      Spectator

      I am running two jobs back to back, using the output of the first job as the input to the second. Can I do that without writing the output of the first job to HDFS? Writing it may create memory overhead.

    • #4627
      DataFlair Team
      Spectator

      Yes, it is possible. We can pass the output of one reducer to another mapper: when we run the application from the command line, we give the input and output paths in the correct sequence, so that the output path of the first job is the input path of the second. This is exactly what we have to do when we have multiple mapper and reducer classes. Note that with this approach the first job's output is still written to a file before the second job reads it. Also make sure the reducer output is in a form your mapper can consume as key-value pairs. If you use TextInputFormat to read the file, the key will be the byte offset of each line from the beginning of the file and the value will be the entire line.
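To make the key-value point concrete, here is a minimal plain-Java sketch (no Hadoop dependency) of what a second mapper would see when the first job's output is read line by line. It assumes the first job wrote `key<TAB>value` lines, which is the default layout of TextOutputFormat. The names `ReducerOutputLines`, `toRecords` and `parseLine` are hypothetical and only model the behaviour; they are not Hadoop APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ReducerOutputLines {
    // Models a line-oriented reader: key = byte offset of the line start,
    // value = the whole line (assumes '\n' line endings and UTF-8 text).
    static List<Map.Entry<Long, String>> toRecords(String fileContents) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            records.add(new SimpleEntry<>(offset, line));
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        return records;
    }

    // Recovers the first job's key and value from one "key<TAB>value" line.
    static Map.Entry<String, String> parseLine(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleEntry<>(line, ""); // no tab: treat the whole line as the key
        }
        return new SimpleEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        String firstJobOutput = "apple\t3\nbanana\t7\n"; // made-up sample data
        for (Map.Entry<Long, String> rec : toRecords(firstJobOutput)) {
            Map.Entry<String, String> kv = parseLine(rec.getValue());
            System.out.println(rec.getKey() + " -> (" + kv.getKey() + ", " + kv.getValue() + ")");
        }
    }
}
```

In other words, the second mapper does not receive the first job's key directly: it receives an offset and a line, and has to split the line itself to get the original key and value back.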

    • #4629
      DataFlair Team
      Spectator

      The ChainReducer class lets us chain multiple Mapper classes after a Reducer within the Reducer task.

      For each record output by the Reducer, the Mapper classes are invoked in a chained (or piped) fashion: the output of the first becomes the input of the second, and so on until the last Mapper. The last Mapper's output is written to the task's output.

      The best thing about this feature is that the Mappers in the chain do not need to know that they are executed after the Reducer or as part of a chain. This lets us write reusable, specialized Mappers and combine them to perform composite operations within a single task.

      In addition, we can use the ChainMapper and ChainReducer classes together to compose Map/Reduce jobs that look like [MAP+ / REDUCE MAP*], that is, one or more Mappers, then a Reducer followed by zero or more Mappers. The immediate advantage of this pattern is a dramatic reduction in disk IO.

      One important thing to note: we don't need to specify the output key/value classes for the ChainReducer itself, because they are set by the setReducer or addMapper call for the last element in the chain.
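The chaining behaviour described in this reply can be modelled in a few lines of plain Java. This is only a sketch of the semantics, not Hadoop code: `ChainModel`, `sumReducer` and `runChain` are hypothetical names, and it assumes each mapper emits exactly one record per input record, whereas a real Mapper may emit zero or more.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class ChainModel {
    // Reducer stand-in: sums the values for one key and emits "key=sum".
    static String sumReducer(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return key + "=" + sum;
    }

    // Pipes one reducer output record through each mapper in order, in memory:
    // the output of one mapper is the input of the next.
    static String runChain(String reducerOutput, List<Function<String, String>> mappers) {
        String record = reducerOutput;
        for (Function<String, String> mapper : mappers) {
            record = mapper.apply(record);
        }
        return record; // what the last mapper emits is the task's output
    }

    public static void main(String[] args) {
        List<Function<String, String>> mappers = new ArrayList<>();
        mappers.add(r -> r.toUpperCase());  // first mapper after the reducer
        mappers.add(r -> "[" + r + "]");    // second mapper sees the first mapper's output

        String reduced = sumReducer("apple", Arrays.asList(1, 2, 4)); // made-up sample data
        System.out.println(runChain(reduced, mappers)); // prints [APPLE=7]
    }
}
```

Neither mapper here knows it runs after a reducer or inside a chain, which is the reuse property the reply points out, and no intermediate record is written anywhere between the steps.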
