Combiner function

Viewing 3 reply threads
  • Author
    Posts
    • #5588
DataFlair Team
      Spectator

In the combiner function, it was said that the data size is reduced, i.e. if there is 10 GB it is reduced to something like 6 or 7 GB. But doesn't that cause data loss, going from 10 GB to 7 GB?

    • #5590
DataFlair Team
      Spectator

The combiner does not involve any lossy size reduction; no data is lost.
In simple terms, the combiner is an execution stage in the MapReduce programming flow that is carried out on the mapper side (i.e. at every data node).

Let's consider an example: the standard word count program.

Suppose the output of the mappers is as below:

[word1, (1, 1, 1, 1, …, millionth 1)] — this will come from the first mapper
[word1, (1, 1, 1, 1, 1, …, 1000th 1)] — this will come from the second mapper

Now, if this is sent to the reducer without a combiner, the data at the reducer would look like:

[word1, (1, 1, 1, 1, …, a lot of entries from all the mappers)]. This puts a heavy burden on the reducer if word1 has appeared a billion times: it has to add a billion 1's to get the final count of word1.

Role of the combiner:
The combiner sits between the mapper and reducer execution stages.
Instead of simply sending [word1, (1, 1, 1, …)], which is the raw mapper output, the combiner calculates an intermediate sum, e.g. [word1, 1000000] (the count of word1 at mapper 1), and this is sent as input to the reducer.
Now the work of the reducer is just to calculate the sum of the combiners' outputs.

      This will reduce the network traffic too.
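The flow described above can be sketched as a small in-memory simulation. This is illustrative Python, not the Hadoop API: the function names (map_phase, combine_phase, reduce_phase) are hypothetical, and a real job would implement Mapper/Reducer classes in Java.

```python
def map_phase(lines):
    """Emit (word, 1) for every word, like the word-count mapper."""
    return [(word, 1) for line in lines for word in line.split()]

def combine_phase(mapper_output):
    """Pre-sum counts per key on the mapper side, like a combiner."""
    combined = {}
    for word, count in mapper_output:
        combined[word] = combined.get(word, 0) + count
    return list(combined.items())

def reduce_phase(all_mapper_outputs):
    """Sum the (already partially summed) counts from every mapper."""
    totals = {}
    for partition in all_mapper_outputs:
        for word, count in partition:
            totals[word] = totals.get(word, 0) + count
    return totals

# Two mappers, each running its own combiner locally before the shuffle:
m1 = combine_phase(map_phase(["word1 word1 word1"]))  # [('word1', 3)]
m2 = combine_phase(map_phase(["word1 word1"]))        # [('word1', 2)]
print(reduce_phase([m1, m2]))                         # {'word1': 5}
```

Note that with the combiner, each mapper ships a single (word, partial-sum) record per key over the network instead of one record per occurrence.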

Follow the link to learn in detail: Combiner in Hadoop

    • #5591
DataFlair Team
      Spectator

The combiner does the work of a reducer for the data on that node.
However, it is not necessary that the combiner does the same work as the reducer. We can define different logic for the combiner, independent of the reducer logic.

Consider the problem of computing the average of items distributed across different nodes. We can calculate the sum of items and the count of items for each node in the combiner, and then calculate the average on the reducer side.
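The averaging example above can be sketched as follows. This is an illustrative simulation, not Hadoop code, and the function names are hypothetical. The key point is that the combiner's logic differs from the reducer's: averaging partial averages directly would be wrong, so the combiner emits (sum, count) pairs instead.

```python
def average_combiner(values):
    """Per-node combiner: reduce a list of numbers to a (sum, count) pair."""
    return (sum(values), len(values))

def average_reducer(partials):
    """Reducer: merge (sum, count) pairs from all nodes into one average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

node1 = average_combiner([10, 20, 30])   # (60, 3)
node2 = average_combiner([40])           # (40, 1)
print(average_reducer([node1, node2]))   # 25.0
```

Had each node emitted its local average instead (20.0 and 40.0), naively averaging those at the reducer would give 30.0 rather than the correct 25.0, which is why the combiner must carry the count along.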

Follow the link to learn in detail: Combiner in Hadoop

    • #5592
DataFlair Team
      Spectator

The combiner acts like a mini-reducer that operates only on the data generated by one machine. It is mainly used for optimization.

The combiner typically performs the same aggregation operations as the reducer. Its main function is to summarize the map output records that share the same key. The output of the combiner is then sent over the network to the actual reducer task as input.

Follow the link to learn in detail: Combiner in Hadoop
