What if a reducer cannot handle the input data coming from the mapper?

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

Viewing 1 reply thread

Author

Posts
- September 20, 2018 at 12:55 pm #4954
  
  DataFlair Team
  Spectator
  
  As we all know, data with the same key goes to the same reducer.
  
  Let’s say we have a 20-node cluster where each node has 3 TB of disk.
  
  Suppose that after the mappers have processed the data, I have 7 TB of output, all of which has the same key and hence needs to go to the same reducer.
  
  How will the reducer handle this 7 TB of data when we have only 3 TB of disk per slave node?
  
  Will this data be processed on different machines? Can a single reducer run across multiple slave nodes?
- September 20, 2018 at 12:56 pm #4955
  
  DataFlair Team
  Spectator
  
  In this case, I would use a composite key to handle the scenario.
  Ideally, though, if all of your data shares the same key, that key is not a good choice for the data set.
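
A reduce task runs on a single node, so the composite-key idea above amounts to splitting the one hot key into several keys that can be routed to different reducers. The following is a minimal sketch of that idea in plain Python, not Hadoop code: `partition`, `salted_key`, `run_job`, and `merge_salted` are hypothetical names, `zlib.crc32` is only a stand-in for a hash partitioner, and the sketch assumes the reduce operation is a sum (so partial results can be merged in a second pass).

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 4  # hypothetical number of reduce tasks
NUM_SALTS = 4     # hypothetical number of sub-keys per original key


def partition(key: str, num_reducers: int = NUM_REDUCERS) -> int:
    """Deterministic stand-in for a hash partitioner: key -> reducer index."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def salted_key(key: str, record_index: int, num_salts: int = NUM_SALTS) -> str:
    """Build a composite key by appending a salt derived from the record index."""
    return f"{key}#{record_index % num_salts}"


def run_job(records, key_fn):
    """Simulate shuffle plus a summing reducer.

    Returns (records received per reducer, sum per emitted key).
    """
    load = defaultdict(int)
    sums = defaultdict(int)
    for i, (key, value) in enumerate(records):
        out_key = key_fn(key, i)
        load[partition(out_key)] += 1
        sums[out_key] += value
    return dict(load), dict(sums)


def merge_salted(sums):
    """Second pass: strip the salt and combine the partial sums."""
    merged = defaultdict(int)
    for composite, total in sums.items():
        original, _, _ = composite.rpartition("#")
        merged[original] += total
    return dict(merged)


# Every record carries the same key, as in the question.
records = [("hot", 1)] * 1000

plain_load, plain_sums = run_job(records, lambda k, i: k)
salted_load, salted_sums = run_job(records, salted_key)
final = merge_salted(salted_sums)
```

With the plain key, all 1000 records land on one reducer. With the composite key, they are split into `NUM_SALTS` groups of equal size, and the second pass recovers the same total. Whether those groups actually reach different reducers depends on the hash function and the number of reducers, and this approach only works when partial results can be merged afterwards.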