As we all know, data with the same key goes to the same reducer.
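For context, here is a minimal standalone Java sketch of why every record with a given key lands on one reducer. It mirrors the formula used by Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, but it does not use the Hadoop API. The class `PartitionSketch`, the key `hotKey`, and the count of 20 reducers are hypothetical.

```java
public class PartitionSketch {
    // Same formula as Hadoop's default HashPartitioner: clear the sign bit
    // of the hash, then take the remainder modulo the number of reducers.
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 20; // hypothetical: one reducer per node
        int[] recordsPerReducer = new int[numReduceTasks];

        // Hypothetical skewed map output: every record carries the same key,
        // so every record is routed to the same partition index.
        for (int i = 0; i < 1_000; i++) {
            recordsPerReducer[partitionFor("hotKey", numReduceTasks)]++;
        }

        int nonEmpty = 0;
        for (int count : recordsPerReducer) {
            if (count > 0) {
                nonEmpty++;
            }
        }
        System.out.println("non-empty reducers: " + nonEmpty); // prints "non-empty reducers: 1"
    }
}
```

However much data carries that key, the partition index depends only on the key and the reducer count, so adding more records never spreads them across reducers.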
Let’s say we are using a 20-node cluster where each node has 3 TB of disk.
While processing, after the mappers have finished, let’s say I have 7 TB of intermediate data, all of which has the same key and hence needs to go to the same reducer.
How will the reducer handle these 7 TB of data when each slave node has only 3 TB of disk?
Will this data be spread across different machines? Can a single reducer run over multiple slave nodes?