Explain the process of spilling in Hadoop MapReduce?
What is the need for spilling in Hadoop?
A spill occurs when a mapper's output exceeds the memory allocated to it for the MapReduce task. Map output is collected in an in-memory sort buffer, whose size is set by mapreduce.task.io.sort.mb (100 MB by default); when the buffer fills beyond a threshold, its contents are sorted and spilled to disk.
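As a rough sketch, the spill-related buffer settings can be tuned in mapred-site.xml. The property names below are the standard Hadoop 2.x/3.x ones; the values shown are illustrative examples, not recommendations:

```xml
<configuration>
  <!-- Size in MB of the in-memory sort buffer that collects map output. -->
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>256</value>
  </property>
  <!-- Fraction of the buffer at which a background spill to disk starts. -->
  <property>
    <name>mapreduce.map.sort.spill.percent</name>
    <value>0.80</value>
  </property>
</configuration>
```

Raising mapreduce.task.io.sort.mb reduces the number of spills per map task, at the cost of more task heap.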
Need for spilling:
-> Spilling happens at least once, when the mapper finishes, because the mapper's output must be sorted and saved to disk so that the reducer processes can read it.
-> The spill files act as a checkpoint from which reduce jobs can be restarted; in case of a reduce task failure, the spilled records are re-read instead of re-running the map.
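The mechanism above can be illustrated with a toy Python sketch (this is not Hadoop code; the buffer limit, file layout, and function names are all invented for illustration). It buffers (key, value) pairs, spills a sorted run to disk whenever the buffer fills, spills once more at the end, and finally merges the sorted runs, mirroring the map-side spill-and-merge flow:

```python
import heapq
import os
import tempfile

def spill_sorted_run(records, spill_dir, spill_id):
    """Sort an in-memory buffer by key and write it to disk as one spill file."""
    path = os.path.join(spill_dir, f"spill{spill_id}.txt")
    with open(path, "w") as f:
        for key, value in sorted(records):
            f.write(f"{key}\t{value}\n")
    return path

def run_mapper(pairs, buffer_limit, spill_dir):
    """Collect map output; spill whenever the buffer reaches its limit,
    then spill the remainder at the end (so spilling happens at least once)."""
    buffer, spills = [], []
    for pair in pairs:
        buffer.append(pair)
        if len(buffer) >= buffer_limit:
            spills.append(spill_sorted_run(buffer, spill_dir, len(spills)))
            buffer = []
    if buffer:
        spills.append(spill_sorted_run(buffer, spill_dir, len(spills)))
    return spills

def merge_spills(spill_paths):
    """Merge the sorted spill files into one key-sorted stream: the final
    map output that reducers fetch."""
    files = [open(p) for p in spill_paths]
    try:
        merged = [line.rstrip("\n").split("\t") for line in heapq.merge(*files)]
    finally:
        for f in files:
            f.close()
    return merged

with tempfile.TemporaryDirectory() as d:
    pairs = [("b", "1"), ("a", "2"), ("c", "3"), ("a", "4"), ("b", "5")]
    spills = run_mapper(pairs, buffer_limit=2, spill_dir=d)
    merged = merge_spills(spills)
```

Because each spill file is sorted before it hits disk, the final merge is a cheap streaming merge of sorted runs, which is exactly why the reducer receives its input already sorted by key.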