Explain the process of spilling in Hadoop MapReduce?
What is the need for spilling in Hadoop?
A spill occurs when a mapper's output exceeds the memory allocated to it for the MapReduce task. Map output is collected in an in-memory sort buffer, whose size is set by mapreduce.task.io.sort.mb (100 MB by default); when the buffer fills beyond a threshold, its contents are sorted and spilled to disk.
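As a rough sketch, the spill-related buffer settings can be tuned in mapred-site.xml. The property names below are the standard Hadoop 2.x/3.x ones; the values shown are illustrative examples, not recommendations:

```xml
<configuration>
  <!-- Size in MB of the in-memory sort buffer that collects map output. -->
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>256</value>
  </property>
  <!-- Fraction of the buffer at which a background spill to disk starts. -->
  <property>
    <name>mapreduce.map.sort.spill.percent</name>
    <value>0.80</value>
  </property>
</configuration>
```

Raising mapreduce.task.io.sort.mb reduces the number of spills per map task, at the cost of more task heap.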
Need for spilling:
-> Spilling happens at least once, when the mapper finishes, because the mapper's output must be sorted and saved to disk so that the reducer processes can read it.
-> The spill files act as a checkpoint from which reduce jobs can be restarted; in case of a reduce task failure, the spilled records are re-read instead of re-running the map.
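The mechanism above can be illustrated with a toy Python sketch (this is not Hadoop code; the buffer limit, file layout, and function names are all invented for illustration). It buffers (key, value) pairs, spills a sorted run to disk whenever the buffer fills, spills once more at the end, and finally merges the sorted runs, mirroring the map-side spill-and-merge flow:

```python
import heapq
import os
import tempfile

def spill_sorted_run(records, spill_dir, spill_id):
    """Sort an in-memory buffer by key and write it to disk as one spill file."""
    path = os.path.join(spill_dir, f"spill{spill_id}.txt")
    with open(path, "w") as f:
        for key, value in sorted(records):
            f.write(f"{key}\t{value}\n")
    return path

def run_mapper(pairs, buffer_limit, spill_dir):
    """Collect map output; spill whenever the buffer reaches its limit,
    then spill the remainder at the end (so spilling happens at least once)."""
    buffer, spills = [], []
    for pair in pairs:
        buffer.append(pair)
        if len(buffer) >= buffer_limit:
            spills.append(spill_sorted_run(buffer, spill_dir, len(spills)))
            buffer = []
    if buffer:
        spills.append(spill_sorted_run(buffer, spill_dir, len(spills)))
    return spills

def merge_spills(spill_paths):
    """Merge the sorted spill files into one key-sorted stream: the final
    map output that reducers fetch."""
    files = [open(p) for p in spill_paths]
    try:
        merged = [line.rstrip("\n").split("\t") for line in heapq.merge(*files)]
    finally:
        for f in files:
            f.close()
    return merged

with tempfile.TemporaryDirectory() as d:
    pairs = [("b", "1"), ("a", "2"), ("c", "3"), ("a", "4"), ("b", "5")]
    spills = run_mapper(pairs, buffer_limit=2, spill_dir=d)
    merged = merge_spills(spills)
```

Because each spill file is sorted before it hits disk, the final merge is a cheap streaming merge of sorted runs, which is exactly why the reducer receives its input already sorted by key.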