Explain the process of spilling in Hadoop MapReduce?

  • Author
    Posts
    • #5535
      DataFlair Team
      Spectator

      What is the need for spilling in Hadoop?

    • #5537
      DataFlair Team
      Spectator

      A spill occurs when a mapper’s output exceeds the in-memory buffer allocated to the map task.
      Spilling happens when there is not enough memory to hold all of the mapper output; the size of this buffer is set by the mapreduce.task.io.sort.mb property (100 MB by default).
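      As a sketch, the spill buffer can be tuned in mapred-site.xml. The property names below are standard Hadoop properties, but the values are only illustrative examples, not recommendations:

      ```xml
      <!-- mapred-site.xml: example values only -->
      <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>256</value> <!-- in-memory sort buffer size, in MB (default 100) -->
      </property>
      <property>
        <name>mapreduce.map.sort.spill.percent</name>
        <value>0.80</value> <!-- buffer fill fraction that triggers a background spill (default 0.80) -->
      </property>
      ```

      Raising mapreduce.task.io.sort.mb reduces the number of spills (and the disk I/O of merging them later) at the cost of giving each map task more memory.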

      Need for Spilling:-
      -> Spilling happens at least once, when the mapper finishes, because the mapper’s output must be sorted and saved to disk for the reducer processes to read it.
      -> The spilled records also serve as a checkpoint: if a reduce task fails, the reducers can be restarted from the spilled map output instead of re-running the map task.
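      The spill-and-merge idea above can be sketched in a few lines. This is a toy simulation of the mechanism, not Hadoop’s actual implementation: records are buffered in memory, each full buffer is sorted and spilled to disk as a run, and the sorted runs are merged at the end (the buffer limit of 4 records stands in for mapreduce.task.io.sort.mb):

      ```python
      import heapq
      import os
      import pickle
      import tempfile

      def run_mapper(records, buffer_limit=4):
          """Toy simulation of spilling: buffer (key, value) pairs in memory;
          when the buffer fills, sort it and spill it to disk as a run;
          merge all sorted runs into the final map output at the end."""
          buffer, spill_files = [], []

          def spill():
              buffer.sort()                              # each spill file is sorted
              f = tempfile.NamedTemporaryFile(delete=False)
              pickle.dump(buffer[:], f)
              f.close()
              spill_files.append(f.name)
              buffer.clear()

          for rec in records:
              buffer.append(rec)
              if len(buffer) >= buffer_limit:            # buffer full: spill to disk
                  spill()
          if buffer:                                     # final spill when the mapper
              spill()                                    # finishes: happens at least once

          runs = []
          for name in spill_files:
              with open(name, "rb") as f:
                  runs.append(pickle.load(f))
              os.remove(name)
          # merge the sorted runs into one sorted output, as the merge phase does
          return list(heapq.merge(*runs)), len(spill_files)

      out, n_spills = run_mapper([(k % 3, k) for k in range(10)], buffer_limit=4)
      # 10 records with a 4-record buffer -> 3 spills; out is fully sorted by key
      ```

      The point of the sketch is the trade-off it makes visible: a bigger buffer means fewer spill files and less merge work, which is exactly why mapreduce.task.io.sort.mb is a common tuning knob.
      
      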
