Difference between Map task & Mapper?

- September 20, 2018 at 12:29 pm #4857
  
  DataFlair Team
  Spectator
  
  It is said that the number of map tasks is equal to the number of InputSplits, and that the number of mappers = (total data size) / (input split size). Is there any difference between the two?
- September 20, 2018 at 12:29 pm #4859
  
  DataFlair Team
  Spectator
  
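  In MapReduce, the InputFormat class is responsible for providing the split information. An input split is the amount of data that goes into one map task. Each map task runs one Mapper instance over one split, so "number of map tasks" and "number of mappers" refer to the same count.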
  
  For a given input path, the number of mappers is determined by the number of splits. For example, if we process a directory that has 10 files and each file is made up of 10 splits, the job needs 100 mappers to process the data.
  
  The number of mappers equals the number of splits, but how many splits there are depends on whether Hadoop knows how to split the input. Compressed file formats such as Gzip are not splittable, so each file becomes a single split, and in that case number of files = number of mappers.
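
The counting described above can be sketched in a few lines of Python. This is only an illustration, not Hadoop's actual split logic. The function names, the file names, and the rule that a ".gz" suffix marks a non-splittable file are all assumptions made for the example. It also assumes every file has a positive size and ignores any tolerance Hadoop applies to the final split.

```python
MB = 1024 * 1024

def splits_for_file(name, size, split_size):
    """Number of input splits (one map task each) for a single file."""
    # Assumption for this sketch: a ".gz" suffix marks a non-splittable file.
    if name.endswith(".gz"):
        return 1
    # Ceiling division: a trailing partial chunk still needs its own split.
    return (size + split_size - 1) // split_size

def count_map_tasks(files, split_size):
    """Total map tasks for (name, size_in_bytes) pairs; splits never span files."""
    return sum(splits_for_file(name, size, split_size) for name, size in files)

# Ten plain files of 1280 MB each with a 128 MB split size: 10 splits per file.
plain = [(f"part-{i}.txt", 1280 * MB) for i in range(10)]
print(count_map_tasks(plain, 128 * MB))  # 100

# Ten gzipped files: one split per file, whatever the size.
gzipped = [(f"part-{i}.txt.gz", 1280 * MB) for i in range(10)]
print(count_map_tasks(gzipped, 128 * MB))  # 10
```

Because splits are computed per file, (total data size) / (input split size) is only an approximation. In this sketch, a 129 MB file plus a 10 MB file with a 128 MB split size gives 3 map tasks, while dividing the 139 MB total by 128 MB and rounding up would suggest 2.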