Difference between Map task & Mapper?

  • Author
    Posts
    • #4857
      DataFlair Team
      Spectator

      It is said that the number of map tasks is equal to the number of InputSplits, and that No. of mappers = (total data size) / (input split size). Is there any difference between the two?

    • #4859
      DataFlair Team
      Spectator

      In MapReduce, the InputFormat class is responsible for providing the split information. An input split is the amount of data that goes into one map task.

      For a given input path, the number of mappers is determined by the number of splits. Suppose we are processing a directory that has 10 files, and each file is made up of 10 splits: the job would then need 100 mappers to process the data.
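      As a rough sketch of that arithmetic (not Hadoop's actual code): the function name `estimate_mappers`, the 128 MB split size, and the 1280 MB file size are assumptions chosen only so that each file yields 10 splits. The count is rounded up per file because a split does not span files; the real split calculation in Hadoop's input formats has more detail than this.

```python
import math

def estimate_mappers(file_sizes_mb, split_size_mb=128):
    """Estimate the number of map tasks as one per input split."""
    # Round up per file, not on the total: a split never spans two files.
    return sum(math.ceil(size / split_size_mb) for size in file_sizes_mb)

# Hypothetical directory: 10 files of 1280 MB each, i.e. 10 splits per file.
print(estimate_mappers([1280] * 10))  # 100
```

      Note that dividing the total data size by the split size gives the same answer only when every file is an exact multiple of the split size.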

      Additionally, number of splits = number of mappers in every case, but how many splits a file yields depends on whether Hadoop knows how to split it. For example, compressed file formats such as Gzip are not splittable, so each file becomes a single split, and in that case number of files = number of mappers.
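      The Gzip case can be sketched the same way. This is an illustration under stated assumptions, not how Hadoop decides splittability: the helper names, the `.gz` suffix check, and the file list are all hypothetical.

```python
import math

# Assumption for this sketch: only Gzip files are treated as non-splittable.
NON_SPLITTABLE_SUFFIXES = (".gz",)

def splits_for_file(name, size_mb, split_size_mb=128):
    """Return the estimated number of splits for one file."""
    if name.endswith(NON_SPLITTABLE_SUFFIXES):
        # A non-splittable file is read by a single mapper, whatever its size.
        return 1
    return math.ceil(size_mb / split_size_mb)

def estimate_mappers_by_name(files, split_size_mb=128):
    """Sum the per-file split counts; one mapper per split."""
    return sum(splits_for_file(name, size, split_size_mb) for name, size in files)

# Hypothetical input: two Gzip files and one plain text file.
files = [("a.gz", 1024), ("b.gz", 512), ("c.txt", 256)]
print(estimate_mappers_by_name(files))  # 4: one per .gz file, two for c.txt
```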
