How many Mappers run for a MapReduce Job?

    • #5966
      DataFlair Team
      Spectator

      When we submit a MapReduce job, how many map tasks run?
      How is the number of mappers calculated?
      Can we control the number of mappers, and how do we set it for a job?

    • #5968
      DataFlair Team
      Spectator

      The number of mappers usually depends on the number of HDFS blocks (input splits) of the input files. Hence, to adjust the number of mappers, the HDFS block size can be adjusted (which is generally not recommended). The right level of parallelism for maps seems to be around 10-100 maps per node, although it can be taken up to 300 or so for very CPU-light map tasks. Task setup takes a while, so it is best if each map takes at least a minute to execute.

      The number of mappers also depends on the configuration of the slave node, i.e. the number of cores and the amount of RAM available on it. Usually, 1 to 1.5 cores should be given to each mapper, so on a 15-core node roughly 10 mappers can run concurrently.

      Actually controlling the number of maps is subtle. The mapred.map.tasks parameter is just a hint to the InputFormat for the number of maps.
      The default InputFormat behavior is to split the total number of bytes into the right number of fragments.
      However, in the default case the HDFS block size of the input files is treated as an upper bound for input splits.
      A lower bound on the split size can be set via mapred.min.split.size. So, if we expect 10 TB of input data and have 128 MB HDFS blocks, we end up with about 82,000 maps (10 TB / 128 MB = 81,920), unless mapred.map.tasks is set even larger. Ultimately, the InputFormat determines the number of maps.
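
      As a rough sketch of that calculation and of the knobs mentioned above (the class name and the 256 MB value below are illustrative, not from the original post), using the old mapred API:

      import org.apache.hadoop.mapred.JobConf;

      public class SplitSizing {
          public static void main(String[] args) {
              JobConf conf = new JobConf(SplitSizing.class);

              // 10 TB of input split into 128 MB blocks gives ~82k map tasks.
              long totalInput = 10L * 1024 * 1024 * 1024 * 1024; // 10 TB in bytes
              long blockSize  = 128L * 1024 * 1024;              // 128 MB in bytes
              System.out.println(totalInput / blockSize);        // prints 81920

              // Raise the lower bound on the split size to 256 MB so each map task
              // processes more data and fewer mappers are launched.
              conf.setLong("mapred.min.split.size", 256L * 1024 * 1024);

              // Only a hint to the InputFormat; it cannot force fewer maps than splits.
              conf.setInt("mapred.map.tasks", 10);
          }
      }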

      The number of map tasks can also be increased manually using JobConf's conf.setNumMapTasks(int num). This can be used to increase the number of map tasks, but it will not set the number below that which Hadoop determines by splitting the input data.

      Follow the link to learn more about Mappers in Hadoop

    • #5970
      DataFlair Team
      Spectator

      The number of map tasks for a given job is driven by the number of input splits. For each input split (one HDFS block, by default), a map task is created. So, over the lifetime of a MapReduce job, the number of map tasks is equal to the number of input splits.

      Number of mappers can be determined as follows:

      1. Calculate the total size of input files.
      2. The number of mappers = total input size / the input split size defined in the Hadoop configuration (see the sketch below).
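
      A minimal sketch of that calculation (the input path and the 128 MB split size below are made-up values, not from the original post):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class MapperCountEstimate {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              FileSystem fs = FileSystem.get(conf);

              // Step 1: total size of the input files (hypothetical input directory).
              long totalSize =
                  fs.getContentSummary(new Path("/user/hadoop/input")).getLength();

              // Step 2: divide by the split size, rounding up.
              long splitSize = 128L * 1024 * 1024; // 128 MB
              long mappers = (totalSize + splitSize - 1) / splitSize;

              System.out.println("Estimated number of mappers: " + mappers);
          }
      }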

      The number of mappers can be configured from the command line or set in the job configuration, as below:

      -D mapred.map.tasks=5 -D mapred.reduce.tasks=2 (5 mappers, 2 reducers)
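
      Note that such generic -D options are picked up from the command line only if the driver runs through ToolRunner / GenericOptionsParser, e.g. in an invocation like hadoop jar myjob.jar MyDriver -D mapred.map.tasks=5 -D mapred.reduce.tasks=2 /input /output (the jar name, class name, and paths here are placeholders). Even then, mapred.map.tasks remains only a hint, whereas mapred.reduce.tasks is honored exactly.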

      OR

      In the driver code, one can set it on the JobConf object:

      job.setNumMapTasks(5); // 5 map tasks (old mapred API; this is a hint, not a hard cap)
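
      For context, here is a minimal, self-contained sketch of an old-API (org.apache.hadoop.mapred) driver that sets both values; the class name, identity mapper/reducer, and paths are placeholders:

      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapred.FileInputFormat;
      import org.apache.hadoop.mapred.FileOutputFormat;
      import org.apache.hadoop.mapred.JobClient;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.lib.IdentityMapper;
      import org.apache.hadoop.mapred.lib.IdentityReducer;

      public class MapperCountDriver {
          public static void main(String[] args) throws Exception {
              JobConf conf = new JobConf(MapperCountDriver.class);
              conf.setJobName("mapper-count-demo");

              conf.setMapperClass(IdentityMapper.class);
              conf.setReducerClass(IdentityReducer.class);
              conf.setOutputKeyClass(LongWritable.class);
              conf.setOutputValueClass(Text.class);

              // A hint to the InputFormat: ask for 5 map tasks. The actual count is
              // still decided by the input splits and can only be raised, not lowered.
              conf.setNumMapTasks(5);
              // The reducer count, by contrast, is honored exactly.
              conf.setNumReduceTasks(2);

              FileInputFormat.setInputPaths(conf, new Path(args[0]));
              FileOutputFormat.setOutputPath(conf, new Path(args[1]));

              JobClient.runJob(conf);
          }
      }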

      Follow the link to learn more about Mappers in Hadoop
