How many Mappers run for a MapReduce Job?

    • #5966
      DataFlair Team
      Spectator

      When we submit a MapReduce job, how many map tasks run?
      How is the number of mappers calculated?
      Can we control the number of mappers, and how do we set it for a job?

    • #5968
      DataFlair Team
      Spectator

      The number of mappers usually depends on the number of HDFS blocks (input splits) of the input files. Hence, to adjust the number of mappers, the HDFS block size can be adjusted (which is generally not recommended). The right level of parallelism for maps seems to be around 10-100 maps per node, although it can be taken up to 300 or so for very CPU-light map tasks. Task setup takes a while, so it is best if each map takes at least a minute to execute.

      The number of mappers also depends on the configuration of the slave node, i.e. the number of cores and the amount of RAM available on it. Usually, 1 to 1.5 cores should be given to each mapper, so on a 15-core node roughly 10 mappers can run concurrently.

      Actually controlling the number of maps is subtle. The mapred.map.tasks parameter is just a hint to the InputFormat for the number of maps.
      The default InputFormat behavior is to split the total number of bytes into the right number of fragments.
      However, in the default case the HDFS block size of the input files is treated as an upper bound for input splits.
      A lower bound on the split size can be set via mapred.min.split.size. So, if we expect 10 TB of input data and have 128 MB HDFS blocks, we end up with about 82,000 maps (10 TB / 128 MB = 81,920), unless mapred.map.tasks is set even larger. Ultimately, the InputFormat determines the number of maps.
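
      As a rough sketch of that calculation and of the knobs mentioned above (the class name and the 256 MB value below are illustrative, not from the original post), using the old mapred API:

      import org.apache.hadoop.mapred.JobConf;

      public class SplitSizing {
          public static void main(String[] args) {
              JobConf conf = new JobConf(SplitSizing.class);

              // 10 TB of input split into 128 MB blocks gives ~82k map tasks.
              long totalInput = 10L * 1024 * 1024 * 1024 * 1024; // 10 TB in bytes
              long blockSize  = 128L * 1024 * 1024;              // 128 MB in bytes
              System.out.println(totalInput / blockSize);        // prints 81920

              // Raise the lower bound on the split size to 256 MB so each map task
              // processes more data and fewer mappers are launched.
              conf.setLong("mapred.min.split.size", 256L * 1024 * 1024);

              // Only a hint to the InputFormat; it cannot force fewer maps than splits.
              conf.setInt("mapred.map.tasks", 10);
          }
      }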

      The number of map tasks can also be increased manually using JobConf's conf.setNumMapTasks(int num). This can be used to increase the number of map tasks, but it will not set the number below that which Hadoop determines by splitting the input data.

      Follow the link to learn more about Mappers in Hadoop

    • #5970
      DataFlair Team
      Spectator

      The number of map tasks for a given job is driven by the number of input splits. For each input split (one HDFS block, by default), a map task is created. So, over the lifetime of a MapReduce job, the number of map tasks is equal to the number of input splits.

      Number of mappers can be determined as follows:

      1. Calculate the total size of input files.
      2. The number of mappers = total input size / the input split size defined in the Hadoop configuration (see the sketch below).
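
      A minimal sketch of that calculation (the input path and the 128 MB split size below are made-up values, not from the original post):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class MapperCountEstimate {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              FileSystem fs = FileSystem.get(conf);

              // Step 1: total size of the input files (hypothetical input directory).
              long totalSize =
                  fs.getContentSummary(new Path("/user/hadoop/input")).getLength();

              // Step 2: divide by the split size, rounding up.
              long splitSize = 128L * 1024 * 1024; // 128 MB
              long mappers = (totalSize + splitSize - 1) / splitSize;

              System.out.println("Estimated number of mappers: " + mappers);
          }
      }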

      The number of mappers can be configured from the command line or set in the job configuration, as below:

      -D mapred.map.tasks=5 -D mapred.reduce.tasks=2 (5 mappers, 2 reducers)
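
      Note that such generic -D options are picked up from the command line only if the driver runs through ToolRunner / GenericOptionsParser, e.g. in an invocation like hadoop jar myjob.jar MyDriver -D mapred.map.tasks=5 -D mapred.reduce.tasks=2 /input /output (the jar name, class name, and paths here are placeholders). Even then, mapred.map.tasks remains only a hint, whereas mapred.reduce.tasks is honored exactly.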

      OR

      In the driver code, one can set it on the JobConf object:

      job.setNumMapTasks(5); // 5 map tasks (old mapred API; this is a hint, not a hard cap)
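
      For context, here is a minimal, self-contained sketch of an old-API (org.apache.hadoop.mapred) driver that sets both values; the class name, identity mapper/reducer, and paths are placeholders:

      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapred.FileInputFormat;
      import org.apache.hadoop.mapred.FileOutputFormat;
      import org.apache.hadoop.mapred.JobClient;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.lib.IdentityMapper;
      import org.apache.hadoop.mapred.lib.IdentityReducer;

      public class MapperCountDriver {
          public static void main(String[] args) throws Exception {
              JobConf conf = new JobConf(MapperCountDriver.class);
              conf.setJobName("mapper-count-demo");

              conf.setMapperClass(IdentityMapper.class);
              conf.setReducerClass(IdentityReducer.class);
              conf.setOutputKeyClass(LongWritable.class);
              conf.setOutputValueClass(Text.class);

              // A hint to the InputFormat: ask for 5 map tasks. The actual count is
              // still decided by the input splits and can only be raised, not lowered.
              conf.setNumMapTasks(5);
              // The reducer count, by contrast, is honored exactly.
              conf.setNumReduceTasks(2);

              FileInputFormat.setInputPaths(conf, new Path(args[0]));
              FileOutputFormat.setOutputPath(conf, new Path(args[1]));

              JobClient.runJob(conf);
          }
      }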

      Follow the link to learn more about Mappers in Hadoop
