How to change / configure number of mappers?


    • #5775
      DataFlair Team

      How do I change the number of mappers in the cluster?
      Which configuration file or parameter needs to be changed?

    • #5777
      DataFlair Team

      The number of map tasks for a job is driven by the number of input splits: one map task is created for each input split.

      An input split is a logical division of the input data, used when a MapReduce program processes that data. Unlike an HDFS block, it does not physically divide the file.

      The split size is user-configurable, so you can choose it based on the size of your data. The mapred.map.tasks property, by contrast, is only a hint to the InputFormat about the number of maps; it does not directly set the number of mappers.
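To see why a requested map count is only a hint, here is a minimal Java sketch modeled on how the older org.apache.hadoop.mapred.FileInputFormat turns it into a split size, as I understand that code: a goal size (total input divided by the requested maps) is clamped by the block size and the minimum split size. The class MapTasksHint, its method, and the 200 MB / 128 MB figures are illustrative assumptions; a real job also handles multiple files and per-file block sizes.

```java
// Hypothetical class: models how a requested map count becomes a split size in
// the old mapred API (an assumption based on FileInputFormat; one file, simplified).
class MapTasksHint {
    static final long MB = 1024L * 1024L;

    // goalSize = totalSize / requestedMaps, then clamped: it may shrink below the
    // block size, but only minSize can push the split size above the block size.
    static long splitSize(long totalSize, int requestedMaps, long minSize, long blockSize) {
        long goalSize = totalSize / Math.max(1, requestedMaps);
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long file = 200 * MB, block = 128 * MB, min = 1;
        // Requesting 1 map: the 200 MB goal is capped at the 128 MB block size.
        System.out.println(splitSize(file, 1, min, block) / MB);  // 128
        // Requesting 10 maps: the 20 MB goal is below the block size, so it is used.
        System.out.println(splitSize(file, 10, min, block) / MB); // 20
    }
}
```

Asking for fewer maps than there are blocks therefore has no effect, while asking for more maps shrinks the splits.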

      Suppose you have a 200 MB file and the HDFS block size is the default 128 MB. The file is stored in two blocks, so it is divided into two splits.
      If instead you specify a split size of 200 MB in your MapReduce program (by setting the minimum split size), both blocks are treated as a single split, and only one mapper is assigned to the job.
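The 200 MB example can be checked with a small Java sketch. It assumes the rule used by the newer FileInputFormat, splitSize = max(minSize, min(maxSize, blockSize)), plus a 10% slop that lets a short tail join the last split. SplitCounter and its methods are hypothetical names, and the sketch handles a single non-empty file only.

```java
// Hypothetical class: counts input splits for one non-empty file using the
// (assumed) FileInputFormat rule and its 10% "slop" for the final split.
class SplitCounter {
    static final long MB = 1024L * 1024L;
    static final double SPLIT_SLOP = 1.1;

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static int countSplits(long fileSize, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileSize;
        int splits = 0;
        // Cut full-size splits while more than 1.1 split sizes remain.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left becomes the final split.
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long file = 200 * MB, block = 128 * MB;
        // Default limits: split size equals block size -> 2 splits, 2 mappers.
        System.out.println(countSplits(file, block, 1, Long.MAX_VALUE));        // 2
        // Minimum split size of 200 MB: one split covers both blocks -> 1 mapper.
        System.out.println(countSplits(file, block, 200 * MB, Long.MAX_VALUE)); // 1
    }
}
```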

      Because the number of map tasks depends on the file size, you can aim for n mappers by dividing the file size by n and using the result as the split size. The relevant parameters are:
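      import org.apache.hadoop.conf.Configuration;
      Configuration conf = new Configuration();
      conf.set("mapred.max.split.size", "41943040"); // maximum split size in bytes (40 MB); newer name: mapreduce.input.fileinputformat.split.maxsize
      conf.set("mapred.min.split.size", "20971520"); // minimum split size in bytes (20 MB); newer name: mapreduce.input.fileinputformat.split.minsize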
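Dividing the file size by n works directly only while the result is smaller than the block size; above that, the maximum split size cannot enlarge splits, so the minimum has to be raised instead. The following Java sketch captures that choice under the same assumed rule, splitSize = max(minSize, min(maxSize, blockSize)). TargetMappers and limitsFor are hypothetical names, and the 200 MB file with 128 MB blocks is an assumed example; the resulting mapper count is approximate for a real job.

```java
// Hypothetical helper: choose split-size limits that give roughly n mappers for
// one file, assuming splitSize = max(minSize, min(maxSize, blockSize)).
class TargetMappers {
    static final long MB = 1024L * 1024L;

    // Returns {minSplitSize, maxSplitSize} in bytes.
    static long[] limitsFor(long fileSize, long blockSize, int n) {
        long target = (fileSize + n - 1) / n; // ceiling division: n splits cover the file
        if (target <= blockSize) {
            // Smaller than a block: capping the maximum is enough (minimum stays at 1).
            return new long[] {1L, target};
        }
        // Larger than a block: the maximum cannot raise the split size,
        // so the minimum has to be raised instead.
        return new long[] {target, Long.MAX_VALUE};
    }

    static long effectiveSplitSize(long blockSize, long[] limits) {
        return Math.max(limits[0], Math.min(limits[1], blockSize));
    }

    public static void main(String[] args) {
        long file = 200 * MB, block = 128 * MB;
        long[] five = limitsFor(file, block, 5);
        System.out.println(five[1]);                         // 41943040 (40 MB)
        System.out.println(effectiveSplitSize(block, five)); // 41943040
        long[] one = limitsFor(file, block, 1);
        System.out.println(one[0] / MB);                     // 200
    }
}
```

For five mappers on a 200 MB file this reproduces the 41943040-byte maximum used in the post.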

    • #5779
      DataFlair Team

      The number of mappers always equals the number of input splits. You can control the number of splits by changing mapred.min.split.size, which sets the minimum input split size.

      Assume the block size is 64 MB and mapred.min.split.size is set to 128 MB.
      Each InputSplit will then be 128 MB, even though the block size is only 64 MB.

      It is not recommended to make the split size larger than the block size. Doing so decreases the number of mappers, but at the expense of data locality: each InputSplit now comprises data from at least two blocks, and those blocks may not be stored on the same DataNode.
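The locality cost can be made concrete by listing which blocks each split touches. This Java sketch assumes a 256 MB file (an illustrative figure), the 64 MB block size and 128 MB minimum split size from the example above, and ignores the 10% tail slop; SplitLocality and blocksSpanned are hypothetical names.

```java
// Hypothetical class: shows how many HDFS blocks each split covers when the
// minimum split size exceeds the block size (assumed 256 MB file, no slop).
class SplitLocality {
    static final long MB = 1024L * 1024L;

    // Number of distinct blocks touched by the byte range [start, end).
    static long blocksSpanned(long start, long end, long blockSize) {
        return (end - 1) / blockSize - start / blockSize + 1;
    }

    public static void main(String[] args) {
        long file = 256 * MB, block = 64 * MB;
        long minSize = 128 * MB, maxSize = Long.MAX_VALUE;
        long splitSize = Math.max(minSize, Math.min(maxSize, block)); // 128 MB wins
        for (long start = 0; start < file; start += splitSize) {
            long end = Math.min(start + splitSize, file);
            // Each split spans 2 blocks, which may live on different DataNodes.
            System.out.printf("split [%d MB, %d MB) spans %d blocks%n",
                    start / MB, end / MB, blocksSpanned(start, end, block));
        }
    }
}
```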
