How to set the number of mappers to be created?

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

Viewing 1 reply thread

Author

Posts
- September 20, 2018 at 12:32 pm #4868
  
  DataFlair Team
  Spectator
  
  How to calculate the number of mappers in Hadoop?
  How to set the number of mappers for a MapReduce job?
  How to change the number of mappers in the cluster?
- September 20, 2018 at 12:32 pm #4869
  DataFlair Team
  Spectator
  The number of Mappers that Hadoop creates is determined by the number of Input Splits you have in your Data.
  The relation is simple:
```
No. of Mappers = No. of Input Splits.
```
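  So, in order to control the Number of Mappers, you first have to control the Number of Input Splits Hadoop creates before running your MapReduce program. One of the easiest ways to do that is to set the split-size properties while running your MR program: ‘mapreduce.input.fileinputformat.split.maxsize’ (older name: ‘mapred.max.split.size’) and ‘mapreduce.input.fileinputformat.split.minsize’ (older name: ‘mapred.min.split.size’). Both take values in bytes. For file-based input formats the Split Size is max(minsize, min(maxsize, block size)), so lowering the maximum below the block size gives more Mappers, while raising the minimum above the block size gives fewer Mappers. A maximum larger than the block size has no effect.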
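
  As a rough illustration of that rule, here is a minimal Python sketch of how the Split Size is derived from the block size and the two split-size settings for file-based input formats. The function name and constants are hypothetical, and the defaults (a minimum of 1 byte and a maximum of Long.MAX_VALUE) are assumptions about a stock configuration, so check them against your own cluster.

```python
MB = 1024 * 1024
LONG_MAX = 2**63 - 1  # assumed default for the maximum split size


def compute_split_size(block_size, min_size=1, max_size=LONG_MAX):
    """Split size in bytes: max(min_size, min(max_size, block_size))."""
    return max(min_size, min(max_size, block_size))


block = 128 * MB

# No overrides: the split size equals the block size.
default_split = compute_split_size(block)

# Maximum below the block size: smaller splits, so more mappers.
smaller_split = compute_split_size(block, max_size=64 * MB)

# Maximum above the block size: no effect, still one block per split.
unchanged_split = compute_split_size(block, max_size=10 * 1024 * MB)

# Minimum above the block size: larger splits, so fewer mappers.
larger_split = compute_split_size(block, min_size=10 * 1024 * MB)

print(default_split // MB, smaller_split // MB,
      unchanged_split // MB, larger_split // MB)
```

  The point of the sketch is only that the minimum wins whenever it exceeds the block size, which is why it is the setting to raise when you want fewer Mappers.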
  
  Example:
  Let’s assume your Input data is 1 TB and the HDFS block size is 128 MB. So, number of Physical Data Blocks = (1 * 1024 * 1024 / 128) = 8192 Blocks.
  By default, if you don’t specify the Split Size, it is equal to the block size, which gives one Input Split per block, i.e. 8192 Splits. Thus, your program will create and execute 8192 Mappers!
  
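  Let’s say you want to create only 100 Mappers to handle your job.
  As mentioned above, 100 Mappers means 100 Input Splits. So each Split should be about (1 * 1024 * 1024 / 100) ≈ 10486 MB. The split-size properties take values in bytes, so that is 1099511627776 / 100 ≈ 10995116278 bytes. Because this is larger than the 128 MB block size, it is the minimum Split Size that has to be raised; setting only a maximum above the block size would leave you with 8192 Mappers.
  
  Execute it as follows (the -D option is only picked up if your main class parses generic options, for example by running through ToolRunner):
  hadoop jar <your-script.jar> <main class> -Dmapreduce.input.fileinputformat.split.minsize=10995116278 <input file> <output directory>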
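
  To double-check the arithmetic, the following Python sketch computes a Split Size in bytes for a target number of Mappers and then counts the Splits that would result. It is a simplified model, not Hadoop code: it assumes the input is a single splittable 1 TB file, and it assumes the usual 1.1 slop factor under which a final remainder of up to 10% is folded into the last Split. The function names are hypothetical.

```python
MB = 1024 * 1024
TB = 1024 * 1024 * MB


def split_size_for(total_bytes, target_mappers):
    """Smallest split size in bytes giving at most target_mappers splits."""
    return -(-total_bytes // target_mappers)  # integer ceiling division


def count_splits(file_bytes, split_bytes, slop=1.1):
    """Count splits for one splittable file (assumed 1.1 slop factor)."""
    remaining = file_bytes
    splits = 0
    while remaining / split_bytes > slop:
        remaining -= split_bytes
        splits += 1
    if remaining > 0:
        splits += 1  # the last split absorbs the remainder
    return splits


default_mappers = count_splits(TB, 128 * MB)
wanted_split = split_size_for(TB, 100)
wanted_mappers = count_splits(TB, wanted_split)

print(default_mappers)  # 8192
print(wanted_split)     # 10995116278
print(wanted_mappers)   # 100
```

  With many input files the real count will be higher, because a Split never spans two files, so treat 100 as a target rather than a guarantee.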
  