How many Mappers run on the cluster?

    • #6106
      DataFlair Team
      Spectator

      How many Mappers run on the cluster?
      What is the ideal number of mappers that should be configured?
      How can the number of mappers on a node of the cluster be changed?

    • #6107
      DataFlair Team
      Spectator

      The number of mappers depends on two factors:

      (a) The amount of data to process together with the block size. The mapper count is driven by the number of input splits: for 10 TB of data with a block size of 128 MB, there are 10 * 1024 * 1024 / 128 = 81,920 (roughly 82k) mappers.

      (b) The configuration of the slave node, i.e. the number of cores and the amount of RAM available. The right number of mappers per node is typically between 10 and 100. Usually, 1 to 1.5 cores should be given to each mapper, so a node with 15 cores can run about 10 mappers.

      The number of mappers can be controlled by changing the block size: a different block size increases or decreases the number of input splits. This is done only in rare scenarios; a gentler, per-job alternative is to tune the input split size, as in the sketch below.
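
      A minimal sketch of the per-job approach, assuming the Hadoop MapReduce (new) API; the job name, input path, and the 256/512 MB values are placeholders, not recommendations:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

      public class SplitSizeDemo {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              Job job = Job.getInstance(conf, "split-size-demo");              // hypothetical job name
              FileInputFormat.addInputPath(job, new Path("/data/input"));      // hypothetical path

              // Split size = max(minSize, min(maxSize, blockSize)).
              // Raising the minimum above the block size yields fewer, larger
              // splits (fewer mappers); lowering the maximum below the block
              // size yields more mappers.
              FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);   // 256 MB
              FileInputFormat.setMaxInputSplitSize(job, 512L * 1024 * 1024);   // 512 MB

              // ... set mapper/reducer classes and the output path as usual ...
          }
      }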


    • #6108
      DataFlair Team
      Spectator

      By default, Hadoop (MRv1) runs 2 mappers and 2 reducers per data node; the maximum number of map slots per node can be changed via the mapred.tasktracker.map.tasks.maximum property in the mapred-site.xml configuration file.

      The right level of parallelism is around 10-100 mappers per node; for very light map tasks it can be pushed to about 300 mappers per node.

      The number of mappers depends on the total input size and the block size into which the data is divided (default 128 MB).
      For example, with input data = 2 TB (taking 1 TB = 10^6 MB) and a block size of 100 MB:
      Number of mappers = input size / block size = (2 * 10^6) / 100 = 20,000 mappers
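
      As a quick sanity check of these figures, the snippet below (plain Java, no Hadoop dependencies) reproduces both worked examples from this thread:

      public class MapperCount {
          // Number of input splits = ceil(inputSize / blockSize), assuming
          // splittable input and a split size equal to the block size.
          static long mappers(long inputMb, long blockMb) {
              return (inputMb + blockMb - 1) / blockMb; // ceiling division
          }

          public static void main(String[] args) {
              // 2 TB input, 100 MB blocks (decimal units, 1 TB = 10^6 MB)
              System.out.println(mappers(2_000_000L, 100));        // 20000
              // 10 TB input, 128 MB blocks (binary units, 1 TB = 1024 * 1024 MB)
              System.out.println(mappers(10L * 1024 * 1024, 128)); // 81920
          }
      }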


    • #6110
      DataFlair Team
      Spectator

      There is no fixed formula for how many mappers should run in a cluster. It depends on how many cores and how much memory you have configured in the cluster.

      In general, the number of mappers plus the number of reducers should not exceed the number of cores.

      So to answer your questions:

      Number of mappers = number of input splits (in most cases, the number of blocks).

      Note: The number of blocks depends on the file size. A 1 GB file makes 8 blocks of 128 MB each, which the sketch below can verify on a live cluster.
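
      A minimal sketch, assuming the HDFS FileSystem API, for counting the blocks of a file (and hence its default mapper count); the file path is a hypothetical placeholder:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.BlockLocation;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class BlockCount {
          public static void main(String[] args) throws Exception {
              FileSystem fs = FileSystem.get(new Configuration());
              FileStatus status = fs.getFileStatus(new Path("/data/bigfile")); // hypothetical path
              BlockLocation[] blocks =
                  fs.getFileBlockLocations(status, 0, status.getLen());
              // For a 1 GB file with a 128 MB block size, this prints: blocks = 8
              System.out.println("blocks = " + blocks.length);
          }
      }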

