How many Mappers run on the cluster?

    • #6106
      DataFlair Team
      Spectator

      How many Mappers run on the cluster?
      What is the ideal number of mappers that should be configured?
      How can the number of mappers on a node of the cluster be changed?

    • #6107
      DataFlair Team
      Spectator

      The number of mappers depends on two factors:

      (a) The amount of data to process together with the block size. The mapper count is driven by the number of input splits: for 10 TB of data with a block size of 128 MB, there are 10 * 1024 * 1024 / 128 = 81,920 (roughly 82k) mappers.

      (b) The configuration of the slave node, i.e. the number of cores and the amount of RAM available. The right number of mappers per node is typically between 10 and 100. Usually, 1 to 1.5 cores should be given to each mapper, so a node with 15 cores can run about 10 mappers.

      The number of mappers can be controlled by changing the block size: a different block size increases or decreases the number of input splits. This is done only in rare scenarios; a gentler, per-job alternative is to tune the input split size, as in the sketch below.
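
      A minimal sketch of the per-job approach, assuming the Hadoop MapReduce (new) API; the job name, input path, and the 256/512 MB values are placeholders, not recommendations:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

      public class SplitSizeDemo {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              Job job = Job.getInstance(conf, "split-size-demo");              // hypothetical job name
              FileInputFormat.addInputPath(job, new Path("/data/input"));      // hypothetical path

              // Split size = max(minSize, min(maxSize, blockSize)).
              // Raising the minimum above the block size yields fewer, larger
              // splits (fewer mappers); lowering the maximum below the block
              // size yields more mappers.
              FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);   // 256 MB
              FileInputFormat.setMaxInputSplitSize(job, 512L * 1024 * 1024);   // 512 MB

              // ... set mapper/reducer classes and the output path as usual ...
          }
      }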


    • #6108
      DataFlair Team
      Spectator

      By default, Hadoop (MRv1) runs 2 mappers and 2 reducers per data node; the maximum number of map slots per node can be changed via the mapred.tasktracker.map.tasks.maximum property in the mapred-site.xml configuration file.

      The right level of parallelism is around 10-100 mappers per node; for very light map tasks it can be pushed to about 300 mappers per node.

      The number of mappers depends on the total input size and the block size into which the data is divided (default 128 MB).
      For example, with input data = 2 TB (taking 1 TB = 10^6 MB) and a block size of 100 MB:
      Number of mappers = input size / block size = (2 * 10^6) / 100 = 20,000 mappers
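
      As a quick sanity check of these figures, the snippet below (plain Java, no Hadoop dependencies) reproduces both worked examples from this thread:

      public class MapperCount {
          // Number of input splits = ceil(inputSize / blockSize), assuming
          // splittable input and a split size equal to the block size.
          static long mappers(long inputMb, long blockMb) {
              return (inputMb + blockMb - 1) / blockMb; // ceiling division
          }

          public static void main(String[] args) {
              // 2 TB input, 100 MB blocks (decimal units, 1 TB = 10^6 MB)
              System.out.println(mappers(2_000_000L, 100));        // 20000
              // 10 TB input, 128 MB blocks (binary units, 1 TB = 1024 * 1024 MB)
              System.out.println(mappers(10L * 1024 * 1024, 128)); // 81920
          }
      }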


    • #6110
      DataFlair Team
      Spectator

      There is no fixed formula for how many mappers should run in a cluster. It depends on how many cores and how much memory you have configured in the cluster.

      In general, the number of mappers plus the number of reducers should not exceed the number of cores.

      So to answer your questions:

      Number of mappers = number of input splits (in most cases, the number of blocks).

      Note: The number of blocks depends on the file size. A 1 GB file makes 8 blocks of 128 MB each, which the sketch below can verify on a live cluster.
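
      A minimal sketch, assuming the HDFS FileSystem API, for counting the blocks of a file (and hence its default mapper count); the file path is a hypothetical placeholder:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.BlockLocation;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class BlockCount {
          public static void main(String[] args) throws Exception {
              FileSystem fs = FileSystem.get(new Configuration());
              FileStatus status = fs.getFileStatus(new Path("/data/bigfile")); // hypothetical path
              BlockLocation[] blocks =
                  fs.getFileBlockLocations(status, 0, status.getLen());
              // For a 1 GB file with a 128 MB block size, this prints: blocks = 8
              System.out.println("blocks = " + blocks.length);
          }
      }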

