How many Mappers run on the cluster?
This topic has 3 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 5:08 pm · #6106 · DataFlair Team (Spectator)
How many Mappers run on the cluster?
What is the ideal number of mappers that should be configured?
How do we change the number of mappers on a node of the cluster?
September 20, 2018 at 5:08 pm · #6107 · DataFlair Team (Spectator)
The number of mappers depends on two factors:
(a) The amount of data to process together with the block size, which determines the number of input splits. For 10 TB of data with a block size of 128 MB, there will be about 82,000 mappers.
(b) The configuration of the slave node, i.e. the number of cores and the amount of RAM available. The right number of mappers per node is generally between 10 and 100. Usually, 1 to 1.5 cores should be given to each mapper, so a slave with 15 cores can run about 10 mappers.
The number of mappers can be controlled by changing the block size: changing the block size increases or decreases the number of input splits. But this is done only in rare scenarios.
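The split arithmetic above can be sketched in a few lines of Python (the function name and the 10 TB / 128 MB figures are just illustrations of the example in this answer):

```python
import math

def estimate_mappers(input_bytes, split_bytes):
    """Roughly one mapper per input split; a partial final split still gets its own mapper."""
    return math.ceil(input_bytes / split_bytes)

TB = 1024 ** 4
MB = 1024 ** 2

# 10 TB of input with a 128 MB block size -> 81,920 (about 82k) mappers
print(estimate_mappers(10 * TB, 128 * MB))
```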
September 20, 2018 at 5:08 pm · #6108 · DataFlair Team (Spectator)
By default, Hadoop runs 2 mappers and 2 reducers per data node; this per-node limit can be changed in the mapred-site.xml configuration file.
The right level of parallelism is around 10-100 mappers per node; if the map tasks are very lightweight, it can go up to roughly 300 mappers per node.
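For the classic (MRv1) TaskTracker, the per-node limit mentioned above is set with the following properties in mapred-site.xml; this is a sketch, and the values shown are just the defaults, not a recommendation:

```xml
<!-- mapred-site.xml (classic MapReduce / MRv1); values shown are the defaults -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```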
The number of mappers depends on the total input size and the block size into which the data is divided (128 MB by default).
For example, with input data = 2 TB (taking 1 TB = 10^6 MB) and block size = 100 MB:
Number of mappers = input size / block size = (2 × 10^6) / 100 = 20,000 mappers.
September 20, 2018 at 5:08 pm · #6110 · DataFlair Team (Spectator)
There is no single formula for the number of mappers that should run in a cluster. It depends on how many cores and how much memory you have configured across the cluster.
In general, the number of mappers plus the number of reducers should not exceed the number of cores.
So to answer your questions:
Number of mappers = number of input splits (in most cases, the number of blocks).
Note: the number of blocks depends on the file size. A 1 GB file makes 8 blocks of 128 MB each.
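The block arithmetic in the note above can be checked directly (a sketch, using the 1 GB file and 128 MB default block size from the example):

```python
import math

GB = 1024 ** 3
MB = 1024 ** 2

# A 1 GB file with the default 128 MB block size splits into 8 blocks,
# hence 8 input splits and, typically, 8 mappers.
blocks = math.ceil(1 * GB / (128 * MB))
print(blocks)
```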