How to calculate the Hadoop cluster size?
DataFlair Team, September 20, 2018 at 3:28 pm (#5507)
How to perform sizing of a Hadoop cluster?
-
DataFlair Team, September 20, 2018 at 3:29 pm (#5508)
Cluster: A Hadoop cluster is a group of machines used for distributed computing; it can store and analyze huge amounts of structured and unstructured data.
To set up a cluster we need the following:
1) Client machine: Issues requests to read and write data, contacting the NameNode for metadata and the DataNodes for the data itself.
2) NameNode: Manages the HDFS namespace and metadata, keeping track of which DataNodes hold each block. It does not store the data itself.
3) DataNode: Stores the actual data blocks and serves read and write requests, following instructions from the NameNode. Processing tasks (for example MapReduce) run on the same machines so that computation stays close to the data.
4) ResourceManager and NodeManager: The YARN ResourceManager usually runs on a master node and allocates cluster resources, while a NodeManager runs on each worker (DataNode) machine and manages the tasks there.

The decision on how to size the cluster takes the points below into account.
1) Ingestion rate: The amount of new data we expect to receive per day, on average.
2) Replication factor: The number of copies HDFS keeps of each block, so the data is still available when a DataNode fails. It is set with the dfs.replication property in hdfs-site.xml. The default is 3; on a single-node cluster it is normally set to 1. Raw disk usage grows in proportion to the replication factor, and the factor can be adjusted to suit the storage available and the rate at which data arrives.
3) Size of hard disks: The total disk capacity installed in each DataNode.
4) Buffer space: The share of disk space kept aside for intermediate results, such as map output.

Worked example:
Daily ingestion rate: 1 TB
Replication factor: 3
Disk size per node: 48 TB (12 * 4 TB)
Buffer space: 25% or 0.25
Storage required per day: 1 * 3 = 3 TB
Usable storage per node: 48 - (48 * 0.25) = 36 TB
Number of nodes required to hold one year of data: (3 * 365) / 36 = 30.4, rounded up to 31 nodes

Considering the above factors, we can arrive at the size of the cluster.
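
The same arithmetic can be written as a small Python function, which makes it easy to try other inputs. This is only a rough sketch: the function name, the parameter names and the 365-day retention default are illustrative assumptions taken from the formula above, not part of any Hadoop tool.

```python
import math


def nodes_required(daily_ingest_tb, replication_factor, disk_per_node_tb,
                   buffer_fraction, retention_days=365):
    """Estimate the number of DataNodes from the sizing inputs above.

    All names here are hypothetical; retention_days=365 mirrors the
    one-year figure used in the worked example.
    """
    # Raw storage consumed per day once every block is replicated.
    daily_storage_tb = daily_ingest_tb * replication_factor
    # Disk left on each node after reserving the buffer share.
    usable_per_node_tb = disk_per_node_tb * (1 - buffer_fraction)
    # Total storage needed over the retention period.
    total_storage_tb = daily_storage_tb * retention_days
    # A fraction of a node still needs a whole machine, so round up.
    return math.ceil(total_storage_tb / usable_per_node_tb)


# Inputs from the example: 1 TB/day, replication 3, 48 TB per node, 25% buffer.
print(nodes_required(1, 3, 48, 0.25))  # prints 31
```

Changing one input at a time (for example a lower replication factor or larger disks) shows how strongly each factor drives the node count.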