How to calculate the Hadoop cluster size?


    • DataFlair Team

      How do you work out the size of a Hadoop cluster?

    • DataFlair Team

      Cluster: A Hadoop cluster is a group of machines used for distributed storage and computing. It can store and analyze huge amounts of structured and unstructured data.

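      To set up a cluster, we need the following components:
      1) Client machine: submits requests to read and write data. It asks the NameNode where the data lives and then transfers the data directly to or from the DataNodes.
      2) NameNode: the HDFS master. It manages the file-system namespace and tracks which DataNodes hold each data block; it does not store the data itself.
      3) DataNode: stores the actual data blocks on its local disks and serves read and write requests. It receives instructions (for example, to replicate or delete blocks) from the NameNode. The processing tasks typically run on these same worker machines, close to the data.
      4) ResourceManager and NodeManager (YARN): manage compute resources. The ResourceManager typically runs on a master node alongside the NameNode, and a NodeManager runs on each worker node alongside the DataNode to launch and monitor tasks.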
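
      The split between the NameNode (metadata only) and the DataNodes (actual bytes) can be illustrated with a toy in-memory model. This is a hypothetical sketch, not the HDFS client API: the classes `NameNode` and `DataNode` and the functions `client_write` and `client_read` are made up for illustration, the tiny `block_size` is chosen only to force several blocks, and replicas are placed round-robin rather than with HDFS's rack-aware placement policy.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy DataNode: keeps the actual block contents (here, in memory)."""
    name: str
    blocks: dict[str, bytes] = field(default_factory=dict)


@dataclass
class NameNode:
    """Toy NameNode: keeps only metadata (file -> block IDs -> replica locations)."""
    files: dict[str, list[str]] = field(default_factory=dict)
    locations: dict[str, list[DataNode]] = field(default_factory=dict)


def client_write(nn: NameNode, nodes: list[DataNode], path: str, data: bytes,
                 block_size: int = 4, replication: int = 3) -> None:
    """Split data into blocks, store each block on several DataNodes, record metadata."""
    copies = min(replication, len(nodes))  # cannot keep more copies than nodes
    block_ids = []
    for index, offset in enumerate(range(0, len(data), block_size)):
        block_id = f"{path}#blk{index}"
        # Round-robin placement on `copies` distinct DataNodes
        # (real HDFS uses a rack-aware placement policy instead).
        replicas = [nodes[(index + k) % len(nodes)] for k in range(copies)]
        for dn in replicas:
            dn.blocks[block_id] = data[offset:offset + block_size]
        nn.locations[block_id] = replicas
        block_ids.append(block_id)
    nn.files[path] = block_ids


def client_read(nn: NameNode, path: str) -> bytes:
    """Ask the NameNode where each block lives, then fetch the bytes from a DataNode."""
    parts = []
    for block_id in nn.files[path]:
        first_replica = nn.locations[block_id][0]
        parts.append(first_replica.blocks[block_id])
    return b"".join(parts)


if __name__ == "__main__":
    namenode = NameNode()
    datanodes = [DataNode(f"dn{i}") for i in range(4)]
    client_write(namenode, datanodes, "/logs/day1", b"hello hadoop")
    print(client_read(namenode, "/logs/day1"))  # b'hello hadoop'
    print({dn.name: len(dn.blocks) for dn in datanodes})  # {'dn0': 2, 'dn1': 2, 'dn2': 3, 'dn3': 2}
```

      The 12-byte file becomes 3 blocks, and with a replication factor of 3 the DataNodes hold 9 block copies in total, while the NameNode holds no file bytes at all.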

      Sizing the cluster depends on the following factors:
      1) Ingestion rate: the average volume of data expected per day.
      2) Replication factor: HDFS keeps copies of each data block on several DataNodes so that the data is still available if a DataNode fails. It is set with the dfs.replication property in hdfs-site.xml. The default is 3, which suits multi-node clusters; a single-node cluster is usually configured with 1. Because every copy occupies disk space, the storage required grows in proportion to the replication factor. The factor can be adjusted as requirements change, for example to reduce disk usage when the volume of incoming data is high.
      3) Disk capacity per DataNode: the total size of the hard disks installed in each DataNode.
      4) Buffer space: the share of each DataNode's disk kept aside for intermediate map output and other temporary data, which is therefore not available for HDFS storage.
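
      As a small illustration of where the replication factor comes from, the sketch below reads dfs.replication out of the text of an hdfs-site.xml file and falls back to the shipped default of 3 when the property is absent. The function name `replication_factor` is hypothetical; the sketch only inspects one file, whereas a running Hadoop process layers hdfs-site.xml over hdfs-default.xml, and the replication of individual files can also be changed after they are written.

```python
import xml.etree.ElementTree as ET

# Hadoop's shipped default for dfs.replication (from hdfs-default.xml).
DEFAULT_REPLICATION = 3


def replication_factor(hdfs_site_xml: str) -> int:
    """Return dfs.replication from hdfs-site.xml text, falling back to the default."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "dfs.replication":
            return int((prop.findtext("value") or "").strip())
    return DEFAULT_REPLICATION


# Typical single-node (pseudo-distributed) override: keep one copy of each block.
SINGLE_NODE_SITE = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(replication_factor(SINGLE_NODE_SITE))    # 1
    print(replication_factor("<configuration/>"))  # 3 (default)
```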

      Example:
      Daily ingestion rate: 1 TB
      Replication factor: 3
      Disk capacity per node: 48 TB (12 disks × 4 TB)
      Buffer space: 25% (0.25)
      Raw storage needed per day: 1 TB × 3 = 3 TB
      Usable storage per node: 48 TB − (48 TB × 0.25) = 36 TB
      Nodes required to store one year of data: (3 TB × 365) / 36 TB ≈ 30.4, rounded up to 31 nodes
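
      The calculation above can be wrapped in a small function so it is easy to rerun with other inputs. This is a sketch under the same assumptions as the example: the function name `nodes_required` is hypothetical, `retention_days` defaults to 365 (the one year of data implied by the example), and the estimate ignores data growth, compression and disk used by the operating system.

```python
import math


def nodes_required(daily_ingest_tb: float, replication: int, disk_per_node_tb: float,
                   buffer_fraction: float, retention_days: int = 365) -> int:
    """Estimate how many DataNodes are needed to hold `retention_days` of data."""
    if not 0 <= buffer_fraction < 1:
        raise ValueError("buffer_fraction must be in [0, 1)")
    # Raw disk written per day: every block is stored `replication` times.
    raw_per_day_tb = daily_ingest_tb * replication
    # Disk per node left for HDFS after reserving the buffer space.
    usable_per_node_tb = disk_per_node_tb * (1 - buffer_fraction)
    # Total raw storage needed over the retention period.
    total_raw_tb = raw_per_day_tb * retention_days
    # Round up: a fractional node still means a whole machine.
    return math.ceil(total_raw_tb / usable_per_node_tb)


if __name__ == "__main__":
    print(nodes_required(daily_ingest_tb=1, replication=3,
                         disk_per_node_tb=12 * 4, buffer_fraction=0.25))  # 31
```

      Changing the retention period or the ingestion rate only changes the inputs; for instance, keeping 30 days instead of a year gives 90 TB / 36 TB, which rounds up to 3 nodes.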

      Using these factors, we can estimate the size of the cluster.
