On what basis does the NameNode distribute blocks across the DataNodes?

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.


Author

Posts
- September 20, 2018 at 2:15 pm #5100
  
  DataFlair Team
  Spectator
  
  Explain the data block placement policies. On what factors are blocks distributed in HDFS?
- September 20, 2018 at 2:15 pm #5102
  DataFlair Team
  Spectator
  The strategy by which Hadoop distributes data blocks across the cluster is based on a trade-off between data reliability, write bandwidth, and read bandwidth.
  - The basic placement policy tries to place the first replica on the node where the client is running (if the client is running outside the cluster, a datanode is chosen at random, avoiding nodes that are too busy or too full).
  - The second replica is placed on a different rack than the first one (also known as off-rack).
  - The third replica is placed on the same rack as the second one, but on a different datanode.
  Further replicas are placed on random nodes, while trying to avoid placing too many replicas on the same rack.
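
To make the replica placement steps easier to follow, here is a minimal Python sketch of the policy as described in this reply. It is not Hadoop's actual implementation. The function name `choose_replica_nodes`, the `topology` dictionary (rack name to datanode names), and the node names are all hypothetical. It also ignores how busy or full a node is, and it falls back to any unused node when the preferred rack has none left.

```python
import random
from collections import Counter


def choose_replica_nodes(topology, replication=3, writer=None, rng=None):
    """Pick datanodes for one block, following the basic placement policy.

    topology: dict of rack name -> list of datanode names (hypothetical layout).
    writer:   datanode the client runs on, or None if the client is off-cluster.
    """
    rng = rng or random.Random()
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    if replication > len(rack_of):
        raise ValueError("not enough datanodes for the requested replication")

    chosen = []

    def pick(preferred):
        # Random choice among unused nodes that satisfy `preferred`;
        # if none qualify, fall back to any unused node.
        unused = [n for n in rack_of if n not in chosen]
        candidates = [n for n in unused if preferred(n)] or unused
        return rng.choice(candidates)

    # 1st replica: the client's own node if it is a datanode, else a random node.
    chosen.append(writer if writer in rack_of else pick(lambda n: True))

    # 2nd replica: a node on a different rack than the first (off-rack).
    if replication >= 2:
        first_rack = rack_of[chosen[0]]
        chosen.append(pick(lambda n: rack_of[n] != first_rack))

    # 3rd replica: same rack as the second, but a different node.
    if replication >= 3:
        second_rack = rack_of[chosen[1]]
        chosen.append(pick(lambda n: rack_of[n] == second_rack))

    # Further replicas: random nodes, preferring the racks holding the fewest replicas.
    while len(chosen) < replication:
        per_rack = Counter(rack_of[n] for n in chosen)
        fewest = min(per_rack[rack] for rack in topology)
        chosen.append(pick(lambda n: per_rack[rack_of[n]] == fewest))

    return chosen


# Hypothetical three-rack cluster; the client runs on dn1.
topology = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
    "rack3": ["dn7", "dn8"],
}
print(choose_replica_nodes(topology, replication=3, writer="dn1"))
```

With `writer="dn1"` the first replica always lands on `dn1`, and the other two share a rack other than `rack1`. Passing `writer=None` models a client outside the cluster, so the first node is chosen at random.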
  
  Apart from the basic policy, Hadoop also has a Balancer daemon, which redistributes blocks by moving them from over-utilized datanodes to under-utilized datanodes while keeping the basic placement policy in mind.
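
As a rough illustration of what "over-utilized" and "under-utilized" mean, the sketch below classifies datanodes by comparing each node's disk usage with the cluster-wide average. It assumes a threshold in percentage points, similar in spirit to the Balancer's `-threshold` option. The function name `classify_datanodes` and the sample numbers are hypothetical. The sketch only classifies nodes; it does not move blocks or check placement rules as the real Balancer does.

```python
def classify_datanodes(used_bytes, capacity_bytes, threshold_pct=10.0):
    """Split datanodes into (over_utilized, under_utilized, balanced) lists.

    A node counts as balanced when its utilization is within `threshold_pct`
    percentage points of the overall cluster utilization (an assumed rule).
    """
    cluster_pct = 100.0 * sum(used_bytes.values()) / sum(capacity_bytes.values())
    over, under, balanced = [], [], []
    for node in sorted(capacity_bytes):
        node_pct = 100.0 * used_bytes[node] / capacity_bytes[node]
        if node_pct > cluster_pct + threshold_pct:
            over.append(node)       # candidate source of block moves
        elif node_pct < cluster_pct - threshold_pct:
            under.append(node)      # candidate target of block moves
        else:
            balanced.append(node)
    return over, under, balanced


# Hypothetical usage figures (same capacity on every node for simplicity).
used = {"dn1": 90, "dn2": 50, "dn3": 10}
capacity = {"dn1": 100, "dn2": 100, "dn3": 100}
print(classify_datanodes(used, capacity))
```

In this example the cluster is 50% full overall, so `dn1` (90%) is over-utilized, `dn3` (10%) is under-utilized, and `dn2` (50%) is balanced. Blocks would be moved from `dn1` towards `dn3`.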