Hadoop

    • #5519
      DataFlair Team
      Spectator

      Are physical blocks stored somewhere else on HDFS? Suppose our 1st block is being written to node 1 and the node fails midway: is the data lost, or is this input split stored at some other location, given that block replicas have not yet been created on the other datanodes?

    • #5521
      DataFlair Team
      Spectator

      The basic rule of HDFS blocks is that HDFS stores 3 copies of each block across different nodes. Essentially, a stream of data is piped into the HDFS write API. Every 128 MB, a new block is created internally. Within each block, the client buffers data and sends it whenever a network packet is full (roughly 64 KB).
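
      To make those sizes concrete, here is a small client-side sketch. It assumes the standard HDFS configuration keys dfs.blocksize, dfs.client-write-packet-size and dfs.replication; the values shown are the usual defaults, so adjust them to your own cluster:

          import org.apache.hadoop.conf.Configuration;

          public class HdfsWriteSettings {
              public static void main(String[] args) {
                  Configuration conf = new Configuration();
                  // 128 MB per block: a new block is started once this much data has been streamed
                  conf.setLong("dfs.blocksize", 128L * 1024 * 1024);
                  // ~64 KB per packet sent down the write pipeline
                  conf.setInt("dfs.client-write-packet-size", 64 * 1024);
                  // 3 copies of every block across different datanodes
                  conf.setShort("dfs.replication", (short) 3);
                  System.out.println("block size = " + conf.getLong("dfs.blocksize", 0));
              }
          }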

      For example, if a 1 GB file is written through the HDFS API, the following happens:

      1. Block 1 is created on node1 (ideally the local node), with a copy on node2 (a different rack) and node3 (the same rack as node2).
      2. Data is streamed into it from the client to node1 in 64 KB chunks. Whenever the datanode receives a 64 KB chunk, it writes it into the block, tells the client that the write was successful, and at the same time forwards a copy to node2.
      3. node2 writes the chunk to its replica of the block and forwards the data to node3.
      4. node3 writes the chunk to its replica of the block on disk.
      5. The next 64 KB chunk is sent from the client to node1. Once the 128 MB block size is reached, the next block is created.

      The write is successful once the client receives notification from node1 that it has successfully written the last block.

      If node1 dies during the write, the client rewrites the affected block on a different node. Hence, this ensures that there is no data loss.
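
      For completeness, here is a minimal client-side sketch of such a write, assuming a reachable HDFS cluster (picked up from core-site.xml/hdfs-site.xml) and a made-up target path. The point is that the client only writes a byte stream; splitting into 128 MB blocks, buffering into ~64 KB packets, and the node1 -> node2 -> node3 replication pipeline described above are all handled internally by the output stream returned by create():

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FSDataOutputStream;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class HdfsWriteExample {
              public static void main(String[] args) throws Exception {
                  Configuration conf = new Configuration();   // reads cluster config from the classpath
                  FileSystem fs = FileSystem.get(conf);

                  Path target = new Path("/user/dataflair/sample.txt");   // hypothetical path
                  try (FSDataOutputStream out = fs.create(target, true)) {
                      byte[] chunk = "some data\n".getBytes("UTF-8");
                      for (int i = 0; i < 1000; i++) {
                          out.write(chunk);   // client buffers bytes into packets and pipelines them
                      }
                  }   // close() completes only after the pipeline has acknowledged the data
                  fs.close();
              }
          }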

    • #5522
      DataFlair Team
      Spectator

      By default, replicas of a block are written as soon as each network packet is full (about 64 KB), so there is no loss of data.
