Hadoop

    • #5519
      DataFlair Team
      Spectator

      Are physical blocks stored somewhere else on HDFS? Suppose our 1st block is being written to node 1 and the node fails midway: is the data lost, or is this input split stored at some other location, given that block replicas have not yet been created on the other datanodes?

    • #5521
      DataFlair Team
      Spectator

      The basic rule of HDFS blocks is that HDFS stores 3 copies of each block across different nodes. Essentially, a stream of data is piped into the HDFS write API. Every 128 MB, a new block is created internally. Within each block, the client buffers data and sends it whenever a network packet is full (roughly 64 KB).
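
      To make those sizes concrete, here is a small client-side sketch. It assumes the standard HDFS configuration keys dfs.blocksize, dfs.client-write-packet-size and dfs.replication; the values shown are the usual defaults, so adjust them to your own cluster:

          import org.apache.hadoop.conf.Configuration;

          public class HdfsWriteSettings {
              public static void main(String[] args) {
                  Configuration conf = new Configuration();
                  // 128 MB per block: a new block is started once this much data has been streamed
                  conf.setLong("dfs.blocksize", 128L * 1024 * 1024);
                  // ~64 KB per packet sent down the write pipeline
                  conf.setInt("dfs.client-write-packet-size", 64 * 1024);
                  // 3 copies of every block across different datanodes
                  conf.setShort("dfs.replication", (short) 3);
                  System.out.println("block size = " + conf.getLong("dfs.blocksize", 0));
              }
          }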

      For example, if a 1 GB file is written through the HDFS API, the following happens:

      1. Block 1 is created on node1 (ideally the local node), with a copy on node2 (a different rack) and node3 (the same rack as node2).
      2. Data is streamed into it from the client to node1 in 64 KB chunks. Whenever the datanode receives a 64 KB chunk, it writes it into the block, tells the client that the write was successful, and at the same time forwards a copy to node2.
      3. node2 writes the chunk to its replica of the block and forwards the data to node3.
      4. node3 writes the chunk to its replica of the block on disk.
      5. The next 64 KB chunk is sent from the client to node1. Once the 128 MB block size is reached, the next block is created.

      The write is successful once the client receives notification from node1 that it has successfully written the last block.

      If node1 dies during the write, the client rewrites the affected block on a different node. Hence, this ensures that there is no data loss.
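
      For completeness, here is a minimal client-side sketch of such a write, assuming a reachable HDFS cluster (picked up from core-site.xml/hdfs-site.xml) and a made-up target path. The point is that the client only writes a byte stream; splitting into 128 MB blocks, buffering into ~64 KB packets, and the node1 -> node2 -> node3 replication pipeline described above are all handled internally by the output stream returned by create():

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FSDataOutputStream;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class HdfsWriteExample {
              public static void main(String[] args) throws Exception {
                  Configuration conf = new Configuration();   // reads cluster config from the classpath
                  FileSystem fs = FileSystem.get(conf);

                  Path target = new Path("/user/dataflair/sample.txt");   // hypothetical path
                  try (FSDataOutputStream out = fs.create(target, true)) {
                      byte[] chunk = "some data\n".getBytes("UTF-8");
                      for (int i = 0; i < 1000; i++) {
                          out.write(chunk);   // client buffers bytes into packets and pipelines them
                      }
                  }   // close() completes only after the pipeline has acknowledged the data
                  fs.close();
              }
          }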

    • #5522
      DataFlair Team
      Spectator

      By default, replicas of a block are written as soon as each network packet is full (about 64 KB), so there is no loss of data.
