Apache Hive

    • #4865
      DataFlair Team
      Spectator

      How does replication work in Hive? Suppose we have a text file in HDFS, create a table in Hive, and load that text file into the newly created table. Will the table then be replicated 3 times, to three nodes (if the replication factor is 3)?

    • #4867
      DataFlair Team
      Spectator

      Hive Replication

      Hive replication is a feature for copying (replicating) the Hive metastore and its data from one cluster to another, and for keeping the metastore and data sets on the target cluster synchronized with the source according to a user-specified replication schedule. That is separate from HDFS block replication, which is what decides how many copies of a table's data exist inside one cluster.

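      Within a single cluster, the number of copies is determined by the HDFS replication factor (the dfs.replication setting, or a per-file value), not by Hive. Since the replication factor is 3 in your case, every block of the table's files is stored three times, on three different DataNodes.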
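      To make the arithmetic concrete, here is a minimal Python sketch of how one file is stored under a replication factor. The helper hdfs_storage is a hypothetical illustration, not an HDFS API, and it assumes the Hadoop 2+ default block size of 128 MB (dfs.blocksize).

```python
# Hypothetical helper modelling how HDFS stores one file; not an HDFS API.
# Assumes the Hadoop 2+ default block size of 128 MiB (dfs.blocksize).
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024


def hdfs_storage(file_size: int, replication: int = 3,
                 block_size: int = DEFAULT_BLOCK_SIZE) -> dict:
    """Return block count, stored block replicas and raw bytes for one file."""
    if file_size < 0 or replication < 1 or block_size < 1:
        raise ValueError("invalid file size, replication factor or block size")
    # The file is split into fixed-size blocks; the last one may be partial.
    blocks = -(-file_size // block_size)  # integer ceiling division
    return {
        "blocks": blocks,
        # Every block is stored `replication` times on different DataNodes.
        "block_replicas": blocks * replication,
        # A partial last block only occupies its actual size on disk.
        "raw_bytes": file_size * replication,
    }


# A 300 MiB text file with replication factor 3.
info = hdfs_storage(300 * 1024 * 1024, replication=3)
print(info)
```

      For a 300 MB file this gives 3 blocks and 9 stored block replicas, so the data occupies three times its size on disk.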

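      When you load a file that is already in HDFS into a managed (internal) Hive table with LOAD DATA INPATH, Hive moves the file into the table's warehouse directory instead of copying it, so no extra copy is created. Data that Sqoop imports into a Hive table (with --hive-import) is likewise written as files under the warehouse directory. In both cases Hive's data files are ordinary HDFS files, so they are replicated according to the same replication factor.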

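      If the data does end up stored twice, for example because the original text file stays in place and Hive writes its own copy in a different file format, then in total you have 3 (replicas of the HDFS file) + 1 (Hive copy) × 3 (replicas) = 6 block replicas on HDFS: 3 copies of the original file and 3 copies of the data stored by Hive. That is two datasets, each replicated three times, not six copies of one dataset.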
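      The counting can be sketched in Python. This is a hypothetical model (replica_summary is not a Hive or HDFS function) that assumes a replication factor of 3 for every file; the second call models a load that moves the original file into the table directory, as LOAD DATA INPATH does for a file already in HDFS, rather than copying it.

```python
# Hypothetical model of the scenario in this thread; not a Hive or HDFS API.
REPLICATION = 3  # assumed dfs.replication value, applied to every file


def replica_summary(hive_keeps_separate_copy: bool,
                    replication: int = REPLICATION) -> dict:
    """Count distinct datasets and total stored replicas in HDFS."""
    datasets = 1  # the original text file in HDFS
    if hive_keeps_separate_copy:
        datasets += 1  # Hive's own copy, e.g. rewritten in another file format
    return {
        "distinct_datasets": datasets,
        "replicas_per_dataset": replication,
        "total_replicas": datasets * replication,
    }


# Hive writes its own copy: 2 datasets x 3 replicas each.
separate = replica_summary(True)
# The file is moved into the table directory: still 1 dataset x 3 replicas.
moved = replica_summary(False)
print(separate)
print(moved)
```

      The point is that total_replicas can reach 6 while each individual dataset still has only 3 copies.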
