Apache Hive

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

Author

Posts
- September 20, 2018 at 12:31 pm #4865
  
  DataFlair Team
  Spectator
  
  How does replication work in Hive? Suppose we have a text file in HDFS, create a table in Hive, and load that text file into the newly created table. Will the table then be replicated three times across three nodes (if the replication factor is 3)?
- September 20, 2018 at 12:31 pm #4867
  
  DataFlair Team
  Spectator
  
  Hive Replication
  
  Hive replication copies the Hive metastore and its data from one cluster to another, and keeps the metastore and data set on the target cluster synchronized with the source according to a user-specified replication schedule. Your question, though, is about how HDFS replicates the files behind a Hive table.
  
  The number of replicas of a table's data is determined by the replication factor set in HDFS. Since the replication factor is 3 in your case, there will be three copies.
  
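  When you import data into an internal (managed) Hive table, for example with a Sqoop import, the data is copied only once, from its location on HDFS into the Hive table. The files Hive stores are then replicated by HDFS according to the same replication factor. Note that a plain LOAD DATA INPATH from an HDFS path (without LOCAL) moves the file into the table's directory rather than copying it, so no second copy is created in that case.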
  
  Ultimately, if Hive does not store the data in the same file format and therefore keeps its own copy, you end up with 3 (HDFS) + 1 (Hive copy) × 3: three replicas of the original file on HDFS and three replicas of the data stored by Hive. These are not six copies of the same file.
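The arithmetic in the reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two file paths and their sizes are hypothetical, the 128 MB block size is assumed (the usual `dfs.blocksize` default), and the replication factor of 3 comes from the question. It counts blocks and block replicas per file to show that the source file and Hive's copy are each replicated three times, rather than one file being stored six times.

```python
BLOCK_SIZE = 128 * 1024 * 1024   # assumed dfs.blocksize (128 MB)
REPLICATION = 3                  # replication factor from the question


def hdfs_footprint(size_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (blocks, block_replicas, raw_bytes) for one HDFS file."""
    # Integer ceiling division: a partly filled last block still counts as a block.
    blocks = (size_bytes + block_size - 1) // block_size
    # HDFS does not pad the last block, so raw usage is size * replication.
    return blocks, blocks * replication, size_bytes * replication


# Hypothetical files: the original text file and a separate copy kept by Hive
# in a different (smaller) file format.
files = {
    "/data/input/sales.txt": 300 * 1024 * 1024,
    "/user/hive/warehouse/sales/000000_0": 120 * 1024 * 1024,
}

report = {path: hdfs_footprint(size) for path, size in files.items()}
total_raw = sum(raw for _, _, raw in report.values())

for path, (blocks, replicas, raw) in report.items():
    print(f"{path}: {blocks} blocks, {replicas} block replicas, {raw} raw bytes")
print(f"total raw bytes across both files: {total_raw}")
```

Each file appears in the report with its own replica count, which is the point of the "3 + 3, not 6" remark: the replication factor applies per file, and the two files are distinct data sets.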