Explain Clustering in Hive

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

Viewing 1 reply thread

Author

Posts
- September 20, 2018 at 12:08 pm #4752
  
  DataFlair Team
  Spectator
  
  What is Clustering in Hive?
- September 20, 2018 at 12:08 pm #4754
  
  DataFlair Team
  Spectator
  
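  In Hive, bucketing (also called clustering) is a technique for decomposing table data sets into smaller, more manageable parts. The bucketing column is declared with the CLUSTERED BY clause, which is why bucketing is also referred to as clustering.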
  
  Bucketing is based on the formula hash_function(bucketing_column) mod number_of_buckets. For each row, this hash function determines the bucket number, that is, which bucket the row is stored in. The number of buckets is specified when the bucketed table is created, in the INTO n BUCKETS part of the CLUSTERED BY clause.
  
  In addition, a table can first be divided into partitions, and each partition can then be subdivided further into these smaller, more manageable parts, which we call buckets or clusters.
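To make the "hash mod number of buckets" rule concrete, here is a minimal Python sketch. It assumes a simplified hash: integers hash to themselves, and strings use a Java-style 31-multiplier hash. Hive's actual hash function depends on the column type and the Hive version, so `java_style_hash` and `bucket_for` are hypothetical helpers for illustration only, not Hive's API.

```python
def java_style_hash(value):
    """Hypothetical stand-in for Hive's column hash (an assumption):
    ints hash to themselves, strings use a Java-style 31-multiplier hash."""
    if isinstance(value, int):
        h = value & 0xFFFFFFFF
    else:
        h = 0
        for ch in str(value):
            # Accumulate like Java's String.hashCode (for ASCII text),
            # wrapping at 32 bits
            h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed Java int
    return h - 2**32 if h >= 2**31 else h


def bucket_for(value, num_buckets):
    """Bucket number = hash(bucketing column) mod number of buckets."""
    # Clear the sign bit so a negative hash still yields a non-negative bucket
    return (java_style_hash(value) & 0x7FFFFFFF) % num_buckets


# Spread ten integer keys across 4 buckets
for key in range(1, 11):
    print(key, "-> bucket", bucket_for(key, 4))
```

With integer keys and 4 buckets, keys 4 and 8 both land in bucket 0, and keys 1, 5 and 9 land in bucket 1. Rows with the same bucketing value always land in the same bucket.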
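The two-level layout of partitions subdivided into buckets can be sketched in Python as a nested grouping. In Hive, each partition is stored as a subdirectory and each bucket as a file inside it. This sketch models that only in memory. The column names `country` and `user_id` and the `layout` helper are hypothetical, and the bucketing key is assumed to be an integer that hashes to itself.

```python
from collections import defaultdict


def layout(rows, partition_col, bucket_col, num_buckets):
    """Group rows as {partition_value: {bucket_number: [rows]}}."""
    result = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # First level: the partition is chosen by the partition column's value
        part = row[partition_col]
        # Second level: bucket = hash(bucketing column) mod num_buckets,
        # assuming an integer key hashes to itself
        bucket = (row[bucket_col] & 0x7FFFFFFF) % num_buckets
        result[part][bucket].append(row)
    return result


rows = [
    {"country": "IN", "user_id": 1},
    {"country": "IN", "user_id": 5},
    {"country": "US", "user_id": 2},
    {"country": "US", "user_id": 7},
]
table = layout(rows, "country", "user_id", 4)
for part, buckets in sorted(table.items()):
    for bucket, members in sorted(buckets.items()):
        print(part, bucket, [r["user_id"] for r in members])
```

Here the IN partition holds users 1 and 5 together in bucket 1, while the US partition splits users 2 and 7 into buckets 2 and 3.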