Tricky Hive Interview Questions and Answers for Experienced

After covering Hive Interview Questions Part 1 and Part 2, we have collected the Hive interview questions for experienced candidates and freshers that interviewers ask most often.

So, in this blog, we cover the latest and best Hive interview questions and answers for experienced candidates and freshers, which will definitely help you enhance your Hive knowledge. Let's start with a brief introduction to Hive, then move on to the questions.

We recommend visiting our previous blogs on Hive interview questions (Part 1 and Part 2) first; they provide the background that links all of these questions together and will help you understand the questions below.

Top Hive Interview Questions and Answers for Experienced

Que 1. What is Hive?

Ans. Hive is a tool for processing structured data in Hadoop. It is also called a data warehouse infrastructure: it resides on top of Hadoop to summarize Big Data, and it makes querying and analysis easy.

Hive was initially developed by Facebook and later taken up by the Apache Software Foundation, which develops it as open source under the name Apache Hive. Many different companies use it; for example, Amazon uses it in Amazon Elastic MapReduce.

Que 2. How to optimize Hive Performance?

Ans. There are several query optimization techniques we can use in Hive to improve performance, such as:

  1. Tez-Execution Engine in Hive
  2. Usage of Suitable File Format in Hive
  3. Hive Partitioning
  4. Bucketing in Hive
  5. Vectorization In Hive
  6. Cost-Based Optimization in Hive (CBO)
  7. Hive Indexing

Que 3. How can a client interact with Hive?

Ans. There are 3 ways in which a client can interact with Hive:

i. Hive Thrift Client:
We can interact with Hive using any programming language that supports Thrift.

ii. JDBC Driver:
The Beeline CLI uses the JDBC driver to connect to the Hive server.

iii. ODBC Driver:
We can also connect to the Hive server from any application that supports ODBC.

Que 4. Can we change the data type of a column in a Hive table?

Ans. Yes. Using the REPLACE COLUMNS option, we can change the data type of a column in a Hive table:
ALTER TABLE table_name REPLACE COLUMNS ……
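As a sketch, with a hypothetical table `emp`: a single column's type can also be changed with the CHANGE clause, while REPLACE COLUMNS redefines the full column list:

```sql
-- Change one column's type in place (column kept under the same name).
ALTER TABLE emp CHANGE salary salary BIGINT;
-- Or redefine the entire column list at once.
ALTER TABLE emp REPLACE COLUMNS (id INT, name STRING, salary BIGINT);
```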

Que 5. How to add a partition to an existing table that was not created as a partitioned table?

Ans. We cannot add a partition to an existing table that was not partitioned when the table was created.

The only way is to create the table with the "PARTITIONED BY" clause in the first place. If the existing table was created as a partitioned table, then the ALTER TABLE command allows you to add new partitions.

So, here are the create and alter commands:

  • CREATE TABLE tab02 (foo INT, bar STRING) PARTITIONED BY (mon STRING);
  • ALTER TABLE tab02 ADD PARTITION (mon='10') LOCATION '/home/hdadmin/hive-0.13.1-cdh5.3.2/examples/files/kv5.txt';

Que 6. How does Hive organize data?

Ans. Hive organizes data in 3 ways:

  1. Tables
  2. Partitions
  3. Buckets

Que 7. Explain Clustering in Hive?

Ans. Clustering in Hive is a way to decompose table data sets into more manageable parts. To be more specific, a table is divided into a number of partitions, and these partitions can be further subdivided into more manageable parts known as buckets (or clusters).

The "CLUSTERED BY" clause is used to divide a table into buckets.

Que 8. Explain bucketing in Hive?

Ans. Bucketing is another technique Apache Hive offers to decompose table data sets into more manageable parts.
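A hedged sketch of partitioning plus bucketing together, with hypothetical table and column names:

```sql
-- Partition users by country, then bucket each partition by user_id.
CREATE TABLE users (user_id INT, name STRING)
PARTITIONED BY (country STRING)
CLUSTERED BY (user_id) INTO 4 BUCKETS;
-- In older Hive versions, bucketing must be enforced on insert:
SET hive.enforce.bucketing=true;
```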

Que 9. How is HCatalog different from Hive?

Ans. So, let's learn the difference.
HCatalog-
HCatalog is a table storage management tool for Hadoop that exposes the tabular data of the Hive metastore to other Hadoop applications. It enables users with different data processing tools to easily write data onto a grid.

Moreover, it ensures that users don't have to worry about where or in what format their data is stored.

Hive-
Hive is an open source data warehouse used for analyzing and querying datasets. It is developed on top of Hadoop as a data warehouse framework for querying and analyzing data stored in HDFS.

It is useful for performing several operations, such as data encapsulation, ad-hoc queries, and analysis of huge datasets. Hive's design reflects its targeted use as a system for managing and querying structured data.

Que 10. What is the difference between CREATE TABLE AND CREATE EXTERNAL TABLE?

Ans. We can create two types of tables in Hive:
–  Internal Table
–  External Table
To create an internal table we use the command 'CREATE TABLE', whereas to create an external table we use the command 'CREATE EXTERNAL TABLE'.
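A minimal sketch of both commands, using hypothetical table names and an assumed HDFS path:

```sql
-- Internal (managed) table: Hive owns the data in its warehouse directory.
CREATE TABLE managed_logs (line STRING);
-- External table: Hive only references data at an existing location.
CREATE EXTERNAL TABLE external_logs (line STRING)
LOCATION '/data/logs';  -- hypothetical HDFS directory
```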

Hive Interview Questions and Answers for Freshers – Q. 1,2,4,6,7,8,9,10
Hive Interview Questions and Answers for Experienced – Q. 3,5

Que 11. Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

Ans. The above error may occur for the following reasons:

  1. If we use the Derby metastore and Hive exited abnormally, a stale lock file may be left behind.

Hence, remove the lock file:
rm metastore_db/*.lck

  2. To diagnose further, run Hive in debug mode:

hive -hiveconf hive.root.logger=DEBUG,console

Que 12. How many types of tables are there in Hive?

Ans. Hive has two types of tables:

  • Managed table
  • External table

Que 13. Explain Hive Thrift server?

Ans. There is an optional component in Hive called HiveServer or HiveThrift that allows access to Hive over a single port. Thrift is a software framework for scalable cross-language services development.

It allows clients written in languages including Java, C++, Ruby, and many others to programmatically access Hive remotely.

Que 14. How to Write a UDF function in Hive?

Ans. The steps are:

  1. Create a Java class for the user-defined function which extends org.apache.hadoop.hive.ql.exec.UDF and implements one or more evaluate() methods. Put in your desired logic and you are almost there.
  2. Package your Java class into a JAR file.
  3. Go to the Hive CLI, add your JAR, and verify it is in the Hive CLI classpath.
  4. CREATE TEMPORARY FUNCTION in Hive, pointing to your Java class.
  5. Then use it in HiveQL.
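The Hive side of these steps can be sketched as follows; the JAR path, function name, and Java class name are hypothetical:

```sql
-- Step 3: add the packaged JAR to the session classpath.
ADD JAR /tmp/my-udfs.jar;
-- Step 4: register the function, pointing at the Java class.
CREATE TEMPORARY FUNCTION to_upper AS 'com.example.hive.ToUpperUDF';
-- Step 5: use it in HiveQL.
SELECT to_upper(name) FROM employee;
```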

Que 15. Differentiate between PigLatin and Hive

Ans. Let's see the feature-wise difference between them:

1. Language Used

Apache Hive: Hive uses HiveQL, a declarative language similar to SQL.

Apache Pig: Pig uses Pig Latin, a procedural language.

2. Mainly Used by

Apache Hive: Mainly, data analysts use Apache Hive.

Apache Pig: Mainly, researchers and programmers use Apache Pig.

3. Data

Apache Hive: Hive works with structured data.

Apache Pig: Apache Pig handles both structured and semi-structured data.

4. Operates on

Apache Hive: The Hive component operates on the server side of the cluster.

Apache Pig: Pig operates on the client side of the cluster.

Que 16. What is the difference between Internal Table and External Table in Hive?

Ans. Hive Managed Tables-
A managed table is also known as an internal table. When we create a table in Hive, Hive manages the data by default, which means it moves the data into its warehouse directory.
Usage:

  • We want Hive to completely manage the lifecycle of both the data and the table.
  • The data is temporary.

Hive External Tables-
We can also create an external table. It tells Hive to refer to data at an existing location outside the warehouse directory.
Usage:

  • The data is also used outside of Hive. For example, the data files are read and processed by an existing program that does not lock the files.
  • We are not creating the table based on an existing table.
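The difference shows up most clearly on DROP TABLE; a short sketch with hypothetical names and an assumed HDFS path:

```sql
CREATE TABLE managed_t (id INT);
CREATE EXTERNAL TABLE external_t (id INT)
LOCATION '/data/ext';   -- hypothetical existing location
-- Dropping a managed table removes metadata AND the data files.
DROP TABLE managed_t;
-- Dropping an external table removes only the metadata;
-- the files under /data/ext are left in place.
DROP TABLE external_t;
```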

Que 17. Difference between order by and sort by in Hive?

Ans. So, the difference is:

  • Sort by

hive> SELECT e.emp_id FROM Employee e SORT BY e.emp_id;

  1. It may use multiple reducers for the final output.
  2. It only guarantees the ordering of rows within each reducer.
  3. Hence, it gives a partially ordered result.

  • Order by

hive> SELECT e.emp_id FROM Employee e ORDER BY e.emp_id;

  1. It uses a single reducer to guarantee total ordering in the output.
  2. LIMIT can be used to minimize the sort time.

Que 18. What are different modes of metastore deployment in Hive?

Ans. Hive offers three modes of metastore deployment.

1. Embedded Metastore

Here, both the metastore service and the Hive service run in the same JVM, using an embedded Derby database.

2. Local Metastore

Here, the Hive metastore service still runs in the same process as the main Hive service, but the metastore database runs in a separate process.

3. Remote Metastore

Here, the metastore service runs in its own separate JVM, not in the Hive service JVM.

Que 19. Difference between HBase vs Hive

Ans. The following is a feature-wise comparison of HBase vs Hive.

1. Database Type

Apache Hive: Apache Hive is not a database.

HBase: HBase is a NoSQL database.

2. Type of Processing

Apache Hive: Hive supports batch processing, i.e., OLAP.

HBase: HBase supports real-time data processing, i.e., OLTP.

3. Data Schema

Apache Hive: Hive enforces a schema on its tables.

HBase: HBase is schema-free.

Que 20. What is the relation between MapReduce and Hive?

Ans. Hive adds no processing capability beyond MapReduce; Hive queries are executed as MapReduce jobs via an interpreter. The interpreter runs on the client machine and converts HiveQL queries into MapReduce jobs, and the framework then submits those jobs to the cluster.

Hive Interview Questions and Answers for Freshers – Q. 12,13,14,15,16,19,20

Hive Interview Questions and Answers for Experienced – Q. 11,17,18

Que 21. What is the importance of driver in Hive?

Ans. The driver manages the life cycle of a HiveQL query. It receives queries submitted through the UI or through interfaces such as JDBC and drives their processing. It also creates a separate session to handle each query.

Que 22. How can you configure remote metastore mode in Hive?

Ans. To use a remote metastore, you should configure the Hive service by setting hive.metastore.uris to the metastore server URI(s). Metastore server URIs are of the form thrift://host:port, where the port corresponds to the one set by METASTORE_PORT when starting the metastore server.
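As a sketch, the corresponding client-side hive-site.xml entry looks like this; the hostname is a placeholder, and 9083 is the conventional default metastore port:

```xml
<!-- hive-site.xml on the Hive client -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host:9083</value>
</property>
```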

Que 23. Can we LOAD data into a view?

Ans. No. A view is a purely logical object with no storage of its own, so LOAD DATA can only target a table.
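A small sketch with hypothetical names makes this concrete:

```sql
-- A view is defined over a base table and stores no data itself.
CREATE VIEW emp_view AS SELECT id, name FROM employee;
-- LOAD DATA LOCAL INPATH '/tmp/data.txt' INTO TABLE emp_view;  -- fails: not a table
-- Load the base table instead; the view reflects the new rows.
LOAD DATA LOCAL INPATH '/tmp/data.txt' INTO TABLE employee;
```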

Que 24. What types of costs are associated with creating the index on hive tables?

Ans. There is a processing cost in arranging the values of the column on which the index is created, and the index itself occupies additional disk space.

Que 25. Give the command to see the indexes on a table.

Ans. SHOW INDEX ON table_name;
This will list all the indexes created on any of the columns of the table table_name.

Que 26. How do you specify the table creator name when creating a table in Hive?

Ans. The TBLPROPERTIES clause is used to add the creator name while creating a table.
TBLPROPERTIES is added like this:
TBLPROPERTIES ('creator'='Joan')
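For example, with a hypothetical table, the clause is appended to the CREATE TABLE statement and can be inspected afterwards:

```sql
CREATE TABLE orders (id INT, total DOUBLE)
TBLPROPERTIES ('creator'='Joan', 'created_at'='2019-01-01');
-- List the table properties, including the creator.
SHOW TBLPROPERTIES orders;
```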

Que 27. Difference between Hive and Impala?

Ans. Following is a feature-wise comparison of Impala vs Hive:

1. Query Process

  • Hive

In Hive, every query suffers from the common problem of a "cold start", because MapReduce jobs must be launched for each query.

  • Impala

Being a native query engine, Impala avoids the startup overheads that are very frequently observed in MapReduce-based jobs. Impala daemon processes are started at boot time, so they are always ready to process a query.

2. Intermediate Results

  • Hive

Hive materializes all intermediate results, which enables better scalability and fault tolerance but has the adverse effect of slowing down data processing.

  • Impala

Impala streams intermediate results between executors, which trades off some of that fault tolerance and scalability.

3. Code Generation

  • Hive

Hive generates query expressions at compile time.

  • Impala

Impala generates code for "big loops" at runtime.

Que 28. What are types of Hive Built-In Functions?

Ans. Its types are:

  1. Collection Functions
  2. Hive Date Functions
  3. Mathematical Functions
  4. Conditional Functions
  5. Hive String Functions

Que 29. Types of Hive DDL Commands.

Ans. There are several types of Hive DDL commands we commonly use, such as:

  1. Create Database Statement
  2. Hive Show Database
  3. Drop database
  4. Creating Hive Tables
  5. Browse the table
  6. Altering and Dropping Tables
  7. Hive Select Data from Table
  8. Hive Load Data
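One hedged example per command listed above, using hypothetical database, table, and file names:

```sql
CREATE DATABASE retail;                        -- 1. create database
SHOW DATABASES;                                -- 2. show databases
USE retail;
CREATE TABLE items (id INT, name STRING);      -- 4. create table
SHOW TABLES;                                   -- 5. browse the tables
DESCRIBE items;
ALTER TABLE items ADD COLUMNS (price DOUBLE);  -- 6. alter table
LOAD DATA LOCAL INPATH '/tmp/items.txt'
  INTO TABLE items;                            -- 8. load data
SELECT * FROM items LIMIT 10;                  -- 7. select data
DROP TABLE items;                              -- 6. drop table
DROP DATABASE retail;                          -- 3. drop database
```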

Que 30. What are Hive Operators and its Types?

Ans. Hive operators are used to perform operations on operands and return a specific value according to the logic applied.

Types of Hive built-in operators:

  • Relational Operators
  • Arithmetic Operators
  • Logical Operators
  • String Operators
  • Operators on Complex Types
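A single query can touch each operator family above; this sketch assumes a hypothetical `employee` table with an ARRAY<STRING> column `skills`:

```sql
SELECT emp_id + 10,                       -- arithmetic operator
       salary > 50000 AND dept = 'eng',   -- relational and logical operators
       name LIKE 'A%',                    -- string operator
       skills[0]                          -- operator on a complex type (array index)
FROM employee;
```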

Hive Interview Questions and Answers for Freshers – Q. 21,23,24,25,26,27,28,29,30

Hive Interview Questions and Answers for Experienced – Q. 22

Conclusion

As a result, we have seen all possible Apache Hive interview questions and answers for experienced candidates and freshers. Furthermore, if you have any doubt regarding these questions and answers, feel free to ask through the comment section.
