Tricky Hive Interview Questions and Answers for Experience
1. Hive Interview Questions and Answers for Experience
After learning questions from Hive Interview Questions Part 1 and Part 2, we found mostly asked Hive Interview Questions and Answer for Experience & Freshers that could possibly be asked by the interviewer. So, in this blog, we will cover more latest & best Hive Interview Questions Answer for Experience & Freshers those will definitely help you to enhance your Hive knowledge. Thus, let’s Start with Brief Hive Introduction. Further, we will move to Hive Interview Questions Answer for Experience & Freshers. We request you to visit our Previous blogs on Hive Interview Questions Answer for Experience & Freshers (Part 1 and Part 2) to create the link between all these questions. However, that will help you to understand following questions well.
2. Top Hive Interview Questions and Answers for Experience
Que 1. What is Hive?
Ans. Hive is a tool to process structured data in Hadoop. We also call a data warehouse infrastructure. Moreover, to summarize Big Data, it resides on top of Hadoop. Also, makes querying and analyzing easy.
However, the Apache Software Foundation took it up, but initially, Hive was developed by Facebook. Further Apache Software Foundation developed it as an open source under the name Apache Hive. Although, many different companies use it. Like, Amazon uses it in Amazon Elastic MapReduce.
Learn Hive in detail, follow the link; Apache Hive Tutorial – A Single Best Comprehensive Guide
Que 2. How to optimize Hive Performance?
Ans. There are several types of Query Optimization Techniques we can use in Hive in order to Optimize Hive Performance. Such as:
- Tez-Execution Engine in Hive
- Usage of Suitable File Format in Hive
- Hive Partitioning
- Bucketing in Hive
- Vectorization In Hive
- Cost-Based Optimization in Hive (CBO)
- Hive Indexing
Follow the link to learn all Optimization Techniques in detail: Hive – Optimization Techniques
Que 3. How can client interact with Hive?
Ans. However, there are 3 ways possible in which a Client can interact with the Hive. Such as:-
i. Hive Thrift Client:
Basically, with any programming language that supports thrift, we can interact with HIVE.
ii. JDBC Driver:
However, to connect to the HIVE Server the BeeLine CLI uses JDBC Driver.
iii. ODBC Driver:
Also, we can use an ODBC Driver application. Since that support ODBC to connect to the HIVE server.
Que 4. Can we change the data type of a column in a hive table?
Ans. By using REPLACE column option we change the data type of a column in a hive table
ALTER TABLE table_name REPLACE COLUMNS ……
Que 5. How to add the partition in existing table without the partition table?
Ans. Basically, we cannot add/create the partition in the existing table, especially which was not partitioned while creation of the table.
Although, there is one possible way, using “PARTITIONED BY” clause. But the condition is if you had partitioned the existing table, then by using the ALTER TABLE command, you will be allowed to add the partition.
So, here are the create and alter commands:
- CREATE TABLE tab02 (foo INT, bar STRING) PARTITIONED BY (mon STRING);
- ALTER TABLE tab02 ADD PARTITION (mon =’10’) location ‘/home/hdadmin/hive-0.13.1-cdh5.3.2/examples/files/kv5.txt’;
Que 6. How Hive organize the data?
Ans. Basically, there are 3 ways possible in which Hive organizes data. Such as:
Que 7. Explain Clustering in Hive?
Ans. Basically, to decompose table data sets into more manageable parts is Clustering in Hive
To be more specific, the table is divided into the number of partitions, and these partitions can be further subdivided into more manageable parts known as Buckets/Clusters. In addition, “clustered by” clause is used to divide the table into buckets.
Que 8. Explain bucketing in Hive?
Ans. To decompose table data sets into more manageable parts, Apache Hive offers another technique. That technique is what we call Bucketing in Hive.
Follow this link to know more about Hive Bucketing in detail – Hive – Bucketing
Que 9. How is HCatalog different from Hive?
Ans. So, let’s learn the difference.
Basically, it is a table storage management tool for Hadoop. Basically, that exposes the tabular data of Hive Metastore to other Hadoop applications. Also, it enables users with different data processing tools to easily write data onto a grid. Moreover, it ensures that users don’t have to worry about where or in what format their data is stored.
Whereas, Hive is an open source data warehouse. Also, we use it for analysis and querying datasets. Moreover, it is developed on top of Hadoop as its data warehouse framework for querying and analysis of data is stored in HDFS.
In addition, it is useful for performing several operations. Such as data encapsulation, ad-hoc queries, & analysis of huge datasets. Moreover, for managing and querying structured data Hive’s design reflects its targeted use as a system.
Read more about Apache Hive in detail – Hive – Tutorial
Que 10. What is the difference between CREATE TABLE AND CREATE EXTERNAL TABLE?
Ans. Although, we can create two types of tables in Hive. Such as:
– Internal Table
– External Table
Hence, to create the Internal table we use the command ‘CREATE TABLE’ whereas to create the External table we use the command ‘CREATE EXTERNAL TABLE’.
Hive Interview Questions and Answers for Freshers – Q. 1,2,4,6,7,8,9,10
Hive Interview Questions and Answers for Experience – Q. 3,5
Que 11. Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
Ans. There is a possibility that because of following reasons above error may occur:
- While we use derby metastore, Then lock file would be there in case of the abnormal exit.
Hence, do remove the lock file
- Moreover, Run hive in Debug mode
hive -hiveconf hive.root.logger=DEBUG,console
Que 12. How many types of Tables in Hive?
Ans. Hive has two types of tables. Such as:
- Managed table
- External table
Que 13. Explain Hive Thrift server?
Ans. There is an optional component in Hive that we call as HiveServer or HiveThrift. Basically, that allows access to Hive over a single port. However, for scalable cross-language services development Thrift is a software framework. Also, it allows clients using languages including Java, C++, Ruby, and many others, to programmatically access Hive remotely.
Que 14. How to Write a UDF function in Hive?
Ans. Basically, following are the steps:
- Create a Java class for the User Defined Function which extends ora.apache.hadoop.hive.sq.exec.UDF and implements more than one evaluate() methods. Put in your desired logic and you are almost there.
- Package your Java class into a JAR file
- Go to Hive CLI, add your JAR, and verify your JARs is in the Hive CLI classpath
- CREATE TEMPORARY FUNCTION in Hive which points to your Java class
- Then Use it in Hive SQL.
Que 15. Differentiate between PigLatin and Hive
Ans. Let’s see feature wise difference between them:
1. Language Used
In Hive, there is a declarative language called HiveQL which is like SQL.
In Pig, there is a procedural language called Pig Latin.
2. Mainly Used for
Mainly, data analysts use Apache Hive.
Mainly, researchers and programmers use Apache Pig.
Basically, Hive allows structured data.
However, Apache Pig allows both structured and semi-structured data.
4. Operates on
Basically, Hive component operates on a server side of the cluster.
However, Pig server operates on the client side of the cluster.
Learn the complete difference, follow the link:Hive Vs Pig
Que 16. What is the difference between Internal Table and External Table in Hive?
Ans. Hive Managed Tables-
It is also known an internal table. When we create a table in Hive, it by default manages the data. This means that Hive moves the data into its warehouse directory.
- We want Hive to completely manage the lifecycle of the data and table.
- Data is temporary
Hive External Tables-
We can also create an external table. It tells Hive to refer to the data that is at an existing location outside the warehouse directory.
- Data is used outside of Hive. For example, the data files are read and processed by an existing program that does not lock the files.
- We are not creating a table based on the existing table.
Learn the complete difference, follow the link: Hive – Internal vs External tables
Que 17. Difference between order by and sort by in Hive?
Ans. So, the difference is:
- Sort by
hive> SELECT E.EMP_ID FROM Employee E SORT BY E.empid;
- for final output, it may use multiple reducers.
- within a reducer only guarantees to order of rows.
iii. it gives partially ordered result.
- Order by
hive> SELECT E.EMP_ID FROM Employee E order BY E.empid;
- Basically, to guarantee the total order in output Uses single reducer.
- Also, to minimize sort time LIMIT can be used.
Que 18. What are different modes of metastore deployment in Hive?
Ans. There are three modes for metastore deployment which Hive offers.
1. Embedded metastore
Here, by using embedded Derby Database both metastore service and hive service runs in the same JVM.
2. Local Metastore
However, here, Hive metastore service runs in the same process as the main Hive Server process, but the metastore database runs in a separate process.
3. Remote Metastore
Here, metastore runs on its own separate JVM, not in the Hive service JVM.
Que 19. Difference between HBase vs Hive
Ans. Following points are feature wise comparison of HBase vs Hive.
Basically, Apache Hive is not a database.
HBase does support NoSQL database.
2. Type of processing
Hive does support Batch processing. That is OLAP.
HBase does support real-time data streaming. That is OLTP.
3. Data Schema
Basically, it supports to have schema model
However, it is schema-free
Learn the complete difference, follow the link: Hive Vs HBase
Que 20. What is the relation between MapReduce and Hive?
Ans. Hive offers no capabilities to MapReduce. The Programs are executed as MapReduce Job via the interpreter. Then Interpreter runs on the Client machine. Afterward, that runs HiveQL Queries to MR jobs. Furthermore, Framework submits jobs to the cluster.
Hive Interview Questions and Answers for Freshers – Q. 12,13,14,15,16,19,20
Hive Interview Questions and Answers for Experience – Q. 11,17,18
Que 21. What is the importance of driver in Hive?
Ans. Driver manages the life cycle of Hive QL Queries. It receives the queries from UI and fetches on JDBC interfaces to process the query. Also, it creates a separate section to handle the query.
Que 22. How can you configure remote metastore mode in Hive?
Ans. To use this remote metastore, you should configure Hive service by setting hive.metastore.uris to the metastore server URI(s). Metastore server URIs are of the form thrift://host:port, where the port corresponds to the one set by METASTORE_PORT when starting the metastore server.
Que 23. Can we LOAD data into a view?
Que 24. What types of costs are associated with creating the index on hive tables?
Ans. Basically, there is a processing cost in arranging the values of the column on which index is created since Indexes occupies.
Que 25. Give the command to see the indexes on a table.
Ans. SHOW INDEX ON table_name
Basically, in the table table_name, this will list all the indexes created on any of the columns.
Que 26. How do you specify the table creator name when creating a table in Hive?
Ans. The TBLPROPERTIES clause is used to add the creator name while creating a table.
The TBLPROPERTIES is added like −
Que 27. Difference between Hive and Impala?
Ans. Following are the feature wise comparison between Impala vs Hive:
1. Query Process
Basically, in Hive every query has the common problem of a “cold start”.
Impala avoids any possible startup overheads, being a native query language. However, that are very frequently and commonly observed in MapReduce based jobs. Moreover, to process a query always Impala daemon processes are started at the boot time itself, making it ready.`
2. Intermediate Results
Basically, Hive materializes all intermediate results. Hence, it enables enabling better scalability and fault tolerance. However, that has an adverse effect on slowing down the data processing.
However, it’s streaming intermediate results between executors. Although, that trades off scalability as such.
3. During the Runtime
At Compile time, Hive generates query expressions.
During the Runtime, Impala generates code for “big loops”.
Learn the complete difference, follow the link: Hive Vs Impala
Que 28. What are types of Hive Built-In Functions?
Ans. So, its types are:
- Collection Functions
- Hive Date Functions
- Mathematical Functions
- Conditional Functions
- Hive String Functions
Follow the link to learn it in detail: Hive – Built-In Functions
Que 29. Types of Hive DDL Commands.
Ans. However, there are several types of Hive DDL commands, we commonly use. such as:
- Create Database Statement
- Hive Show Database
- Drop database
- Creating Hive Tables
- Browse the table
- Altering and Dropping Tables
- Hive Select Data from Table
- Hive Load Data
Follow the link to learn it in detail: Hive – DDL Commands
Que 30. What are Hive Operators and its Types?
Ans. Hive operators are used for mathematical operations on operands. Also, it returns specific value as per the logic applied.
- Types of Hive Built-in Operators
- Relational Operators
- Arithmetic Operators
- Logical Operators
- String Operators
- Operators on Complex Types
Follow the link to learn it in detail: Hive – Operators
Hive Interview Questions and Answers for Freshers – Q. 21,23,24,25,26,27,28,29,30
Hive Interview Questions and Answers for Experience – Q. 22
As a result, we have seen all possible Apache Hive Interview Questions and Answers for Experienced & Freshers. Furthermore, if any doubt regarding Hive Interview Questions and Answers for Experience, feel free to ask through the comment section.