30 Most Asked Hive Interview Questions and Answers
Even after “Top 30 Hive Interview Questions and Answers: Part-1”, there are many more Hive interview questions and answers that can be asked in Apache Hive interviews. So, in this article, we provide possible scenario-based Hive interview questions as Part-2. However, let’s first discuss Hive; after that, we will discuss more Hive interview questions.
Top 30 Best Hive Interview Questions and Answers
Que 1. What is Apache Hive?
Ans. Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and it makes querying and analyzing easy.
Hive was initially developed by Facebook. Later, the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive. Many different companies use it; for example, Amazon uses it in Amazon Elastic MapReduce.
Que 2. What is SerDe in Apache Hive?
Ans. SerDe is an acronym for Serializer/Deserializer. Hive uses the SerDe interface for IO: a SerDe handles both serialization and deserialization, and it also interprets the results of serialization as individual fields for processing.
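To make the idea concrete, here is a minimal Python sketch of what a SerDe does. The class DelimitedSerDe and its methods are hypothetical and are not Hive's Java SerDe API; the sketch only assumes the Control-A (\x01) field separator that Hive's default text format uses.

```python
# Hypothetical SerDe-style class for illustration only (not Hive's Java API).
# Fields are separated by the \x01 (Control-A) byte, as in Hive's default text format.
FIELD_DELIM = "\x01"


class DelimitedSerDe:
    def __init__(self, column_types):
        # column_types: one Python callable per column, e.g. [int, str, float]
        self.column_types = column_types

    def serialize(self, row):
        """Turn a list of Python values into one delimited text line."""
        if len(row) != len(self.column_types):
            raise ValueError("row has the wrong number of fields")
        return FIELD_DELIM.join(str(value) for value in row)

    def deserialize(self, line):
        """Split a text line and convert each field to its column type."""
        raw_fields = line.split(FIELD_DELIM)
        return [cast(raw) for cast, raw in zip(self.column_types, raw_fields)]


serde = DelimitedSerDe([int, str, float])
line = serde.serialize([7, "alice", 3.5])
print(repr(line))               # '7\x01alice\x013.5'
print(serde.deserialize(line))  # [7, 'alice', 3.5]
```

The round trip shows both halves of the interface: serialize produces the bytes that are stored, and deserialize turns them back into typed fields for processing.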
Que 3. Which classes does Hive use to read and write HDFS files?
Ans. Hive uses the following classes to read and write HDFS files:
- TextInputFormat/HiveIgnoreKeyTextOutputFormat: read/write data in plain text file format.
- SequenceFileInputFormat/SequenceFileOutputFormat: read/write data in Hadoop SequenceFile format.
Que 4. Give examples of the SerDe classes that Hive uses to serialize and deserialize data.
Ans. Hive uses the following SerDe classes to serialize and deserialize data:
- MetadataTypedColumnsetSerDe: reads/writes delimited records, such as CSV, tab-separated, or Control-A-separated records (quoting is not supported yet).
- ThriftSerDe: reads/writes Thrift serialized objects. The class file for the Thrift object must be loaded first.
- DynamicSerDe: also reads/writes Thrift serialized objects, but it understands Thrift DDL, so the schema of the object can be provided at runtime.
Que 5. How do you write your own SerDe?
Ans. Keep the following points in mind:
- In most cases, users want to write a Deserializer instead of a full SerDe, because they just want to read their own data format rather than write to it.
- For example, the RegexDeserializer deserializes data using the configuration parameter ‘regex’, and possibly a list of column names (see serde2.MetadataTypedColumnsetSerDe).
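Here is a minimal Python sketch of a read-only, regex-based deserializer, assuming one capturing group per column. The RegexDeserializer class below is a hypothetical stand-in written for illustration, not the Hive class of the same name; it has no serialize step, which matches the point that most users only need to read their format.

```python
import re


class RegexDeserializer:
    """Hypothetical read-only deserializer: one regex capturing group per column."""

    def __init__(self, regex, column_names):
        self.pattern = re.compile(regex)
        self.column_names = column_names

    def deserialize(self, line):
        match = self.pattern.fullmatch(line)
        if match is None:
            # Choice made for this sketch: a non-matching line yields None for every column.
            return {name: None for name in self.column_names}
        return dict(zip(self.column_names, match.groups()))


deser = RegexDeserializer(r"(\S+) (\S+) (\d{3})", ["host", "path", "status"])
print(deser.deserialize("10.0.0.1 /index.html 200"))
# {'host': '10.0.0.1', 'path': '/index.html', 'status': '200'}
```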
Que 6. Can a table be renamed in Hive?
Ans. Yes, with ALTER TABLE table_name RENAME TO new_name;
Que 7. Is there a date data type in Hive?
Ans. Yes. The TIMESTAMP data type stores date and time in java.sql.Timestamp format, and Hive 0.12.0 and later also provide a DATE type.
Que 8. What are collection data types in Hive?
Ans. Hive has three collection data types: ARRAY, MAP, and STRUCT.
Que 9. Can we run Unix shell commands from Hive? Give an example.
Ans. Yes. Prefixing a command with the ! mark runs it as a Unix shell command from the Hive CLI.
For example, entering !pwd; at the hive prompt prints the current working directory.
Que 10. What is a Hive variable? What do we use it for?
Ans. A Hive variable is a variable created in the Hive environment that Hive scripts can reference. It is used to pass values into Hive queries when the query starts executing.
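Hive substitutes variables into the query text before the query runs; in the CLI a value can be set with SET hivevar:dept=sales; or passed with --hivevar dept=sales. The Python sketch below imitates only that textual substitution step for the hivevar namespace. The function substitute is hypothetical, and leaving unknown references unchanged is a choice of this sketch, not a statement about Hive's behaviour.

```python
import re

# Matches ${hivevar:name}; this sketch handles only the hivevar namespace.
VAR_PATTERN = re.compile(r"\$\{hivevar:([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(query, hivevars):
    """Replace each ${hivevar:name} with its value before the query is executed."""
    def lookup(match):
        name = match.group(1)
        if name not in hivevars:
            # Choice made for this sketch: leave unknown references untouched.
            return match.group(0)
        return hivevars[name]
    return VAR_PATTERN.sub(lookup, query)


query = "SELECT * FROM emp WHERE dept = '${hivevar:dept}'"
print(substitute(query, {"dept": "sales"}))
# SELECT * FROM emp WHERE dept = 'sales'
```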
Hive Interview Questions and Answers for freshers: Q. 1, 2, 4, 6, 7, 8
Hive Interview Questions and Answers for experienced candidates: Q. 3, 5, 9, 10
Que 11. Can Hive queries be executed from script files? How?
Ans. Yes, by using the source command.
For example:
hive> source /path/to/file/file_with_query.hql;
Que 12. What is the importance of the .hiverc file?
Ans. It is a file containing a list of commands that run when the Hive CLI starts, for example setting strict mode to true.
Que 13. What are the default record and field delimiters used for Hive text files?
Ans. The default record delimiter is \n.
The default field delimiter is \001 (Control-A); \002 and \003 are the default delimiters for collection items and map keys, respectively.
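The following Python sketch parses one row stored with these defaults, assuming a hypothetical table with columns name STRING, tags ARRAY<STRING>, and scores MAP<STRING,INT>. It follows Hive's default text layout: \001 between columns, \002 between collection items, and \003 between a map key and its value. The parse_row helper is illustrative only.

```python
# Hive's default text-format delimiters.
RECORD_DELIM = "\n"        # between rows
FIELD_DELIM = "\x01"       # between columns (Control-A)
COLLECTION_DELIM = "\x02"  # between ARRAY/MAP items (Control-B)
MAP_KEY_DELIM = "\x03"     # between a MAP key and its value (Control-C)


def parse_row(line):
    """Parse a row with columns: name STRING, tags ARRAY<STRING>, scores MAP<STRING,INT>."""
    name, tags_raw, scores_raw = line.split(FIELD_DELIM)
    tags = tags_raw.split(COLLECTION_DELIM)
    scores = {}
    for entry in scores_raw.split(COLLECTION_DELIM):
        key, value = entry.split(MAP_KEY_DELIM)
        scores[key] = int(value)
    return name, tags, scores


data = "bob\x01a\x02b\x01math\x0390\x02art\x0375\n"
for line in data.split(RECORD_DELIM):
    if line:  # skip the empty string after the trailing newline
        print(parse_row(line))
# ('bob', ['a', 'b'], {'math': 90, 'art': 75})
```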
Que 14. What do you mean by schema on read?
Ans. The schema is validated against the data when the data is read; it is not enforced when the data is written.
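A small Python sketch of schema on read, assuming comma-separated lines and a hypothetical read_with_schema helper: the stored lines are never checked when written, and the (column, type) schema is applied only when reading. Values that do not fit the schema come back as None, roughly the way Hive's default text SerDe returns NULL for unparsable or missing fields.

```python
# Raw lines as they sit in storage; nothing was validated at write time.
raw_lines = ["1,alice,30", "2,bob,not_a_number", "3,carol"]


def read_with_schema(lines, schema):
    """Apply (column_name, type) pairs at read time; bad or missing values become None."""
    rows = []
    for line in lines:
        fields = line.split(",")
        row = {}
        for i, (column, cast) in enumerate(schema):
            try:
                row[column] = cast(fields[i])
            except (IndexError, ValueError):
                row[column] = None  # roughly like Hive returning NULL
        rows.append(row)
    return rows


schema = [("id", int), ("name", str), ("age", int)]
for row in read_with_schema(raw_lines, schema):
    print(row)
```

Note that the stored lines themselves are never modified; a different schema could be applied to the same data later.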
Que 15. How do you list all databases whose names start with p?
Ans. SHOW DATABASES LIKE 'p*'; (here * is a wildcard that matches any characters)
Que 16. What does the USE command do in Hive?
Ans. The USE command sets the database on which all subsequent Hive queries will run.
Que 17. How can you delete the DBPROPERTY in Hive?
Ans. We cannot delete the DBPROPERTY in Hive.
Que 18. What is the significance of the line set hive.mapred.mode = strict;?
Ans. It sets MapReduce jobs to run in strict mode, in which queries on partitioned tables cannot run without a WHERE clause that filters on a partition column. This prevents very large jobs from running for a long time.
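As a hedged illustration, the check that strict mode adds for partitioned tables can be sketched in Python as below. The function check_strict_mode and its inputs (the table's partition columns and the columns used in the WHERE clause) are hypothetical simplifications; real Hive performs this check while compiling the query.

```python
class StrictModeError(Exception):
    pass


def check_strict_mode(partition_columns, where_columns, strict=True):
    """Reject a scan of a partitioned table unless WHERE filters a partition column.

    partition_columns: partition columns of the queried table (empty if unpartitioned)
    where_columns: columns referenced in the query's WHERE clause
    """
    if not strict or not partition_columns:
        return
    if not set(partition_columns) & set(where_columns):
        raise StrictModeError(
            "partitioned table queried without a filter on a partition column")


check_strict_mode(["dt"], ["dt", "country"])  # allowed: the filter can prune partitions
try:
    check_strict_mode(["dt"], ["country"])    # no partition filter
except StrictModeError as err:
    print("rejected:", err)
```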
Que 19. What is the maximum size of the string data type supported by Hive?
Ans. Maximum size is 2 GB.
Que 20. What are views in Hive?
Ans. In Hive, views are similar to tables. A view is generated from a query, based on the requirements.
– We can save any result set data as a view in Hive.
– Their usage is similar to views used in SQL.
– However, views are read-only: we cannot perform DML operations such as LOAD or INSERT on a view.
Hive Interview Questions and Answers for freshers: Q. 12, 13, 14, 15, 16, 17, 18, 19, 20
Hive Interview Questions and Answers for experienced candidates: Q. 11
Que 21. How do you check if a particular partition exists?
Ans. We can check whether a particular partition exists with the following query:
SHOW PARTITIONS table_name PARTITION(partitioned_column='partition_value');
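SHOW PARTITIONS lists partition names in column=value form, with multiple partition columns joined by /. The Python sketch below, built around a hypothetical partition_exists helper, applies the same matching idea to such a list of names: a partial spec matches any partition whose columns agree with every given value.

```python
def partition_exists(partition_names, spec):
    """Return True if any partition name matches every column=value pair in spec.

    partition_names: strings such as 'dt=2017-11-18/country=us'
    spec: dict of partition column -> value, e.g. {'dt': '2017-11-18'}
    """
    for name in partition_names:
        parts = dict(piece.split("=", 1) for piece in name.split("/"))
        if all(parts.get(col) == val for col, val in spec.items()):
            return True
    return False


listed = ["dt=2017-11-18/country=us", "dt=2017-11-18/country=in"]
print(partition_exists(listed, {"dt": "2017-11-18", "country": "in"}))  # True
print(partition_exists(listed, {"dt": "2017-11-19"}))                   # False
```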
Que 22. Which Java class handles the input record encoding into files that store the tables in Hive?
Ans. org.apache.hadoop.mapred.TextInputFormat
Que 23. Which Java class handles the output record encoding into files that result from Hive queries?
Ans. org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
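Que 24. What is the significance of the IF EXISTS clause while dropping a table?
Ans. Without the clause, Hive can throw an error when the table being dropped does not exist in the first place (whether it does also depends on the hive.exec.drop.ignorenonexistent setting). With DROP TABLE IF EXISTS table_name, no error is raised for a missing table.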
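The difference can be sketched in Python with a hypothetical in-memory catalog. The sketch models the configuration in which a plain DROP of a missing table fails; drop_table and TableNotFound are illustrative names, not Hive APIs.

```python
class TableNotFound(Exception):
    pass


def drop_table(catalog, name, if_exists=False):
    """Remove name from a hypothetical in-memory catalog (a set of table names)."""
    if name not in catalog:
        if if_exists:
            return  # IF EXISTS: silently do nothing
        raise TableNotFound(f"Table not found: {name}")
    catalog.remove(name)


tables = {"sales"}
drop_table(tables, "sales")                  # drops the table
drop_table(tables, "sales", if_exists=True)  # no error even though it is gone
try:
    drop_table(tables, "sales")              # plain DROP of a missing table
except TableNotFound as err:
    print(err)  # Table not found: sales
```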
Que 25. When you point a partition of a Hive table to a new directory, what happens to the data?
Ans. The data stays in the old location, so it has to be moved manually.
Que 26. What is a generic UDF in Hive?
Ans. We create a UDF with a Java program when a specific need is not covered by Hive's existing functions. A generic UDF can also detect the type of its input arguments programmatically and provide the appropriate response.
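The lifecycle of a generic UDF can be imitated in Python: an initialize step inspects the argument types once and picks the matching conversion, and an evaluate step then runs on each value. GenericToDouble and its methods are hypothetical; they mirror only the idea, not the signatures of Hive's Java GenericUDF class.

```python
class GenericToDouble:
    """Hypothetical Python analogue of a generic UDF that converts its argument to a float."""

    def initialize(self, arg_types):
        # Runs once: inspect the argument type and choose a conversion.
        if len(arg_types) != 1:
            raise TypeError("to_double() takes exactly one argument")
        arg_type = arg_types[0]
        if arg_type in (int, float):
            self.convert = float
        elif arg_type is str:
            self.convert = self._parse_string
        else:
            raise TypeError(f"to_double() does not accept {arg_type.__name__}")
        return float  # declared return type

    @staticmethod
    def _parse_string(text):
        try:
            return float(text.strip())
        except ValueError:
            return None  # unparsable text becomes None (NULL)

    def evaluate(self, value):
        # Runs once per row.
        if value is None:
            return None
        return self.convert(value)


udf = GenericToDouble()
udf.initialize([str])
print([udf.evaluate(v) for v in [" 2.5", "abc", None]])  # [2.5, None, None]
```

Doing the type inspection once in initialize, rather than on every row, is the design point: evaluate stays a cheap call to the conversion chosen up front.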
Que 27. Can a partition be archived? What are the advantages and disadvantages?
Ans. Yes. We can archive a partition.
Advantages: it decreases the number of files stored in the NameNode, and the archived file can still be queried using Hive.
Disadvantages: queries become less efficient, and archiving does not offer any space savings.
Que 28. What is the difference between SQL and HiveQL?
Ans. Let’s discuss a few differences between them:
- Data model
SQL is based on the relational database model, with data stored in RDBMS tables.
HiveQL also works with tables, but the data lives as files in HDFS and the schema is applied when the data is read (schema on read).
- Main purpose
SQL manipulates data stored in tables, including modifying individual rows and columns.
HiveQL is aimed at batch analysis of large datasets; Hive compiles its queries into MapReduce (or, in newer versions, Tez or Spark) jobs.
- Row-level changes
SQL supports row-level INSERT, UPDATE, and DELETE.
HiveQL supports UPDATE and DELETE only on transactional (ACID) tables, available since Hive 0.14.
Que 29. Hive vs Spark SQL
Ans. Let’s compare Hive and Spark SQL in detail.
- Initial release
Apache Hive was first released in 2010, whereas Spark SQL was first released in 2014.
- Current release
Apache Hive: version 2.3.2, released on 18 November 2017.
Spark SQL: version 2.1.2, released on 09 October 2017.
- Developer
Apache Hive was originally developed by Facebook, which donated it to the Apache Software Foundation; the ASF has maintained it since.
Spark SQL was developed as part of the Apache Spark project under the Apache Software Foundation.
Que 30. Pig vs Hive vs Hadoop MapReduce
Ans. Let’s compare Pig, Hive, and Hadoop MapReduce:
- Language
Hive: has an SQL-like query language (HiveQL).
MapReduce: uses a compiled language (typically Java).
Pig: has a scripting language (Pig Latin).
- Level of abstraction
MapReduce: low level of abstraction.
Hive: high level of abstraction.
Pig: also a high level of abstraction.
- Lines of code
Hive: comparatively fewer lines of code than both MapReduce and Pig.
MapReduce: more lines of code.
Pig: comparatively fewer lines of code than MapReduce.
Hive Interview Questions and Answers for freshers: Q. 27, 28, 29, 30
Hive Interview Questions and Answers for experienced candidates: Q. 21, 22, 23, 24, 25, 26
As a result, we have seen more Hive interview questions and answers in Part-2 that can be asked in Apache Hive interviews. We are still collecting possible Hive interview questions and answers to provide you with a complete guide. If you have any query regarding the above questions, please ask in the comment section. Also, keep visiting our site for more updates on Hive interview questions and answers, as well as articles on Big Data technologies.