30 Most Asked Hive Interview Questions and Answers

Even after “Top 30 Hive Interview Questions and Answers: Part-1”, there are many more Hive interview questions and answers that can be asked in Apache Hive interviews.

So, in this article, we provide possible scenario-based Hive interview questions as Part-2. Let’s first discuss what Hive is, and then move on to more Hive interview questions.

Top 30 Best Hive Interview Questions and Answers

Que 1. What is Apache Hive?

Ans. Hive is a tool for processing structured data in Hadoop. It is a data warehouse infrastructure that resides on top of Hadoop to summarize Big Data, and it makes querying and analyzing that data easy.

Hive was initially developed by Facebook. Later, the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive. Many different companies use it. For example, Amazon uses it in Amazon Elastic MapReduce.

Que 2. What is SerDe in Apache Hive?

Ans. SerDe is an acronym for Serializer/Deserializer. Hive uses the SerDe interface for IO.

The interface handles both serialization and deserialization in Hive, and it interprets the results of deserialization as individual fields for processing.
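
To make the idea concrete, here is a minimal Python sketch of what a SerDe does conceptually. It is not Hive's Java SerDe interface; the function names serialize and deserialize, the column list, and the tab delimiter are all assumptions chosen for illustration.

```python
# Toy illustration of the SerDe idea; this is not Hive's actual Java interface.
COLUMNS = ["id", "name", "city"]   # hypothetical table columns
DELIMITER = "\t"                   # assumed field delimiter

def serialize(row):
    """Turn a row (dict of column -> value) into one text record."""
    return DELIMITER.join(str(row[col]) for col in COLUMNS)

def deserialize(record):
    """Split one text record back into individual named fields."""
    values = record.split(DELIMITER)
    return dict(zip(COLUMNS, values))

record = serialize({"id": 1, "name": "Asha", "city": "Pune"})
fields = deserialize(record)
print(repr(record))
print(fields)
```

Note that every value comes back as a string in this sketch. Converting fields to their declared column types is a separate step that a real SerDe would also have to handle.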

Que 3. Which classes are used by Hive to read and write HDFS files?

Ans. Hive uses the following classes to read and write HDFS files:

  • TextInputFormat/HiveIgnoreKeyTextOutputFormat: These read/write data in plain text file format.
  • SequenceFileInputFormat/SequenceFileOutputFormat: These read/write data in Hadoop SequenceFile format.

Que 4. Give examples of the SerDe classes which Hive uses to serialize and deserialize data.

Ans. Hive uses the following SerDe classes to serialize and deserialize data:

  1. MetadataTypedColumnsetSerDe
    We use this Hive SerDe to read/write delimited records, such as CSV, tab-separated, or control-A separated records (quoting is not supported yet).
  2. ThriftSerDe
    We use this Hive SerDe to read/write Thrift serialized objects. Make sure the class file for the Thrift object is loaded first.
  3. DynamicSerDe
    This Hive SerDe also reads/writes Thrift serialized objects, but it understands Thrift DDL, so the schema of the object can be provided at runtime.

 

Que 5. How do you write your own SerDe?

Ans. Keep the following points in mind:

  • In most cases, users want to write a Deserializer instead of a full SerDe, because they just want to read their own data format rather than write to it.
  • For example, the RegexDeserializer deserializes the data using the configuration parameter ‘regex’, and possibly a list of column names (see serde2.MetadataTypedColumnsetSerDe).
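
The regex-based approach can be sketched in a few lines of Python. This only illustrates the idea of mapping regex capture groups to column names; the pattern, the column names, and the function regex_deserialize are hypothetical and do not come from Hive.

```python
import re

# Hypothetical configuration, mirroring the idea of a 'regex' parameter
# plus a list of column names. Neither value comes from Hive itself.
REGEX = r"(\S+) (\S+) \[([^\]]+)\]"
COLUMN_NAMES = ["host", "user", "time"]

def regex_deserialize(line, regex=REGEX, columns=COLUMN_NAMES):
    """Map each capture group of the regex to a column; None if no match."""
    match = re.fullmatch(regex, line)
    if match is None:
        return None
    return dict(zip(columns, match.groups()))

row = regex_deserialize("10.0.0.1 alice [2017-11-18 10:00:00]")
bad = regex_deserialize("not a log line")
print(row)
print(bad)
```

Only the read path is implemented here, which matches the point above: when the goal is just to read a custom format, a deserializer alone is enough.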

Que 6. Can a table be renamed in Hive?

Ans. Yes, by using the ALTER TABLE statement:
ALTER TABLE table_name RENAME TO new_name;

Que 7. Is there a date data type in Hive?

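Ans. Yes. The TIMESTAMP data type stores date and time values in java.sql.Timestamp format. Hive 0.12.0 and later also provide a separate DATE data type for values without a time of day.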

Que 8. What are collection data types in Hive?

Ans. In Hive, there are three collection data types:

  • ARRAY
  • MAP
  • STRUCT

Que 9. Can we run Unix shell commands from Hive? Give an example.

Ans. Yes, we can run Unix shell commands from the Hive prompt by putting a ! mark just before the command and ending it with a semicolon.
For example:
!pwd; at the Hive prompt prints the current working directory.

Que 10. What is a Hive variable? What do we use it for?

Ans. A Hive variable is a variable created in the Hive environment that can be referenced by Hive scripts. We use it to pass values to Hive queries when the query starts executing.
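
In Hive itself, such a variable is typically set with SET hivevar:name=value; (or the --hivevar option of the CLI) and referenced in a query as ${hivevar:name}. The following Python sketch imitates that substitution step so the mechanism is easy to see. The function substitute, the table sales, and the variable region are hypothetical, and leaving unknown references untouched is a choice made for this sketch, not a statement about Hive's behavior.

```python
import re

def substitute(query, hivevars):
    """Replace each ${hivevar:name} reference with its value."""
    def lookup(match):
        name = match.group(1)
        # Unknown references are left as-is in this sketch.
        return hivevars.get(name, match.group(0))
    return re.sub(r"\$\{hivevar:(\w+)\}", lookup, query)

template = "SELECT * FROM sales WHERE region = '${hivevar:region}'"
query = substitute(template, {"region": "EMEA"})
print(query)
```

The same script can then be reused for different values without editing the query text.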

Hive Interview Questions and Answers for freshers: Q. 1,2,4,6,7,8

Hive Interview Questions and Answers for experienced: Q. 3,5,9,10

Que 11. Can Hive queries be executed from script files? How?

Ans. Yes, it is possible by using the source command.
For example:
hive> source /path/to/file/file_with_query.hql;

Que 12. What is the importance of the .hiverc file?

Ans. It is a file containing a list of commands that need to run when the Hive CLI starts, for example, setting strict mode to true.

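Que 13. What are the default record and field delimiters used for Hive text files?

Ans. The default record delimiter is \n.
The default field delimiters are \001 (between fields), \002 (between items of an ARRAY or STRUCT, or entries of a MAP), and \003 (between a MAP key and its value).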
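
As a rough illustration of how these delimiters nest, the Python sketch below builds and parses one text record. The three columns (a string, an array, and a map) and the function parse_line are hypothetical; only the delimiter characters themselves come from the answer above.

```python
FIELD_DELIM = "\x01"       # \001, separates columns
COLLECTION_DELIM = "\x02"  # \002, separates ARRAY items and MAP entries
MAP_KEY_DELIM = "\x03"     # \003, separates a MAP key from its value

def parse_line(line):
    """Parse one record with assumed columns: name, skills (array), scores (map)."""
    name, skills, scores = line.rstrip("\n").split(FIELD_DELIM)
    skill_list = skills.split(COLLECTION_DELIM)
    score_map = dict(
        entry.split(MAP_KEY_DELIM) for entry in scores.split(COLLECTION_DELIM)
    )
    return name, skill_list, score_map

# Build a sample record instead of typing control characters by hand.
line = FIELD_DELIM.join([
    "asha",
    COLLECTION_DELIM.join(["hive", "pig"]),
    COLLECTION_DELIM.join(["sql" + MAP_KEY_DELIM + "9", "java" + MAP_KEY_DELIM + "7"]),
]) + "\n"

parsed = parse_line(line)
print(parsed)
```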

Que 14. What do you mean by schema on read?

Ans. Schema on read means the schema is validated against the data while reading it, and is not enforced when the data is written.
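
A small Python sketch can show the difference in practice: the raw line is stored as-is, and the schema is only applied when the line is read. The schema, the comma delimiter, and the function read_with_schema are assumptions for illustration. Turning values that do not fit into None is meant to resemble, in spirit, Hive returning NULL for fields it cannot interpret.

```python
SCHEMA = [("id", int), ("name", str), ("price", float)]  # hypothetical table schema

def read_with_schema(line, schema=SCHEMA, delimiter=","):
    """Apply the schema to one raw line at read time.

    Values that are missing or do not fit the declared type become None.
    """
    raw = line.rstrip("\n").split(delimiter)
    row = {}
    for i, (column, cast) in enumerate(schema):
        try:
            row[column] = cast(raw[i])
        except (IndexError, ValueError):
            row[column] = None
    return row

good = read_with_schema("1,pen,2.5")
bad = read_with_schema("x,pencil")
print(good)
print(bad)
```

Nothing stopped the second line from being written, even though it does not match the schema. The mismatch only shows up when the data is read.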

Que 15. How do you list all databases whose name starts with p?

Ans. SHOW DATABASES LIKE 'p*';
In this statement, * matches any sequence of characters.

Que 16. What does the “USE” command in Hive do?

Ans. The “USE” command fixes the database on which all subsequent Hive queries will run.

Que 17. How can you delete the DBPROPERTY in Hive?

Ans. We cannot delete the DBPROPERTY in Hive.

Que 18. What is the significance of the line set hive.mapred.mode = strict;

Ans. It sets MapReduce jobs to strict mode, in which queries on partitioned tables cannot run without a WHERE clause that filters on a partition. This prevents a very large job from running for a long time.

Que 19. What is the maximum size of string data type supported by Hive?

Ans. Maximum size is 2 GB.

Que 20. What are views in Hive?

Ans. Views are similar to tables in Hive, and they are generated based on requirements.

– We can save any result set data as a view in Hive.
– Their usage is similar to that of views in SQL.
– Views are read-only, so we cannot use a view as the target of LOAD, INSERT, or ALTER operations.

Hive Interview Questions and Answers for freshers: Q. 12,13,14,15,16,17,18,19,20

Hive Interview Questions and Answers for experienced: Q. 11

Que 21. How do you check if a particular partition exists?

Ans. We can check whether a particular partition exists with the following query:
SHOW PARTITIONS table_name PARTITION(partitioned_column='partition_value');

Que 22. Which Java class handles the input record encoding into the files which store the tables in Hive?

Ans. org.apache.hadoop.mapred.TextInputFormat

Que 23. Which Java class handles the output record encoding into the files which result from Hive queries?

Ans. org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

Que 24. What is the significance of the ‘IF EXISTS’ clause while dropping a table?

Ans. When we issue the command DROP TABLE table_name and the table does not exist in the first place, Hive throws an error. With DROP TABLE IF EXISTS table_name, Hive does not throw an error in that case.

Que 25. When you point a partition of a hive table to a new directory, what happens to the data?

Ans. The data stays in the old location and has to be moved manually.

Que 26. What is a generic UDF in Hive?

Ans. A generic UDF is a user-defined function that we create with a Java program to serve some specific need not covered by the existing functions in Hive. It can detect the type of its input arguments programmatically and provide the appropriate response.

Que 27. Can a partition be archived? What are the advantages and disadvantages?

Ans. Yes. We can archive a partition.

  • Advantage
    It decreases the number of files the NameNode has to track, and the archived partition can still be queried using Hive.
  • Disadvantage
    Queries on an archived partition are less efficient, and archiving does not offer any space savings.

Que 28. Difference between SQL and HiveQL?

Ans. So, let’s discuss a few differences between them:

  • Data model
    SQL: It is based on the relational database model, and the database enforces the schema when data is written.
    HiveQL: It is a SQL-like language over data stored in Hadoop, and the schema is applied when the data is read.
  • Main purpose
    SQL: It queries and manipulates data stored in tables, including frequent changes to individual rows.
    HiveQL: It is mainly used to query, summarize, and analyze large data sets in batch.
  • Execution
    SQL: The database engine executes the statements directly.
    HiveQL: Hive compiles the queries into jobs, such as MapReduce jobs, that run on the Hadoop cluster, so latency is typically higher.

Que 29. Hive vs Spark SQL

Ans. So, let’s compare Hive vs Spark SQL in detail.

  • Initial release
    Apache Hive: Facebook open-sourced Hive in 2008, several years before Spark SQL appeared.
    Spark SQL: It was first released in the year 2014.
  • Current release
    Apache Hive: The release cited at the time of writing is version 2.3.2, dated 18 November 2017.
    Spark SQL: The release cited at the time of writing is version 2.1.2, dated 09 October 2017.
  • Developer
    Apache Hive: Facebook developed it originally and then donated it to the Apache Software Foundation, which has maintained it since.
    Spark SQL: It was developed as part of the Apache Spark project under the Apache Software Foundation.

Que 30. Pig vs Hive vs Hadoop MapReduce

Ans. So, let’s compare Pig vs Hive vs Hadoop MapReduce:

  • Language
    Hive: It has a SQL-like query language (HiveQL).
    MapReduce: Jobs are written in a compiled language, typically Java.
    Pig: It has a scripting language (Pig Latin).
  • Abstraction
    Hive: It has a high level of abstraction.
    MapReduce: It has a low level of abstraction.
    Pig: It also has a high level of abstraction.
  • Lines of code
    Hive: It needs comparatively fewer lines of code than both MapReduce and Pig.
    MapReduce: It needs more lines of code.
    Pig: It needs comparatively fewer lines of code than MapReduce.

Hive Interview Questions and Answers for freshers: Q. 27,28,29,30

Hive Interview Questions and Answers for experienced: Q. 21,22,23,24,25,26

Summary

As a result, we have seen more Hive interview questions and answers in Part-2 which can be asked in Apache Hive interviews. We are still collecting possible Hive interview questions and answers to provide you with a complete guide.

If you have any query regarding the above questions, please ask through the comment section. Also, keep visiting our site for more updates on Hive interview questions and answers as well as articles on Big Data technologies.
