Apache Hive View and Hive Index

Boost your career with Data Engineering Courses!!

In our last article, we see Hive Built-in Functions. In this Hive index Tutorial, we will learn the whole concept of Hive Views and Indexing in Hive.

Also, we will cover how to create Hive Index and hive Views, manage views and Indexing of hive, hive index types, hive index performance, and hive view performance. In addition, we will learn several examples to understand both.

We can save any result set data as a view. Whereas Apache Hive Index is a pointer to a particular column of a table. However, there is much more to learn about Apache Hive View and Indexing in Apache Hive.

Apache Hive View and Index

In this Part of a tutorial, we are going to study definition and concept of Apache Hive View and Hive Index.

a. What is Hive view?

Basically, Apache Hive View is similar to Hive tables, that are generated on the basis of requirements.

Apache Hive View on External Table

As a Hive view, we can save any result set data.
Well, we can say its usage is as same as the use of views in SQL.
Although, we can perform all type of DML operations on Hive views.

In other words, Apache Hive View is a searchable object in a database which we can define by the query. However, we can not store data in the view. Still, some refer to as a view as “virtual tables”.

Hence, we can query a view like we can a table. Moreover, by using joins it is possible to combine data from or more table. Also, it contains a subset of information.

i. Apache Hive View Syntax

Create VIEW < VIEWNAME> AS SELECT

ii. Creating a Hive View

However, at the time of executing a SELECT statement, we can create a view. So, to create Hive view Syntax is:
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], …) ]
[COMMENT table_comment]
AS SELECT …

iii. Apache Hive View Example

Let’s suppose, an employee table. It includes fields Id, Name, Salary, Designation, and Dept. Now here we are generating a query to retrieve the employee details who earn a salary of more than Rs 35000. So, we store the result in a view named emp_30000.
Table 1- Apache Hive View

ID	Name	Salary	Designation	Dept
1201	Michel	45000	Technical manager	TP
1202	Chandler	45000	Proofreader	PR
1203	Ross	40000	Technical writer	TP
1204	Joey	40000	Hr Admin	HR
1205	Monika	35000	Op Admin	Admin

Hence, using the above scenario here is the following query retrieves the employee details:
hive> CREATE VIEW emp_35000 AS
SELECT * FROM employee
WHERE salary>35000

iv. Dropping a Hive View

However, to drop a Hive view, use the following syntax:
DROP VIEW view_name
The following query drops a view named as emp_35000:
hive> DROP VIEW emp_35000;

b. What is Apache Hive Index?

On defining indexing in Hive we can say these are pointers to particular column name of a table.

Introduction – Apache Hive Index

However, the user has to manually define the Hive index
Basically, we are creating the pointer to particular column name of the table, wherever we are creating Hive index.
By using the Hive index value created on the column name, any Changes made to the column present in tables are stored.

i. Apache Hive index Syntax

Create INDEX < INDEX_NAME> ON TABLE < TABLE_NAME(column names)>

ii. Create an Indexing in Hive

However, creating a Apache Hive index means creating a pointer on a particular column of a table. So, to create an indexing in hive.
Apache Hive Index Syntax is:

CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[
  [ ROW FORMAT ...] STORED AS ...
  | STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]

iii. Apache Hive Index Example

Let’s suppose the same employee table which we had used earlier with the fields Id, Name, Salary, Designation, and Dept. So, here create an index named index_salary on the salary column of the employee table.

Hence, we use the following query to create a Hive index:

hive> CREATE INDEX inedx_salary ON TABLE employee(salary)
AS ‘org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler’;

However, it is a pointer to the salary column. Basically, the changes are stored using an index value, if the column is modified.

iv. Dropping an Index

Also, drop indexing in Hive we use the following syntax of Apache Hive Index:
DROP INDEX <index_name> ON <table_name>
Here, is the following query drops a Hive index named index_salary:
hive> DROP INDEX index_salary ON employee;

v. When to use Hive Indexing?

Under the following circumstances, we can use Indexing in Hive:

While the dataset is very large.
Whenever the query execution is more amount of time than you expected.
While we need a speedy query execution.
While we build a data model.

Hive Index is maintained in a separate table. Hence, it won’t affect the data inside the table, which contains the data. There is one more advantage of it. That is for indexing in Hive is that index can also be partitioned depending on the size of the data we have.

vi. Types of Hive Index

These two are Apache Hive Index types:

Compact Indexing in Hive
Bitmap Indexing in Hive

So, this was all about Apache Hive Index Tutorial. Hope you like our explanation of Indexing in Hive.

Conclusion

Hence, in this apache hive index, we studied what is Hive View and Hive Index.

We have learned the whole concept of Apache Hive view with Syntax, how to create apache hive views with example, dropping a Hive view, Hive Index with Syntax, How to create Hive Index with Example, Dropping an Indexing in the hive, When to use Indexing in the hive, and types of Hive Index.

Well also including these features correctly in your Hive implementation will simplify and provide your users with a better experience and Knowing well. Still, if any doubt occurs feel free to ask in the comment section.

You give me 15 seconds I promise you best tutorials
Please share your happy experience on Google

Eswar says:
May 15, 2018 at 9:15 pm
Does hive support indexing for all file formats or only specific file formats like ORC?
- Data Flair says:
  August 18, 2018 at 12:21 pm
  Hi Eswar,
  Thanks for Visiting Data-Flair, we are happy you asked your query on this “Apache Hive View and Hive Index” Tutorial. Here, is the solutions –
  Hive supports indexing only for ORC because ORC has built-in Indexes that permits the format to skip blocks of data during reading. Moreover, they also support Bloom filters. Hence, together it replicates what Hive Indexes did and they perform it automatically in the data format even without the need to manage an external table.
  Hope we solve your query. Feel free to explore Hive Tutorial.
Ramakrishna says:
January 22, 2019 at 3:09 pm
Hi
Can you just help how can we add ‘hplsql’ in hive
Vishakha Sadawarte says:
June 13, 2019 at 9:18 am
Please improve the English grammar in this post.
- C Singh says:
  August 25, 2019 at 3:19 pm
  exactly my thought
Rishan says:
November 20, 2019 at 8:00 pm
What is the difference between Compact Indexing in Hive & Bitmap Indexing in Hive in Hive
Sai kumar says:
December 15, 2019 at 6:00 am
Compact indexing can be applied only on the column which is unique in table for ex: dept_no, whereas Bitmap can indexing can be applied either on unique or non unique columns,

Apache Hive View and Hive Index

Apache Hive View and Index

a. What is Hive view?

i. Apache Hive View Syntax

ii. Creating a Hive View

iii. Apache Hive View Example

iv. Dropping a Hive View

b. What is Apache Hive Index?

i. Apache Hive index Syntax

ii. Create an Indexing in Hive

iii. Apache Hive Index Example

iv. Dropping an Index

v. When to use Hive Indexing?

vi. Types of Hive Index

Conclusion

7 Responses

Leave a Reply Cancel reply

About DataFlair

Trending Courses

Trending Data Science Courses

Free Big Data Courses

Trending Programming Courses

Trending Data Science Tutorials

Trending Projects

Trending Programming Tutorials

Trending Tutorials