Site icon DataFlair

Hive Join | HiveQL Select Joins Query | Types of Join in Hive

Hive Join - HiveQL Select Joins Query

Hive Join - HiveQL Select Joins Query

In Apache Hive, for combining specific fields from two tables by using values common to each one we use Hive Join – HiveQL Select Joins Query. However, we need to know the syntax of Hive Join for implementation purpose.

So, in this article, “Hive Join – HiveQL Select Joins Query and its types” we will cover syntax of joins in hive. Also, we will learn an example of Hive Join to understand well.

Moreover, there are several types of Hive join – HiveQL Select Joins: Hive inner join, hive left outer join, hive right outer join, and hive full outer join. We will also learn Hive Join tables in depth.

Apache Hive Join – HiveQL Select Joins Query

Basically, for combining specific fields from two tables by using values common to each one we use Hive JOIN clause.

In other words, to combine records from two or more tables in the database we use JOIN clause. However, it is more or less similar to SQL JOIN. Also, we use it to combine rows from multiple tables.

Moreover, there are some points we need  to observe about Hive Join:

a. Apache Hive Join Syntax

Following is a syntax of Hive Join – HiveQL Select Clause.
join_table:
  table_reference JOIN table_factor [join_condition]
  | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference
  join_condition
  | table_reference LEFT SEMI JOIN table_reference join_condition
  | table_reference CROSS JOIN table_reference [join_condition]

b. Example of Join in Hive

Example of Hive Join – HiveQL Select Clause.
However, to understand well let’s suppose the following table named CUSTOMERS.
Table.1 Hive Join Example

ID Name Age Address Salary
1 Ross 32 Ahmedabad 2000
2 Rachel 25 Delhi 1500
3 Chandler 23 Kota 2000
4 Monika 25 Mumbai 6500
5 Mike 27 Bhopal 8500
6 Phoebe 22 MP 4500
7 Joey 24 Indore 10000

Also, suppose another table ORDERS as follows:
Table.2 – Hive Join Example

OID Date Customer_ID Amount
102 2016-10-08 00:00:00 3 3000
100 2016-10-08 00:00:00 3 1500
101 2016-11-20 00:00:00 2 1560
103 2015-05-20 00:00:00 4 2060

Type of Joins in Hive

Basically, there are 4 types of Hive Join. Such as:

Types of Hive Join | HiveQL Select Clause

  1. Inner join in Hive
  2. Left Outer Join in Hive 
  3. Right Outer Join in Hive
  4. Full Outer Join in Hive 

So, let’s discuss each Hive join in detail.

a. Inner Join

Basically, to combine and retrieve the records from multiple tables we use Hive Join clause. Moreover, in SQL JOIN is as same as OUTER JOIN. Moreover, by using the primary keys and foreign keys of the tables JOIN condition is to be raised.

Furthermore, the below query executes JOIN the CUSTOMER and ORDER tables. Then further retrieves the records:
hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT
FROM CUSTOMERS c JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

Moreover, we get to see the following response, on the successful execution of the query:

Table.3 – Inner Join in Hive

ID Name Age Amount
3 Chandler 23 1300
3 Chandler 23 1500
2 Rachel 25 1560
4 Monika 25 2060

b. Left Outer Join

On defining HiveQL Left Outer Join, even if there are no matches in the right table it returns all the rows from the left table.

To be more specific, even if the ON clause matches 0 (zero) records in the right table, then also this Hive JOIN still returns a row in the result. Although, it returns with NULL in each column from the right table.

In addition, it returns all the values from the left table. Also, the matched values from the right table, or NULL in case of no matching JOIN predicate.
However, the below query shows LEFT OUTER JOIN between CUSTOMER as well as ORDER tables:

hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
LEFT OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

Moreover, we get to see the following response, on the successful execution of the HiveQL Select query:

Table.4 – Left Outer Join in Hive

ID Name Amount Date
1 Ross NULL NULL
2 Rachel 1560 2016-11-20 00:00:00
3 Chandler 3000 2016-10-08 00:00:00
3 Chandler 1500 2016-10-08 00:00:00
4 Monika 2060 2015-05-20 00:00:00
5 Mike NULL NULL
6 Phoebe NULL NULL
7 Joey NULL NULL

c. Right Outer Join

Basically, even if there are no matches in the left table, HiveQL Right Outer Join returns all the rows from the right table.

To be more specific, even if the ON clause matches 0 (zero) records in the left table, then also this Hive JOIN still returns a row in the result. Although, it returns with NULL in each column from the left table

In addition, it returns all the values from the right table. Also, the matched values from the left table or NULL in case of no matching join predicate.

However, the below query shows RIGHT OUTER JOIN between the CUSTOMER as well as ORDER tables.

notranslate"> hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c RIGHT OUTER JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);

Moreover, we get to see the following response, on the successful execution of the HiveQL Select query:
Table.5 – Right Outer Join in Hive

ID Name Amount Date
3 Chandler 1300 2016-10-08 00:00:00
3 Chandler 1500 2016-10-08 00:00:00
2 Rachel 1560 2016-11-20 00:00:00
4 Monika 2060 2015-05-20 00:00:00

d. Full Outer Join

The major purpose of this HiveQL Full outer Join is it combines the records of both the left and the right outer tables which fulfills the Hive JOIN condition. Moreover, this joined table contains either all the records from both the tables or fills in NULL values for missing matches on either side.

However, the below query shows  FULL OUTER JOIN between CUSTOMER as well as ORDER tables:

hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
FULL OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

Moreover, we get to see the following response, on the successful execution of the query:

Table.6 – Full Outer Join in Hive

ID Name Amount Date
1 Ross NULL NULL
2 Rachel 1560 2016-11-20 00:00:00
3 Chandler 3000 2016-10-08 00:00:00
3 Chandler 1500 2016-10-08 00:00:00
4 Monika 2060 2015-05-20 00:00:00
5 Mike NULL NULL
6 Phoebe NULL NULL
7 Joey NULL NULL
3 Chandler 3000 2016-10-08 00:00:00
3 Chandler 1500 2016-10-08 00:00:00
2 Rachel 1560 2016-11-20 00:00:00
4 Monika 2060 2015-05-20 00:00:00

This was all about HiveQL Select – Apache Hive Join Tutorial. Hope you like our explanation of Types of Joins in Hive.

Conclusion

As a result, we have seen what is Apache Hive Join and possible types of Join in Hive- HiveQL Select. Also, we have seen several examples to understand Joins in Hive well. Moreover, we hope this article “Hive Join” will help a lot. Still, if any doubt occurs feel free to ask in the comment section.

Exit mobile version