Hive Join | HiveQL Select Joins Query | Types of Join in Hive

1. Apache Hive Join – Objective

In Apache Hive, for combining specific fields from two tables by using values common to each one we use Hive Join – HiveQL Select Joins Query. However, we need to know the syntax of Hive Join for implementation purpose. So, in this article, “Hive Join – HiveQL Select Joins Query and its types” we will cover syntax of joins in hive. Also, we will learn an example of Hive Join to understand well. Moreover, there are several types of Hive join – HiveQL Select Joins: Hive inner join, hive left outer join, hive right outer join, and hive full outer join. We will also learn Hive Join tables in depth.

Hive Join - HiveQL Select Joins Query

Hive Join – HiveQL Select Joins Query

2. Apache Hive Join – HiveQL Select Joins Query

Basically, for combining specific fields from two tables by using values common to each one we use Hive JOIN clause. In other words, to combine records from two or more tables in the database we use JOIN clause. However, it is more or less similar to SQL JOIN. Also, we use it to combine rows from multiple tables.
Moreover, there are some points we need  to observe about Hive Join:

  • In Joins, only Equality joins are allowed.
  • However, in the same query more than two tables can be joined.
  • Basically, to offer more control over ON Clause for which there is no match LEFT, RIGHT, FULL OUTER joins exist in order.
  • Also, note that Hive Joins are not Commutative
  • Whether they are LEFT or RIGHT joins in Hive, even then Joins are left-associative.

Let’s learn Hive UDF – User Defined Function with Example

a. Apache Hive Join Syntax

Following is a syntax of Hive Join – HiveQL Select Clause.
join_table:
  table_reference JOIN table_factor [join_condition]
  | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference
  join_condition
  | table_reference LEFT SEMI JOIN table_reference join_condition
  | table_reference CROSS JOIN table_reference [join_condition]
Read more Hive DDL Commands : Types of DDL Hive Commands

b. Example of Join in Hive

Example of Hive Join – HiveQL Select Clause.
However, to understand well let’s suppose the following table named CUSTOMERS.
Table.1 Hive Join Example

IDNameAgeAddressSalary
1Ross32Ahmedabad2000
2Rachel25Delhi1500
3Chandler23Kota2000
4Monika25Mumbai6500
5Mike27Bhopal8500
6Phoebe22MP4500
7Joey24Indore10000

Also, suppose another table ORDERS as follows:
Table.2 – Hive Join Example

OIDDateCustomer_IDAmount
102 2016-10-08 00:00:0033000
1002016-10-08 00:00:0031500
101 2016-11-20 00:00:0021560
1032015-05-20 00:00:0042060
Hadoop Quiz

Get the most demanding skills of IT Industry - Learn Hadoop

3. Type of Joins in Hive

Basically, there are 4 types of Hive Join. Such as:

HiveQL Select Joins

Types of Hive Join | HiveQL Select Clause

  1. Inner join in Hive
  2. Left Outer Join in Hive 
  3. Right Outer Join in Hive
  4. Full Outer Join in Hive 

So, let’s discuss each Hive join in detail.
Let’s look at Different Ways to Configure Hive Metastore

a. Inner Join

Basically, to combine and retrieve the records from multiple tables we use Hive Join clause. Moreover, in SQL JOIN is as same as OUTER JOIN. Moreover, by using the primary keys and foreign keys of the tables JOIN condition is to be raised.
Furthermore, the below query executes JOIN the CUSTOMER and ORDER tables. Then further retrieves the records:
hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT
FROM CUSTOMERS c JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
Let’s figure about Difference between Hive Partitioning vs Bucketing
Moreover, we get to see the following response, on the successful execution of the query:
Table.3 – Inner Join in Hive

IDNameAgeAmount
3Chandler231300
3Chandler231500
2Rachel251560
4Monika252060

b. Left Outer Join

On defining HiveQL Left Outer Join, even if there are no matches in the right table it returns all the rows from the left table. To be more specific, even if the ON clause matches 0 (zero) records in the right table, then also this Hive JOIN still returns a row in the result. Although, it returns with NULL in each column from the right table.
In addition, it returns all the values from the left table. Also, the matched values from the right table, or NULL in case of no matching JOIN predicate.
However, the below query shows LEFT OUTER JOIN between CUSTOMER as well as ORDER tables:

hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
LEFT OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

Read about Apache Hive View and Hive Index
Moreover, we get to see the following response, on the successful execution of the HiveQL Select query:
Table.4 – Left Outer Join in Hive

IDNameAmount Date
1RossNULLNULL
2Rachel15602016-11-20 00:00:00
3Chandler30002016-10-08 00:00:00
3Chandler15002016-10-08 00:00:00
4Monika20602015-05-20 00:00:00
5MikeNULLNULL
6PhoebeNULLNULL
7JoeyNULLNULL

c. Right Outer Join

Basically, even if there are no matches in the left table, HiveQL Right Outer Join returns all the rows from the right table. To be more specific, even if the ON clause matches 0 (zero) records in the left table, then also this Hive JOIN still returns a row in the result. Although, it returns with NULL in each column from the left table
In addition, it returns all the values from the right table. Also, the matched values from the left table or NULL in case of no matching join predicate.

However, the below query shows RIGHT OUTER JOIN between the CUSTOMER as well as ORDER tables.Let’s know Apache Hive Features & Limitations of Hive in detail

notranslate"> hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c RIGHT OUTER JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);

Moreover, we get to see the following response, on the successful execution of the HiveQL Select query:
Table.5 – Right Outer Join in Hive

IDNameAmount Date
3Chandler13002016-10-08 00:00:00
3Chandler15002016-10-08 00:00:00
2Rachel15602016-11-20 00:00:00
4Monika20602015-05-20 00:00:00

d. Full Outer Join

The major purpose of this HiveQL Full outer Join is it combines the records of both the left and the right outer tables which fulfills the Hive JOIN condition. Moreover, this joined table contains either all the records from both the tables or fills in NULL values for missing matches on either side.
However, the below query shows  FULL OUTER JOIN between CUSTOMER as well as ORDER tables:

hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
FULL OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

 

Do you know Best Apache Hive Books to learn Hive
Moreover, we get to see the following response, on the successful execution of the query:
Table.6 – Full Outer Join in Hive

IDNameAmount Date
1RossNULLNULL
2Rachel15602016-11-20 00:00:00
3Chandler30002016-10-08 00:00:00
3Chandler15002016-10-08 00:00:00
4Monika20602015-05-20 00:00:00
5MikeNULLNULL
6PhoebeNULLNULL
7JoeyNULLNULL
3Chandler3000 2016-10-08 00:00:00
3Chandler15002016-10-08 00:00:00
2Rachel15602016-11-20 00:00:00
4Monika20602015-05-20 00:00:00

This was all about HiveQL Select – Apache Hive Join Tutorial. Hope you like our explanation of Types of Joins in Hive.

4. Conclusion

As a result, we have seen what is Apache Hive Join and possible types of Join in Hive- HiveQL Select. Also, we have seen several examples to understand Joins in Hive well. Moreover, we hope this article “Hive Join” will help a lot. Still, if any doubt occurs feel free to ask in the comment section.
Related Topic-30 Mostly Asked Hive Interview Questions and Answers
For reference

3 Responses

  1. sonu kuma says:

    correct your inner join example output you given Address column name insteed of Amount name

    • DataFlair Team says:

      Thank-you for mentioning the mistake. we appreciate you notice that. Also, we have updated it correctly. Hope it helps!

  2. Abu Zohad says:

    Right outer join and Full outer join example outputs are not correct.

Leave a Reply

Your email address will not be published. Required fields are marked *