Hive Join | HiveQL Select Joins Query | Types of Join in Hive

Keeping you updated with latest technology trends, Join DataFlair on Telegram

1. Apache Hive Join – Objective

In Apache Hive, for combining specific fields from two tables by using values common to each one we use Hive Join – HiveQL Select Joins Query. However, we need to know the syntax of Hive Join for implementation purpose. So, in this article, “Hive Join – HiveQL Select Joins Query and its types” we will cover syntax of joins in hive. Also, we will learn an example of Hive Join to understand well. Moreover, there are several types of Hive join – HiveQL Select Joins: Hive inner join, hive left outer join, hive right outer join, and hive full outer join. We will also learn Hive Join tables in depth.

Hive Join - HiveQL Select Joins Query

Hive Join – HiveQL Select Joins Query

2. Apache Hive Join – HiveQL Select Joins Query

Basically, for combining specific fields from two tables by using values common to each one we use Hive JOIN clause. In other words, to combine records from two or more tables in the database we use JOIN clause. However, it is more or less similar to SQL JOIN. Also, we use it to combine rows from multiple tables.
Moreover, there are some points we need  to observe about Hive Join:

  • In Joins, only Equality joins are allowed.
  • However, in the same query more than two tables can be joined.
  • Basically, to offer more control over ON Clause for which there is no match LEFT, RIGHT, FULL OUTER joins exist in order.
  • Also, note that Hive Joins are not Commutative
  • Whether they are LEFT or RIGHT joins in Hive, even then Joins are left-associative.

Let’s learn Hive UDF – User Defined Function with Example

a. Apache Hive Join Syntax

Following is a syntax of Hive Join – HiveQL Select Clause.
join_table:
  table_reference JOIN table_factor [join_condition]
  | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference
  join_condition
  | table_reference LEFT SEMI JOIN table_reference join_condition
  | table_reference CROSS JOIN table_reference [join_condition]
Read more Hive DDL Commands : Types of DDL Hive Commands

b. Example of Join in Hive

Example of Hive Join – HiveQL Select Clause.
However, to understand well let’s suppose the following table named CUSTOMERS.
Table.1 Hive Join Example

ID Name Age Address Salary
1 Ross 32 Ahmedabad 2000
2 Rachel 25 Delhi 1500
3 Chandler 23 Kota 2000
4 Monika 25 Mumbai 6500
5 Mike 27 Bhopal 8500
6 Phoebe 22 MP 4500
7 Joey 24 Indore 10000

Also, suppose another table ORDERS as follows:
Table.2 – Hive Join Example

OID Date Customer_ID Amount
102 2016-10-08 00:00:00 3 3000
100 2016-10-08 00:00:00 3 1500
101 2016-11-20 00:00:00 2 1560
103 2015-05-20 00:00:00 4 2060
If these professionals can make a switch to Big Data, so can you:
Rahul Doddamani Story - DataFlair
Rahul Doddamani
Java → Big Data Consultant, JDA
Follow on
Mritunjay Singh Success Story - DataFlair
Mritunjay Singh
PeopleSoft → Big Data Architect, Hexaware
Follow on
Rahul Doddamani Success Story - DataFlair
Rahul Doddamani
Big Data Consultant, JDA
Follow on
I got placed, scored 100% hike, and transformed my career with DataFlair
Enroll now
Richa Tandon Success Story - DataFlair
Richa Tandon
Support → Big Data Engineer, IBM
Follow on
DataFlair Web Services
You could be next!
Enroll now

3. Type of Joins in Hive

Basically, there are 4 types of Hive Join. Such as:

HiveQL Select Joins

Types of Hive Join | HiveQL Select Clause

  1. Inner join in Hive
  2. Left Outer Join in Hive 
  3. Right Outer Join in Hive
  4. Full Outer Join in Hive 

So, let’s discuss each Hive join in detail.
Let’s look at Different Ways to Configure Hive Metastore

a. Inner Join

Basically, to combine and retrieve the records from multiple tables we use Hive Join clause. Moreover, in SQL JOIN is as same as OUTER JOIN. Moreover, by using the primary keys and foreign keys of the tables JOIN condition is to be raised.
Furthermore, the below query executes JOIN the CUSTOMER and ORDER tables. Then further retrieves the records:
hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT
FROM CUSTOMERS c JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
Let’s figure about Difference between Hive Partitioning vs Bucketing
Moreover, we get to see the following response, on the successful execution of the query:
Table.3 – Inner Join in Hive

ID Name Age Amount
3 Chandler 23 1300
3 Chandler 23 1500
2 Rachel 25 1560
4 Monika 25 2060

b. Left Outer Join

On defining HiveQL Left Outer Join, even if there are no matches in the right table it returns all the rows from the left table. To be more specific, even if the ON clause matches 0 (zero) records in the right table, then also this Hive JOIN still returns a row in the result. Although, it returns with NULL in each column from the right table.
In addition, it returns all the values from the left table. Also, the matched values from the right table, or NULL in case of no matching JOIN predicate.
However, the below query shows LEFT OUTER JOIN between CUSTOMER as well as ORDER tables:

hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
LEFT OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

Read about Apache Hive View and Hive Index
Moreover, we get to see the following response, on the successful execution of the HiveQL Select query:
Table.4 – Left Outer Join in Hive

ID Name Amount Date
1 Ross NULL NULL
2 Rachel 1560 2016-11-20 00:00:00
3 Chandler 3000 2016-10-08 00:00:00
3 Chandler 1500 2016-10-08 00:00:00
4 Monika 2060 2015-05-20 00:00:00
5 Mike NULL NULL
6 Phoebe NULL NULL
7 Joey NULL NULL

c. Right Outer Join

Basically, even if there are no matches in the left table, HiveQL Right Outer Join returns all the rows from the right table. To be more specific, even if the ON clause matches 0 (zero) records in the left table, then also this Hive JOIN still returns a row in the result. Although, it returns with NULL in each column from the left table
In addition, it returns all the values from the right table. Also, the matched values from the left table or NULL in case of no matching join predicate.

However, the below query shows RIGHT OUTER JOIN between the CUSTOMER as well as ORDER tables.Let’s know Apache Hive Features & Limitations of Hive in detail

notranslate"> hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c RIGHT OUTER JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);

Moreover, we get to see the following response, on the successful execution of the HiveQL Select query:
Table.5 – Right Outer Join in Hive

ID Name Amount Date
3 Chandler 1300 2016-10-08 00:00:00
3 Chandler 1500 2016-10-08 00:00:00
2 Rachel 1560 2016-11-20 00:00:00
4 Monika 2060 2015-05-20 00:00:00

d. Full Outer Join

The major purpose of this HiveQL Full outer Join is it combines the records of both the left and the right outer tables which fulfills the Hive JOIN condition. Moreover, this joined table contains either all the records from both the tables or fills in NULL values for missing matches on either side.
However, the below query shows  FULL OUTER JOIN between CUSTOMER as well as ORDER tables:

hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
FULL OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

 

Do you know Best Apache Hive Books to learn Hive
Moreover, we get to see the following response, on the successful execution of the query:
Table.6 – Full Outer Join in Hive

ID Name Amount Date
1 Ross NULL NULL
2 Rachel 1560 2016-11-20 00:00:00
3 Chandler 3000 2016-10-08 00:00:00
3 Chandler 1500 2016-10-08 00:00:00
4 Monika 2060 2015-05-20 00:00:00
5 Mike NULL NULL
6 Phoebe NULL NULL
7 Joey NULL NULL
3 Chandler 3000 2016-10-08 00:00:00
3 Chandler 1500 2016-10-08 00:00:00
2 Rachel 1560 2016-11-20 00:00:00
4 Monika 2060 2015-05-20 00:00:00

This was all about HiveQL Select – Apache Hive Join Tutorial. Hope you like our explanation of Types of Joins in Hive.

4. Conclusion

As a result, we have seen what is Apache Hive Join and possible types of Join in Hive- HiveQL Select. Also, we have seen several examples to understand Joins in Hive well. Moreover, we hope this article “Hive Join” will help a lot. Still, if any doubt occurs feel free to ask in the comment section.
Related Topic-30 Mostly Asked Hive Interview Questions and Answers
For reference

4 Responses

  1. sonu kuma says:

    correct your inner join example output you given Address column name insteed of Amount name

    • DataFlair Team says:

      Thank-you for mentioning the mistake. we appreciate you notice that. Also, we have updated it correctly. Hope it helps!

  2. Abu Zohad says:

    Right outer join and Full outer join example outputs are not correct.

  3. Subh says:

    The result of Full Outer Join show here is not correct

    There should not be any duplicates as shown below:

    SELECT
    A.ID, A.NAME, A.AGE, B.ODID, B.ODATE, B.AMOUNT
    FROM
    SUBH_CUST1 A
    FULL OUTER JOIN
    SUBH_ORDER1 B
    ON A.ID = B.CUST_ID;

    +——-+———–+——–+———+————————+———–+
    | a.id | a.name | a.age | b.odid | b.odate | b.amount |
    +——-+———–+——–+———+————————+———–+
    | 1 | Ross | 32 | NULL | NULL | NULL |
    | 2 | Rachel | 25 | 101 | 2016-11-20 00:00:00.0 | 1560 |
    | 3 | Chandler | 23 | 100 | 2016-10-08 00:00:00.0 | 1500 |
    | 3 | Chandler | 23 | 102 | 2016-10-08 00:00:00.0 | 3000 |
    | 4 | Monika | 25 | 103 | 2015-05-20 00:00:00.0 | 2060 |
    | 5 | Mike | 27 | NULL | NULL | NULL |
    | 6 | Phoebe | 22 | NULL | NULL | NULL |
    | 7 | Joey | 24 | NULL | NULL | NULL |
    +——-+———–+——–+———+————————+———–+

Leave a Reply

Your email address will not be published. Required fields are marked *

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.