Hive Join | HiveQL Select Joins Query | Types of Join in Hive

Boost your career with Free Big Data Courses!!

In Apache Hive, for combining specific fields from two tables by using values common to each one we use Hive Join – HiveQL Select Joins Query. However, we need to know the syntax of Hive Join for implementation purpose.

So, in this article, “Hive Join – HiveQL Select Joins Query and its types” we will cover syntax of joins in hive. Also, we will learn an example of Hive Join to understand well.

Moreover, there are several types of Hive join – HiveQL Select Joins: Hive inner join, hive left outer join, hive right outer join, and hive full outer join. We will also learn Hive Join tables in depth.

Apache Hive Join – HiveQL Select Joins Query

Basically, for combining specific fields from two tables by using values common to each one we use Hive JOIN clause.

In other words, to combine records from two or more tables in the database we use JOIN clause. However, it is more or less similar to SQL JOIN. Also, we use it to combine rows from multiple tables.

Moreover, there are some points we need  to observe about Hive Join:

  • In Joins, only Equality joins are allowed.
  • However, in the same query more than two tables can be joined.
  • Basically, to offer more control over ON Clause for which there is no match LEFT, RIGHT, FULL OUTER joins exist in order.
  • Also, note that Hive Joins are not Commutative
  • Whether they are LEFT or RIGHT joins in Hive, even then Joins are left-associative.

a. Apache Hive Join Syntax

Following is a syntax of Hive Join – HiveQL Select Clause.
join_table:
  table_reference JOIN table_factor [join_condition]
  | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference
  join_condition
  | table_reference LEFT SEMI JOIN table_reference join_condition
  | table_reference CROSS JOIN table_reference [join_condition]

b. Example of Join in Hive

Example of Hive Join – HiveQL Select Clause.
However, to understand well let’s suppose the following table named CUSTOMERS.
Table.1 Hive Join Example

IDNameAgeAddressSalary
1Ross32Ahmedabad2000
2Rachel25Delhi1500
3Chandler23Kota2000
4Monika25Mumbai6500
5Mike27Bhopal8500
6Phoebe22MP4500
7Joey24Indore10000

Also, suppose another table ORDERS as follows:
Table.2 – Hive Join Example

OIDDateCustomer_IDAmount
102 2016-10-08 00:00:0033000
1002016-10-08 00:00:0031500
101 2016-11-20 00:00:0021560
1032015-05-20 00:00:0042060

Type of Joins in Hive

Basically, there are 4 types of Hive Join. Such as:

HiveQL Select Joins

Types of Hive Join | HiveQL Select Clause

  1. Inner join in Hive
  2. Left Outer Join in Hive 
  3. Right Outer Join in Hive
  4. Full Outer Join in Hive 

So, let’s discuss each Hive join in detail.

a. Inner Join

Basically, to combine and retrieve the records from multiple tables we use Hive Join clause. Moreover, in SQL JOIN is as same as OUTER JOIN. Moreover, by using the primary keys and foreign keys of the tables JOIN condition is to be raised.

Furthermore, the below query executes JOIN the CUSTOMER and ORDER tables. Then further retrieves the records:
hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT
FROM CUSTOMERS c JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

Moreover, we get to see the following response, on the successful execution of the query:

Table.3 – Inner Join in Hive

IDNameAgeAmount
3Chandler231300
3Chandler231500
2Rachel251560
4Monika252060

b. Left Outer Join

On defining HiveQL Left Outer Join, even if there are no matches in the right table it returns all the rows from the left table.

To be more specific, even if the ON clause matches 0 (zero) records in the right table, then also this Hive JOIN still returns a row in the result. Although, it returns with NULL in each column from the right table.

In addition, it returns all the values from the left table. Also, the matched values from the right table, or NULL in case of no matching JOIN predicate.
However, the below query shows LEFT OUTER JOIN between CUSTOMER as well as ORDER tables:

hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
LEFT OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

Moreover, we get to see the following response, on the successful execution of the HiveQL Select query:

Table.4 – Left Outer Join in Hive

IDNameAmount Date
1RossNULLNULL
2Rachel15602016-11-20 00:00:00
3Chandler30002016-10-08 00:00:00
3Chandler15002016-10-08 00:00:00
4Monika20602015-05-20 00:00:00
5MikeNULLNULL
6PhoebeNULLNULL
7JoeyNULLNULL

c. Right Outer Join

Basically, even if there are no matches in the left table, HiveQL Right Outer Join returns all the rows from the right table.

To be more specific, even if the ON clause matches 0 (zero) records in the left table, then also this Hive JOIN still returns a row in the result. Although, it returns with NULL in each column from the left table

In addition, it returns all the values from the right table. Also, the matched values from the left table or NULL in case of no matching join predicate.

However, the below query shows RIGHT OUTER JOIN between the CUSTOMER as well as ORDER tables.

notranslate"> hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c RIGHT OUTER JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);

Moreover, we get to see the following response, on the successful execution of the HiveQL Select query:
Table.5 – Right Outer Join in Hive

IDNameAmount Date
3Chandler13002016-10-08 00:00:00
3Chandler15002016-10-08 00:00:00
2Rachel15602016-11-20 00:00:00
4Monika20602015-05-20 00:00:00

d. Full Outer Join

The major purpose of this HiveQL Full outer Join is it combines the records of both the left and the right outer tables which fulfills the Hive JOIN condition. Moreover, this joined table contains either all the records from both the tables or fills in NULL values for missing matches on either side.

However, the below query shows  FULL OUTER JOIN between CUSTOMER as well as ORDER tables:

hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
FULL OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

Moreover, we get to see the following response, on the successful execution of the query:

Table.6 – Full Outer Join in Hive

IDNameAmount Date
1RossNULLNULL
2Rachel15602016-11-20 00:00:00
3Chandler30002016-10-08 00:00:00
3Chandler15002016-10-08 00:00:00
4Monika20602015-05-20 00:00:00
5MikeNULLNULL
6PhoebeNULLNULL
7JoeyNULLNULL
3Chandler3000 2016-10-08 00:00:00
3Chandler15002016-10-08 00:00:00
2Rachel15602016-11-20 00:00:00
4Monika20602015-05-20 00:00:00

This was all about HiveQL Select – Apache Hive Join Tutorial. Hope you like our explanation of Types of Joins in Hive.

Conclusion

As a result, we have seen what is Apache Hive Join and possible types of Join in Hive- HiveQL Select. Also, we have seen several examples to understand Joins in Hive well. Moreover, we hope this article “Hive Join” will help a lot. Still, if any doubt occurs feel free to ask in the comment section.

Did you like this article? If Yes, please give DataFlair 5 Stars on Google

courses

DataFlair Team

DataFlair Team specializes in creating clear, actionable content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Backed by industry expertise, we make learning easy and career-oriented for beginners and pros alike.

5 Responses

  1. sonu kuma says:

    correct your inner join example output you given Address column name insteed of Amount name

    • DataFlair Team says:

      Thank-you for mentioning the mistake. we appreciate you notice that. Also, we have updated it correctly. Hope it helps!

  2. Abu Zohad says:

    Right outer join and Full outer join example outputs are not correct.

  3. Subh says:

    The result of Full Outer Join show here is not correct

    There should not be any duplicates as shown below:

    SELECT
    A.ID, A.NAME, A.AGE, B.ODID, B.ODATE, B.AMOUNT
    FROM
    SUBH_CUST1 A
    FULL OUTER JOIN
    SUBH_ORDER1 B
    ON A.ID = B.CUST_ID;

    +——-+———–+——–+———+————————+———–+
    | a.id | a.name | a.age | b.odid | b.odate | b.amount |
    +——-+———–+——–+———+————————+———–+
    | 1 | Ross | 32 | NULL | NULL | NULL |
    | 2 | Rachel | 25 | 101 | 2016-11-20 00:00:00.0 | 1560 |
    | 3 | Chandler | 23 | 100 | 2016-10-08 00:00:00.0 | 1500 |
    | 3 | Chandler | 23 | 102 | 2016-10-08 00:00:00.0 | 3000 |
    | 4 | Monika | 25 | 103 | 2015-05-20 00:00:00.0 | 2060 |
    | 5 | Mike | 27 | NULL | NULL | NULL |
    | 6 | Phoebe | 22 | NULL | NULL | NULL |
    | 7 | Joey | 24 | NULL | NULL | NULL |
    +——-+———–+——–+———+————————+———–+

Leave a Reply

Your email address will not be published. Required fields are marked *