HiveQL Select – Group By Query | Group By Clause

1. Group By Query – Objective

In this Apache Hive tutorial, we look at the HiveQL Group By query, which groups the rows of a Hive table by the values of one or more columns. To use it, we need to know its syntax. So, in this article, we will learn what a Hive query is, the syntax of the Group By clause, and an example of a HiveQL Select with Group By, including a JDBC program that runs it.

2. What is Hive Query?

Hive offers an SQL-like query language, HiveQL (Hive Query Language), for ETL work on top of the Hadoop file system. HiveQL provides an SQL-like environment for working with databases, tables, and queries.
Hive supports a range of clauses for different kinds of data manipulation and querying. It also offers JDBC connectivity, so that clients outside the Hive environment can connect to it.
Hive queries offer several features, such as:

  • Data modeling: creating databases, tables, and so on.
  • ETL functionality: extracting, transforming, and loading data into tables.
  • Joins to merge different data tables.
  • User-specific custom scripts, for ease of coding.
  • A faster querying tool on top of Hadoop.

3. HiveQL Select

The Group By clause groups the rows of a Hive table by the values of the columns it names. The column names themselves do not matter: whichever columns we list, the Group By query collects the rows that share the same values in those columns and returns one result row per group.

i.  Group by Query Syntax

The syntax of a SELECT statement with the GROUP BY clause is:
SELECT [ALL | DISTINCT] select_expr, select_expr, …
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]
[LIMIT number];
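To make the role of each optional clause concrete, here is a minimal in-memory sketch in Java. It does not connect to Hive; it only mimics the logical order in which the clauses apply: WHERE filters individual rows, GROUP BY forms the groups, HAVING filters whole groups, and ORDER BY and LIMIT shape the final result. The class name ClauseOrderSketch, the helper methods, and the sample (dept, salary) rows are assumptions made for this illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory illustration only; nothing here talks to Hive.
class ClauseOrderSketch {

    // Hypothetical helper: one (dept, salary) row.
    static Map.Entry<String, Integer> row(String dept, int salary) {
        return new SimpleEntry<>(dept, salary);
    }

    // Hypothetical sample rows, made up for this sketch.
    static List<Map.Entry<String, Integer>> sampleRows() {
        return Arrays.asList(
            row("TP", 45000), row("PR", 45000), row("TP", 40000),
            row("PR", 45000), row("Admin", 30000));
    }

    // Rough analogue of:
    //   SELECT dept, count(*) FROM t
    //   WHERE salary >= minSalary
    //   GROUP BY dept
    //   HAVING count(*) >= minCount
    //   ORDER BY dept
    //   LIMIT maxRows
    static List<String> run(List<Map.Entry<String, Integer>> rows,
                            int minSalary, long minCount, int maxRows) {
        return rows.stream()
            .filter(r -> r.getValue() >= minSalary)        // WHERE: drops rows
            .collect(Collectors.groupingBy(r -> r.getKey(),
                                           Collectors.counting())) // GROUP BY + count(*)
            .entrySet().stream()
            .filter(g -> g.getValue() >= minCount)         // HAVING: drops groups
            .sorted(Map.Entry.comparingByKey())            // ORDER BY dept
            .limit(maxRows)                                // LIMIT
            .map(g -> g.getKey() + "\t" + g.getValue())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Only PR has two rows with salary >= 45000, so this prints "PR	2".
        run(sampleRows(), 45000, 2L, 10).forEach(System.out::println);
    }
}
```

The point to notice is that the salary condition is checked before grouping, while the count condition can only be checked after the groups exist. That is why a condition on an aggregate such as count(*) belongs in HAVING rather than in WHERE.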

ii. Group by Query Example

To understand this better, consider an example. Suppose we have an employee table with the fields Id, Name, Salary, Designation, and Dept. We want a query that retrieves the number of employees in each department.
Table 1 – Group By Clause Example

ID    Name    Salary  Designation       Dept
1201  Ross    45000   Tech manager      TP
1202  Rachel  45000   Proofreader       PR
1203  Monika  40000   Technical writer  TP
1204  Mike    45000   Proofreader       PR
1205  Joey    30000   OP Admin          Admin

Using the above scenario, the following query retrieves the number of employees in each department.
hive> SELECT Dept,count(*) FROM employee GROUP BY DEPT;
On successful execution of the query, we get the following response:
Table 2 – Group By Query Result

Dept   Count(*)
Admin  1
PR     2
TP     2
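To see what the query computes, here is a small in-memory sketch in Java that groups the five rows of Table 1 by Dept and counts each group. It is only an illustration and does not use Hive; the class name GroupByCountSketch, the Employee holder, and the method names are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory illustration only; this does not connect to Hive.
class GroupByCountSketch {

    // Hypothetical holder for one row of Table 1.
    static final class Employee {
        final int id;
        final String name;
        final int salary;
        final String designation;
        final String dept;

        Employee(int id, String name, int salary, String designation, String dept) {
            this.id = id;
            this.name = name;
            this.salary = salary;
            this.designation = designation;
            this.dept = dept;
        }
    }

    // The five rows of Table 1.
    static List<Employee> sampleEmployees() {
        return Arrays.asList(
            new Employee(1201, "Ross", 45000, "Tech manager", "TP"),
            new Employee(1202, "Rachel", 45000, "Proofreader", "PR"),
            new Employee(1203, "Monika", 40000, "Technical writer", "TP"),
            new Employee(1204, "Mike", 45000, "Proofreader", "PR"),
            new Employee(1205, "Joey", 30000, "OP Admin", "Admin"));
    }

    // Analogue of: SELECT Dept, count(*) FROM employee GROUP BY Dept
    // A TreeMap is used only to make the printing order deterministic.
    static Map<String, Long> countByDept(List<Employee> rows) {
        return rows.stream().collect(
            Collectors.groupingBy(e -> e.dept, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Prints one line per department: Admin 1, PR 2, TP 2.
        countByDept(sampleEmployees())
            .forEach((dept, count) -> System.out.println(dept + "\t" + count));
    }
}
```

Each distinct Dept value becomes one group, and count(*) is the number of rows that fell into that group. Note that Hive itself does not promise any particular row order unless the query has an ORDER BY clause.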

iii. JDBC Program

The following JDBC program applies the Group By clause to the given example.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveQLGroupBy {
    // This is the HiveServer1 driver class. HiveServer2 uses
    // "org.apache.hive.jdbc.HiveDriver" with a "jdbc:hive2://" URL instead.
    private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException, ClassNotFoundException {
        // Register driver and create driver instance
        Class.forName(driverName);

        // get connection
        Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");

        // create statement
        Statement stmt = con.createStatement();

        // execute statement (no trailing semicolon inside the query string)
        ResultSet res = stmt.executeQuery("SELECT Dept, count(*) FROM employee GROUP BY Dept");
        System.out.println(" Dept \t count(*)");

        // print one line per department
        while (res.next()) {
            System.out.println(res.getString(1) + " " + res.getInt(2));
        }
        con.close();
    }
}

Save the program in a file named HiveQLGroupBy.java. Then use the following commands to compile and execute it. The Hive JDBC driver and its dependencies need to be on the classpath.

$ javac HiveQLGroupBy.java
$ java HiveQLGroupBy

iv. Group By Clause – Output

For the sample data in Table 1, the output of the Group By query is:
Table 3 – Group By Query Output

Dept   Count(*)
Admin  1
PR     2
TP     2

So, this was all about Apache HiveQL Select – Group By Query Tutorial. Hope you like our explanation of Hive Group by Clause.

4. Conclusion

In this tutorial, we covered the HiveQL Select Group By query in Apache Hive, with its syntax and an example. We also walked through a JDBC program and its output to understand the Group By clause well. In our next tutorial, we will study the Hive Order By query in detail. Still, if you have any questions, feel free to ask in the comment section.