HiveQL Select – Group By Query | Group By Clause

Boost your career with Free Big Data Courses!!

In Apache Hive Tutorial, for grouping particular column values mentioned with the group by Query. Basically, we use Hive Group by Query with Multiple columns on Hive tables.

However, we need to know the syntax of HiveQL group by query to implement it. So, in this article, we will learn what is Hive Query – Group by Query, syntax, and an example of HiveQL Select Group By Clause to understand with JDBC Program.

What is Hive Query?

However, for the ETL purpose on top of Hadoop file system Hive offers SQL type querying language. Also, to work with tables, databases queries Hive Query language (HiveQL) offers SQL type environment in Hive.

In addition, to perform different type data manipulations and querying it is possible to have a different type of clauses associated with Hive. Especially, for better connectivity with different nodes outside the environment. Also, HIVE offers JDBC connectivity.

Also, there are several features of Hive queries offers. Such as:

  • For example data modeling. Basically, for the creation of databases, tables, etc.
  • Moreover, ETL functionalities. For example, Extraction, Transformation, and Loading data into tables.
  • Also, it offers joins to merge different data tables.
  • While it comes to ease of code, it offers user-specific custom scripts.
  • Also, it offers a faster-querying tool on top of Hadoop.

HiveQL Select

Basically, for grouping particular column values mentioned with the group by query, Group by clause use columns on Hive tables. However, column name does not matter, since for whatever the name we are defining a Group By query will selects and display results by grouping the particular column values.

i.  Group by Query Syntax

However,  see below the syntax of GROUP BY Clause:
SELECT [ALL | DISTINCT] select_expr, select_expr, …
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]]
[LIMIT number];

ii. Group by Query Example

Also, to understand well, see an example below. Although, let’s suppose an employee table. Basically, it includes Id, Name, Salary, Designation, and Dept fields. However, to retrieve the number of employees in each department Generate a query.
Table 1- Group By Clause Example

IDNameSalaryDesignationDept
1201Ross45000Tech managerTP
1202Rachel45000ProofreaderPR
1203Monika40000Technical writerTP
1204Mike45000ProofreaderPR
1205Joey30000OP AdminAdmin

Moreover, BY using the above scenario, the following query retrieves the employee details.
hive> SELECT Dept,count(*) FROM employee GROUP BY DEPT;

Also, we get to see the following response to the successful execution of the query:

Table 2 – Group By Query 

Dept Count(*)
Admin1
PR2
TP3

iii. JDBC Program

Moreover, to apply the Group By clause for the given example, here is the JDBC program is given below.
For example,

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;
public class HiveQLGroupBy
      {
         private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";  
         public static void main(String[] args) throws SQLException
              {
// Register driver and create driver instance
                   Class.forName(driverName);     
// get connection
                   Connection con = DriverManager.
                   getConnection("jdbc:hive://localhost:10000/userdb", "", "");     
// create statement
                   Statement stmt = con.createStatement();     
// execute statement
                   Resultset res = stmt.executeQuery(“SELECT Dept,count(*) ” + “FROM employee GROUP BY DEPT; ”);
                   System.out.println(" Dept \t count(*)");     
                   while (res.next())
                     {
                             System.out.println(res.getString(1) + " " + res.getInt(2));
                     }
                  con.close();
            }
     }

Also, use the following commands to compile and execute this program. Moreover, save the program in a file named HiveQLGroupBy.java.

$ javac HiveQLGroupBy.java
$ java HiveQLGroupBy

iv. Group By Clause – Output

However, here is the possible output of Group By Query.
Table 3 – Group By Query 

Dept Count(*)
Admin1
PR2
TP3

So, this was all about Apache HiveQL Select – Group By Query Tutorial. Hope you like our explanation of Hive Group by Clause.

Conclusion

As a result, we have seen the whole concept of HiveQL Select -Group By query in Apache Hive, with a group by query example & syntax, we also discuss JDBC program with its output to understand HiveQL – Group By clause well.

In our next tutorial, we will study hive Oder By Query in detail. Still, if you have any query, feel free to ask in the comment section.

Did we exceed your expectations?
If Yes, share your valuable feedback on Google

courses

DataFlair Team

DataFlair Team specializes in creating clear, actionable content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Backed by industry expertise, we make learning easy and career-oriented for beginners and pros alike.

Leave a Reply

Your email address will not be published. Required fields are marked *