Impala LIMIT Clause – Syntax, Usage, Restrictions, Example
In our previous tutorial, we discussed Impala HAVING Clause. Today, we are going to study the Impala Limit Clause, it helps to restrict the number of rows of a resultset to the desired number, in Impala. Also, we will cover its syntax, usage as well as Example to understand it well.
So, let’s Impala Limit Clause.
What is Impala LIMIT Clause?
Basically, in a SELECT query, this clause sets a maximum number of rows for the result set. However, while we pre-select the maximum size of the result set, Â that helps Impala to optimize memory usage during processing of a distributed query.
In other words, to restrict the number of rows of a resultset to the desired number the limit clause in Impala. It also specifies that the resultset of the query does not hold the records beyond the specified limit.
a. Syntax
So, the syntax for using Impala LIMIT Clause is-
LIMIT constant_integer_expression
Also, make sure that the argument to the LIMIT clause must evaluate to a constant value. However, Â It can be a numeric literal or another kind of numeric expression. That may involve operators, casts, and function return values. But still, we cannot refer to a column or use a subquery.
b. Usage
There are several contexts in which this clause is very useful. Such as:
- While we want to return exactly N items from a top-N query, we use this clause. Like the 10 highest-rated items in a shopping category or the 50 hostnames that refer the most traffic to a website.
- Also, when we want to demonstrate some sample values from a table or a particular query we use this clause.
- In order to keep queries from returning huge result sets by accident, Â we use this clause. Especially, if a table is larger than expected, or a WHERE clause matches more rows than expected.
However, make sure the value for the LIMIT clause is a numeric literal. Although it can be a numeric expression but only in Impala 1.2.1 and higher.
Before Impala 1.4.0, we also needed any query including an ORDER BY clause to use a LIMIT clause. Also, the LIMIT clause is optional for ORDER BY queries, in Impala 1.4.0 and higher.
Though, there are some cases where sorting a huge result set requires enough memory, in order to exceed the Impala memory limit for a particular node. At that time Impala automatically uses a temporary disk work area to perform the sort operation.
Also, we get a provision that we can combine a LIMIT clause with an OFFSET clause to produce a small result set that is different from a top-N query.Let us understand this with an example. Here if we want to return items 11 through 20, to simulate “paged” results we can use this technique.
c. Restrictions
However, there are some restrictions like Correlated subqueries used in EXISTS and IN operators cannot include a LIMIT clause.
Example of LIMIT Clause in Impala
Let us suppose we have a table named Employees in the database my_db. Its contents are −
[quickstart.cloudera:21000] > select * from Employees; Query: select * from Employees
id | name | age | address | salary |
3 | kajal | 27 | alirajpur | 40000 |
6 | mehul | 22 | hyderabad | 32000 |
1 | shubham | 32 | delhi | 20000 |
5 | shreyash | 23 | pune | 30000 |
2 | monika | 25 | mumbai | 15000 |
8 | Rishabh | 22 | chennai | 31000 |
9 | Pulkit | 23 | Gandhi nagar | 28000 |
7 | Vaishnavi | 25 | Goa | 23000 |
4 | revti | 25 | indore | 35000 |
Fetched 9 row(s) in 0.51s
Moreover,  by using the order by clause we can arrange the records in the table in the ascending order of their id’s.
[quickstart.cloudera:21000] > select * from Employees order by id; Query: select * from Employees order by id
id | name | age | address | salary |
1 | shubham | 32 | delhi | 20000 |
2 | monika | 25 | mumbai | 15000 |
3 | kajal | 27 | alirajpur | 40000 |
4 | revti | 25 | indore | 35000 |
5 | shreyash | 23 | pune | 30000 |
6 | mehul | 22 | hyderabad | 32000 |
7 | Vaishnavi | 25 | Goa | 23000 |
8 | Rishabh | 22 | chennai | 31000 |
9 | Pulkit | 23 | Gandhi nagar | 28000 |
Fetched 9 row(s) in 0.54s
Further, Â we can restrict the number of records of the output to 4, using the limit clause.
[quickstart.cloudera:21000] > select * from Employees order by id limit 4;
So, we will get the following output, on executing, the above query.
Query: select * from Employees order by id limit 4
id | name | age | address | salary |
1 | shubham | 32 | delhi | 20000 |
2 | monika | 25 | mumbai | 15000 |
3 | kajal | 27 | alirajpur | 40000 |
4 | revti | 25 | indore | 35000 |
Fetched 4 row(s) in 0.64s
So, this was all about Impala Limit Clause. Hope you like our explanation.
Conclusion
As a result, we have seen the whole concept of Impala LIMIT Clause. Still, if any doubt occurs in how to restrict no. of rows with Impala LIMIT Clause, feel free to ask in the comment section.
If you are Happy with DataFlair, do not forget to make us happy with your positive feedback on Google
I have a table in impala with over 300K records. How do I write SQL in impala to split in 1st 100k, 2nd 100k and rest please?