Impala LIMIT Clause – Syntax, Usage, Restrictions, Example

Boost your career with Free Big Data Courses!!

In our previous tutorial, we discussed Impala HAVING Clause. Today, we are going to study the Impala Limit Clause, it helps to restrict the number of rows of a resultset to the desired number, in Impala. Also, we will cover its syntax, usage as well as Example to understand it well.

So, let’s Impala Limit Clause.

What is Impala LIMIT Clause?

Basically, in a SELECT query, this clause sets a maximum number of rows for the result set. However, while we pre-select the maximum size of the result set,  that helps Impala to optimize memory usage during processing of a distributed query.

In other words, to restrict the number of rows of a resultset to the desired number the limit clause in Impala. It also specifies that the resultset of the query does not hold the records beyond the specified limit.

a. Syntax

So, the syntax for using Impala LIMIT Clause is-

LIMIT constant_integer_expression

Also, make sure that the argument to the LIMIT clause must evaluate to a constant value. However,  It can be a numeric literal or another kind of numeric expression. That may involve operators, casts, and function return values. But still, we cannot refer to a column or use a subquery.

b. Usage

There are several contexts in which this clause is very useful. Such as:

  • While we want to return exactly N items from a top-N query, we use this clause. Like the 10 highest-rated items in a shopping category or the 50 hostnames that refer the most traffic to a website.
  • Also, when we want to demonstrate some sample values from a table or a particular query we use this clause.
  • In order to keep queries from returning huge result sets by accident,  we use this clause. Especially, if a table is larger than expected, or a WHERE clause matches more rows than expected.

However, make sure the value for the LIMIT clause is a numeric literal. Although it can be a numeric expression but only in Impala 1.2.1 and higher.

Before Impala 1.4.0, we also needed any query including an ORDER BY clause to use a LIMIT clause. Also, the LIMIT clause is optional for ORDER BY queries, in Impala 1.4.0 and higher.

Though, there are some cases where sorting a huge result set requires enough memory, in order to exceed the Impala memory limit for a particular node. At that time Impala automatically uses a temporary disk work area to perform the sort operation.

Also, we get a provision that we can combine a LIMIT clause with an OFFSET clause to produce a small result set that is different from a top-N query.Let us understand this with an example. Here if we want to return items 11 through 20, to simulate “paged” results we can use this technique.

c. Restrictions

However, there are some restrictions like Correlated subqueries used in EXISTS and IN operators cannot include a LIMIT clause.

Example of LIMIT Clause in Impala

Let us suppose we have a table named Employees in the database my_db. Its contents are −

[quickstart.cloudera:21000] > select * from Employees;
Query: select * from Employees
idnameageaddresssalary
3kajal27alirajpur40000
6mehul22hyderabad32000
1shubham32delhi20000
5shreyash23pune30000
2monika25mumbai15000
8Rishabh22chennai31000
9Pulkit23Gandhi nagar28000
7Vaishnavi25Goa23000
4revti25indore35000

Fetched 9 row(s) in 0.51s
Moreover,  by using the order by clause we can arrange the records in the table in the ascending order of their id’s.

[quickstart.cloudera:21000] > select * from Employees order by id;
Query: select * from Employees order by id
idnameageaddresssalary
1shubham32delhi20000
2monika25mumbai15000
3kajal27alirajpur40000
4revti25indore35000
5shreyash23pune30000
6mehul22hyderabad32000
7Vaishnavi25Goa23000
8Rishabh22chennai31000
9Pulkit23Gandhi nagar28000

Fetched 9 row(s) in 0.54s
Further,  we can restrict the number of records of the output to 4, using the limit clause.

[quickstart.cloudera:21000] > select * from Employees order by id limit 4;
So, we will get the following output, on executing, the above query.

Query: select * from Employees order by id limit 4

idnameageaddresssalary
1shubham32delhi20000
2monika25mumbai15000
3kajal27alirajpur40000
4revti25indore35000

Fetched 4 row(s) in 0.64s
So, this was all about Impala Limit Clause. Hope you like our explanation.

Conclusion

As a result, we have seen the whole concept of Impala LIMIT Clause. Still, if any doubt occurs in how to restrict no. of rows with Impala LIMIT Clause, feel free to ask in the comment section.

If you are Happy with DataFlair, do not forget to make us happy with your positive feedback on Google

follow dataflair on YouTube

1 Response

  1. Nishant Amin says:

    I have a table in impala with over 300K records. How do I write SQL in impala to split in 1st 100k, 2nd 100k and rest please?

Leave a Reply

Your email address will not be published. Required fields are marked *