Impala DISTINCT Operator – Syntax & Example

Boost your career with Data Engineering Courses!!

In our previous tutorial, we studied the Impala Offset Clause. Now, we are going to study Impala DISTINCT Operator. Although, there is much more to learn about using Impala DISTINCT Operator.
So, let’s start Impala DISTINCT Operator.

What is Impala DISTINCT Operator?

While we want to filter the results or to remove duplicates, we use The DISTINCT operator in a SELECT statement:

— Returns the unique values from one column.
— NULL is included in the set of values if any rows have a NULL in this column.
select distinct c_birth_country from Employees;
— Returns the unique combinations of values from multiple columns.
select distinct c_salutation, c_last_name from Employees;

Moreover, to find how many different values a column contains, we can use DISTINCT in combination with an aggregation function.Typically COUNT():

— Counts the unique values from one column.
— NULL is not included as a distinct value in the count.

select count(distinct c_birth_country) from Employees;

— Counts the unique combinations of values from multiple columns.

select count(distinct c_salutation, c_last_name) from Employees;

However, make sure that using DISTINCT in more than one aggregation function in the same query is not supported by Impala SQL. To understand more, we could not have a single query with both COUNT(DISTINCT c_first_name) and COUNT(DISTINCT c_last_name) in the SELECT list.

Syntax for Distinct Operator in Impala

So, the syntax for using Impala DISTINCT Operator is-

select distinct columns… from table_name;

Zero-length strings

Impala considers zero-length strings (“”), NULL for purposes of clauses such as DISTINCT and GROUP BY. Also, space to all is different values.

Note:
As we discussed earlier, Impala only allows a single COUNT(DISTINCT columns) expression in each query.

Moreover, we can produce an estimate of the distinct values for a column by specifying NDV(column), if we do not need precise accuracy. Here, a query can contain multiple instances of NDV(column).

Also, can enable the APPX_COUNT_DISTINCT query option, to make Impala automatically rewrite COUNT(DISTINCT) expressions to NDV().

Further, we can use the following technique for queries involving a single table in order to produce the same result as multiple COUNT(DISTINCT) expressions:

select v1.c1 result1, v2.c1 result2 from
 (select count(distinct col1) as c1 from t1) v1
   cross join
 (select count(distinct col2) as c1 from t1) v2;

By the way, prefer to use the NDV() technique while practical because CROSS JOIN is an expensive operation.

Note:
Make sure, Impala does not do any ordering of DISTINCT values in contrast with some database systems. Basically, that always return DISTINCT values in sorted order. So, to avoid that include an ORDER BY clause always, if we need the values in alphabetical or numeric sorted order.

Impala Distinct Operator – Example

Let’s discuss Impala DISTINCT Operator Example,

So, let us suppose we have a table named Employees in the database my_db. Its contents are −

[quickstart.cloudera:21000] > select distinct name, age, address from Employees;
Query: select distinct  name, age, address from Employees

Name	Age	Address
Shubham	32	Delhi
Shubham	32	Delhi
Monika	25	Mumbai
Kajal	27	Alirajpur
Revti	25	Indore
Revti	25	Indore
Shreyash	23	Pune
Mehul	22	Hyderabad

However, we can see that the name of the Employees Shubham and Revti entered twice. So, now we can select the unique values using the Impala Distinct Operator.

[quickstart.cloudera:21000] > select distinct name, age, address from Employees;
On executing, the above query gives the following output.
Query: select distinct  name from Employees

Name	Age	Address
Shubham	32	Delhi
Monika	25	Mumbai
Kajal	27	Alirajpur
Revti	25	Indore
Shreyash	23	Pune
Mehul	22	Hyderabad

Fetched 9 row(s) in 1.46s
So, this was all about Impala DISTINCT Operator. Hope you like our explanation.

Conclusion

Hence, in this article, we have seen how to use Impala DISTINCT Operator properly. However, if you want to ask any doubt, feel free to ask in the comment section. we will definitely respond.

Did you like our efforts? If Yes, please give DataFlair 5 Stars on Google

Impala DISTINCT Operator – Syntax & Example

What is Impala DISTINCT Operator?

Syntax for Distinct Operator in Impala

Zero-length strings

Impala Distinct Operator – Example

Conclusion

Leave a Reply Cancel reply

About DataFlair

Trending Courses

Trending Data Science Courses

Free Big Data Courses

Trending Programming Courses

Trending Data Science Tutorials

Trending Projects

Trending Programming Tutorials

Trending Tutorials