Impala DISTINCT Operator – Syntax & Example
In our previous tutorial, we studied the Impala OFFSET clause. Now we are going to study the Impala DISTINCT operator, and there is quite a lot to learn about using it.
So, let's start with the Impala DISTINCT operator.
2. What is Impala DISTINCT Operator?
We use the DISTINCT operator in a SELECT statement when we want to filter the results to remove duplicates:
-- Returns the unique values from one column.
-- NULL is included in the set of values if any rows have a NULL in this column.
select distinct c_birth_country from Employees;
-- Returns the unique combinations of values from multiple columns.
select distinct c_salutation, c_last_name from Employees;
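To make these two rules concrete, here is a small Python model of what SELECT DISTINCT returns. It is a sketch for illustration, not Impala code: rows are plain dicts, None stands in for SQL NULL, and the sample rows and the helper name select_distinct are hypothetical.

```python
# Hypothetical Python model of SELECT DISTINCT, for illustration only.
# Each row is a dict; None plays the role of SQL NULL.

def select_distinct(rows, columns):
    """Return unique combinations of `columns`, keeping the first occurrence of each."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[col] for col in columns)
        # None == None in Python, so all NULLs collapse into a single value,
        # matching the rule that NULL appears once in the DISTINCT output.
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Made-up sample rows; the column names come from the queries above.
employees = [
    {"c_birth_country": "INDIA", "c_salutation": "Mr.", "c_last_name": "Rao"},
    {"c_birth_country": "INDIA", "c_salutation": "Ms.", "c_last_name": "Rao"},
    {"c_birth_country": None, "c_salutation": "Mr.", "c_last_name": "Rao"},
    {"c_birth_country": None, "c_salutation": "Mr.", "c_last_name": "Rao"},
]

print(select_distinct(employees, ["c_birth_country"]))
# [('INDIA',), (None,)]
print(select_distinct(employees, ["c_salutation", "c_last_name"]))
# [('Mr.', 'Rao'), ('Ms.', 'Rao')]
```

This model happens to keep first-seen order; Impala itself makes no ordering promise for DISTINCT results.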
Moreover, to find how many different values a column contains, we can use DISTINCT in combination with an aggregation function, typically COUNT():
-- Counts the unique values from one column.
-- NULL is not included as a distinct value in the count.
select count(distinct c_birth_country) from Employees;
-- Counts the unique combinations of values from multiple columns.
select count(distinct c_salutation, c_last_name) from Employees;
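The counting rules can be modeled the same way. The Python sketch below is illustrative only; the helper count_distinct and the sample rows are hypothetical. For the multi-column case it assumes that a row is skipped when any listed column is NULL, which extends the single-column rule above and is not verified against Impala here.

```python
# Hypothetical Python model of COUNT(DISTINCT ...), for illustration only.
# None plays the role of SQL NULL.

def count_distinct(rows, columns):
    """Count unique non-NULL combinations of `columns`.

    Assumption: with several columns, a row is skipped if any listed
    column is NULL (an extension of the single-column rule).
    """
    keys = set()
    for row in rows:
        key = tuple(row[col] for col in columns)
        if any(value is None for value in key):
            continue  # NULL is not counted as a distinct value
        keys.add(key)
    return len(keys)


# Made-up sample rows.
rows = [
    {"c_birth_country": "INDIA", "c_salutation": "Mr.", "c_last_name": "Rao"},
    {"c_birth_country": "JAPAN", "c_salutation": "Ms.", "c_last_name": "Rao"},
    {"c_birth_country": None, "c_salutation": "Mr.", "c_last_name": "Rao"},
    {"c_birth_country": "INDIA", "c_salutation": None, "c_last_name": "Sato"},
]

print(count_distinct(rows, ["c_birth_country"]))                # 2 (NULL not counted)
print(count_distinct(rows, ["c_salutation", "c_last_name"]))    # 2
```

Note the contrast with SELECT DISTINCT: the column c_birth_country has three distinct values if NULL is listed, but the count is 2 because NULL is left out.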
Note, however, that Impala SQL does not support using DISTINCT in more than one aggregation function in the same query. For example, a single query cannot have both COUNT(DISTINCT c_first_name) and COUNT(DISTINCT c_last_name) in its SELECT list.
3. Syntax for Distinct Operator in Impala
So, the syntax for using the Impala DISTINCT operator is:
select distinct columns… from table_name;
4. Zero-length strings
For purposes of clauses such as DISTINCT and GROUP BY, Impala considers zero-length strings (""), NULL, and a space to all be different values.
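A quick Python model shows what "all different values" means in practice: an empty string, NULL (modeled as None) and a single space produce three separate DISTINCT values and three separate GROUP BY groups. The sample values are made up for illustration.

```python
from collections import Counter

# Hypothetical model: "", NULL (None) and " " stay separate values.
values = ["", None, " ", "", None, " "]

# DISTINCT model: unique values in first-seen order.
distinct_values = list(dict.fromkeys(values))
print(distinct_values)        # ['', None, ' ']

# GROUP BY model: one group per distinct value, with a row count per group.
groups = Counter(values)
print(dict(groups))           # {'': 2, None: 2, ' ': 2}
```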
As we discussed earlier, Impala only allows a single COUNT(DISTINCT columns) expression in each query.
Moreover, if we do not need precise accuracy, we can produce an estimate of the number of distinct values in a column by specifying NDV(column). A query can contain multiple instances of NDV(column). We can also enable the APPX_COUNT_DISTINCT query option to make Impala automatically rewrite COUNT(DISTINCT) expressions to NDV().
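To see why an estimate can be much cheaper than an exact count, here is a textbook HyperLogLog estimator in Python. Impala's documentation describes NDV() as using an algorithm similar to HyperLogLog, but this sketch is not Impala's implementation: the SHA-1 based hash, the register count (p=10, so 1024 registers) and skipping None values are all assumptions made for illustration. It keeps a fixed array of small registers instead of a set of every value seen, which is why it uses bounded memory and returns an approximate answer.

```python
import hashlib
import math


def _hash64(value):
    """Stable 64-bit hash of a value (SHA-1 chosen here purely for illustration)."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def approx_distinct(values, p=10):
    """Estimate the number of distinct non-None values with a basic HyperLogLog."""
    m = 1 << p                          # number of registers
    registers = [0] * m
    suffix_bits = 64 - p
    for value in values:
        if value is None:
            continue                    # assumption: NULLs are ignored
        h = _hash64(value)
        index = h >> suffix_bits        # top p bits choose a register
        suffix = h & ((1 << suffix_bits) - 1)
        # 1-based position of the leftmost 1-bit in the remaining bits
        rank = suffix_bits - suffix.bit_length() + 1
        registers[index] = max(registers[index], rank)

    alpha = 0.7213 / (1 + 1.079 / m)    # standard bias constant for m >= 128
    estimate = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros:
        # small-range correction (linear counting)
        estimate = m * math.log(m / zeros)
    return round(estimate)


# 30,000 values, of which 10,000 are distinct.
data = [i % 10_000 for i in range(30_000)]
print(approx_distinct(data))  # close to 10000, but not exact
```

With 1024 registers the typical relative error is about 3%; duplicates never change the registers, so repeated values cost nothing extra.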
Further, for queries involving a single table, we can use the following technique to produce the same result as multiple COUNT(DISTINCT) expressions:
select v1.c1 result1, v2.c1 result2 from (select count(distinct col1) as c1 from t1) v1 cross join (select count(distinct col2) as c1 from t1) v2;
However, because CROSS JOIN is an expensive operation, prefer the NDV() technique wherever it is practical.
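The cross-join technique works because each subquery returns exactly one row, and the cross join of two one-row results is again a single row. The Python model below mirrors the query's aliases (v1, v2, c1, result1, result2); the table contents are made up, and None plays SQL NULL.

```python
from itertools import product

# Hypothetical table t1 with columns col1 and col2.
t1 = [
    {"col1": "a", "col2": 1},
    {"col1": "b", "col2": 1},
    {"col1": "a", "col2": 2},
    {"col1": None, "col2": 3},
]


def count_distinct(rows, col):
    """COUNT(DISTINCT col): unique non-NULL values."""
    return len({row[col] for row in rows if row[col] is not None})


# Each subquery returns exactly one row holding one count...
v1 = [{"c1": count_distinct(t1, "col1")}]
v2 = [{"c1": count_distinct(t1, "col2")}]

# ...so their CROSS JOIN is also exactly one row (1 x 1 = 1).
result = [{"result1": r1["c1"], "result2": r2["c1"]} for r1, r2 in product(v1, v2)]
print(result)  # [{'result1': 2, 'result2': 3}]
```

In Impala each subquery is a full pass over t1, which is part of why this approach is more expensive than using NDV().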
Also note that, unlike some database systems that always return DISTINCT values in sorted order, Impala does not do any ordering of DISTINCT values. So, if we need the values in alphabetical or numeric sorted order, we should always include an ORDER BY clause.
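In Python terms, DISTINCT alone behaves like building a set: the values are unique but their order is not something to rely on, so the sort has to be requested explicitly. The sketch below places NULLs (None) last for ascending order; that placement is an assumption meant to resemble Impala's default for ORDER BY ... ASC, and the country names are made up.

```python
# Hypothetical model: DISTINCT alone promises no order, so sort explicitly.
countries = ["JAPAN", "INDIA", None, "BRAZIL", "INDIA", None]

unique = set(countries)  # like DISTINCT: unique values, no guaranteed order

# Like ORDER BY ... ASC, with NULLs placed last (assumed default for ASC).
ordered = sorted(unique, key=lambda v: (v is None, v if v is not None else ""))
print(ordered)  # ['BRAZIL', 'INDIA', 'JAPAN', None]
```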
5. Impala Distinct Operator – Example
Let's walk through an example of the Impala DISTINCT operator.
So, let us suppose we have a table named Employees in the database my_db. We can view its contents as follows:
[quickstart.cloudera:21000] > select name, age, address from Employees;
Query: select name, age, address from Employees
We can see that the records of the employees Shubham and Revti were entered twice. So, now we can select only the unique rows using the Impala DISTINCT operator.
On executing the following query, Impala returns each unique combination of name, age, and address only once:
[quickstart.cloudera:21000] > select distinct name, age, address from Employees;
Query: select distinct name, age, address from Employees
Fetched 9 row(s) in 1.46s
So, this was all about the Impala DISTINCT operator. We hope you liked our explanation.
In this article, we have seen how to use the Impala DISTINCT operator properly. If you have any doubts, feel free to ask in the comment section; we will definitely respond.