Impala TRUNCATE TABLE Statement with Practical Example

1. Objective

As we know to delete the table completely we use Impala Drop Table Statement. However, we just need to delete some data from an Impala table while leaving the table itself. For that purpose, we use the Impala TRUNCATE TABLE Statement.
So, in this article, we will learn the whole concept of Impala TRUNCATE TABLE Statement. Apart from its introduction, it includes its syntax, type as well as its example, to understand it well.

So, let’s start Impala Truncate Table Statements.

Impala TRUNCATE TABLE Statement with Practical Example

Impala TRUNCATE TABLE Statement with Practical Example

2. How to use Impala TRUNCATE TABLE Statement

Basically, to delete some data from an Impala table while leaving the table itself we use Impala TRUNCATE TABLE Statement. In other words, to remove all the records from an existing table we use the Truncate Table Statement of Impala.
So, there is one more command we can use DROP TABLE command, but it would remove the complete table structure from the database. Then we would require re-creating this table once again if we wish you store some data.
Let’s See How to use Impala CREATE TABLE Statement

a. Syntax of Impala Truncate Table Statements

So, the syntax for using Impala TRUNCATE TABLE Statement is-

TRUNCATE TABLE [IF EXISTS] [db_name.]table_name

b. Statement type

Impala TRUNCATE TABLE Statement is of DDL Type.

c. Usage

After the data has been copied to another table for the next stage of processing often used to empty tables that are used during ETL cycles. In addition, to dropping and recreating the table, or using INSERT OVERWRITE to replace the data during the next ETL cycle, this statement is a low-overhead alternative.
Makes sure it deletes all the data and associated data files in the table. So, we can use it to delete data files from internal tables, external tables, partitioned tables, and tables mapped to HBase or the Amazon Simple Storage Service (S3). Well, including all partitions of a partitioned table, the data removal applies to the entire table.
Let’s Discuss Impala Use Cases and Applications in detail
However, when the data is removed any statistics produced by the COMPUTE STATS statement are reset.

Although, before truncating a table ensure that you are in the correct database. Either by using a fully qualified name db_name.table_name or by issuing a USE statement first.
There is an optional IF EXISTS clause. It is available in CDH 5.7 / Impala 2.5 and higher. It makes the statement succeed even if the table exists or not. There are two possible conditions,  Either the table does exist or not exist. So, if it does exist, it is truncated; or if it does not exist the statement has no effect. To be more specific, in standardized setup scripts that is might be run both before and after some of the tables exist, this is very useful.
Moreover, if we use this statement to remove any HDFS data files, it goes into the  HDFS trashcan. Further, if this operation turns out to be a mistake, we can recover them within a defined time interval.

d. Cancellation

It is not possible to cancel it. That implies it Cannot be canceled.

Get the most demanding skills of IT Industry - Learn Hadoop

3. Amazon S3 Considerations

Here, the TRUNCATE TABLE statement can remove data files from S3, but Impala cannot write new data to a table stored in the Amazon S3 filesystem.
Read about Impala ALTER VIEW Statement

4. HDFS Permissions

It is must that the Impala user ID has to write permission for all the files and directories those make up the table that the Impalad daemon runs under.

Hadoop Quiz

5. Examples of Truncate Table in Impala

Here, is a table containing some data and with table and column statistics. Now, the data is removed and the statistics are reset after the TRUNCATE TABLE statement.
For Example,

CREATE TABLE truncate_demo (x INT);
INSERT INTO truncate_demo VALUES (1), (2), (4), (8);
SELECT COUNT(*) FROM truncate_demo;
COMPUTE STATS truncate_demo;
Updated 1 partition(s) and 1 column(s).
SHOW TABLE STATS truncate_demo;
#Rows #Files sizeBytes cached Cache Replication FormatIncremental Stats
SHOW COLUMN STATS truncate_demo;
columnType#Distinct Values#NullsMax SizeAvg Size
-- After this statement, the data and the table/column stats will be gone.
TRUNCATE TABLE truncate_demo;
SELECT COUNT(*) FROM truncate_demo;
SHOW TABLE STATS truncate_demo;
#Rows#Files Size Bytes Cached Cache ReplicationFormatIncremental stats
SHOW COLUMN STATS truncate_demo;
Column Type #Nulls#Distinct ValuesMax SizeAvg Size

Have a look at Impala Troubleshooting-Performance Tuning
Further, it shows how the IF EXISTS clause allows the TRUNCATE TABLE statement to be run without error whether or not the table exists:
For Example,

CREATE TABLE staging_table1 (x INT, s STRING);
Fetched 0 row(s) in 0.33s
SHOW TABLES LIKE 'staging*';
Fetched 1 row(s) in 0.25s
-- Our ETL process involves removing all data from several staging tables
-- even though some might be already dropped, or not created yet.
Fetched 0 row(s) in 5.04s
Fetched 0 row(s) in 0.25s
Fetched 0 row(s) in 0.25s

So, this was all in Impala Truncate Table Statements. Hope you like our explanation.

6. Conclusion

As a result, we have seen the whole concept of Impala TRUNCATE TABLE Statement. Still, if any doubt occurs in how to use Impala TRUNCATE TABLE Statement, feel free to ask in the comment section.
Related Topic- Impala Comments
For reference

Leave a Reply

Your email address will not be published. Required fields are marked *