Impala TRUNCATE TABLE Statement with Practical Example

Boost your career with Free Big Data Courses!!

As we know to delete the table completely we use Impala Drop Table Statement. However, we just need to delete some data from an Impala table while leaving the table itself. For that purpose, we use the Impala TRUNCATE TABLE Statement.

So, in this article, we will learn the whole concept of Impala TRUNCATE TABLE Statement. Apart from its introduction, it includes its syntax, type as well as its example, to understand it well.

So, let’s start Impala Truncate Table Statements.

How to use Impala TRUNCATE TABLE Statement

Basically, to delete some data from an Impala table while leaving the table itself we use Impala TRUNCATE TABLE Statement. In other words, to remove all the records from an existing table we use the Truncate Table Statement of Impala.

So, there is one more command we can use DROP TABLE command, but it would remove the complete table structure from the database. Then we would require re-creating this table once again if we wish you store some data.

a. Syntax of Impala Truncate Table Statements

So, the syntax for using Impala TRUNCATE TABLE Statement is-

TRUNCATE TABLE [IF EXISTS] [db_name.]table_name

b. Statement type

Impala TRUNCATE TABLE Statement is of DDL Type.

c. Usage

After the data has been copied to another table for the next stage of processing often used to empty tables that are used during ETL cycles.

In addition, to dropping and recreating the table, or using INSERT OVERWRITE to replace the data during the next ETL cycle, this statement is a low-overhead alternative.

Makes sure it deletes all the data and associated data files in the table. So, we can use it to delete data files from internal tables, external tables, partitioned tables, and tables mapped to HBase or the Amazon Simple Storage Service (S3).

Well, including all partitions of a partitioned table, the data removal applies to the entire table.

However, when the data is removed any statistics produced by the COMPUTE STATS statement are reset.

Although, before truncating a table ensure that you are in the correct database. Either by using a fully qualified name db_name.table_name or by issuing a USE statement first.

There is an optional IF EXISTS clause. It is available in CDH 5.7 / Impala 2.5 and higher. It makes the statement succeed even if the table exists or not. There are two possible conditions,  Either the table does exist or not exist.

So, if it does exist, it is truncated; or if it does not exist the statement has no effect. To be more specific, in standardized setup scripts that is might be run both before and after some of the tables exist, this is very useful.

Moreover, if we use this statement to remove any HDFS data files, it goes into the  HDFS trashcan. Further, if this operation turns out to be a mistake, we can recover them within a defined time interval.

d. Cancellation

It is not possible to cancel it. That implies it Cannot be canceled.

Amazon S3 Considerations

Here, the TRUNCATE TABLE statement can remove data files from S3, but Impala cannot write new data to a table stored in the Amazon S3 filesystem.

HDFS Permissions

It is must that the Impala user ID has to write permission for all the files and directories those make up the table that the Impalad daemon runs under.

Examples of Truncate Table in Impala

Here, is a table containing some data and with table and column statistics. Now, the data is removed and the statistics are reset after the TRUNCATE TABLE statement.
For Example,

CREATE TABLE truncate_demo (x INT);
INSERT INTO truncate_demo VALUES (1), (2), (4), (8);
SELECT COUNT(*) FROM truncate_demo;
count(*)
4
COMPUTE STATS truncate_demo;
summary              
Updated 1 partition(s) and 1 column(s).
SHOW TABLE STATS truncate_demo;
#Rows #Files sizeBytes cached Cache Replication FormatIncremental Stats
418BNOT CACHEDNOT CACHEDTEXTfalse
SHOW COLUMN STATS truncate_demo;
columnType#Distinct Values#NullsMax SizeAvg Size
xINT4-144
-- After this statement, the data and the table/column stats will be gone.
TRUNCATE TABLE truncate_demo;
SELECT COUNT(*) FROM truncate_demo;
count(*)
0
SHOW TABLE STATS truncate_demo;
#Rows#Files Size Bytes Cached Cache ReplicationFormatIncremental stats
-100BNOT CACHEDNOT CACHEDTEXTfalse 
SHOW COLUMN STATS truncate_demo;
Column Type #Nulls#Distinct ValuesMax SizeAvg Size
XINT-1-144

Further, it shows how the IF EXISTS clause allows the TRUNCATE TABLE statement to be run without error whether or not the table exists:
For Example,

CREATE TABLE staging_table1 (x INT, s STRING);
Fetched 0 row(s) in 0.33s
SHOW TABLES LIKE 'staging*';
NAME
staging_table1
Fetched 1 row(s) in 0.25s
-- Our ETL process involves removing all data from several staging tables
-- even though some might be already dropped, or not created yet.
TRUNCATE TABLE IF EXISTS staging_table1;
Fetched 0 row(s) in 5.04s
TRUNCATE TABLE IF EXISTS staging_table2;
Fetched 0 row(s) in 0.25s
TRUNCATE TABLE IF EXISTS staging_table3;
Fetched 0 row(s) in 0.25s

So, this was all in Impala Truncate Table Statements. Hope you like our explanation.

Conclusion

As a result, we have seen the whole concept of Impala TRUNCATE TABLE Statement. Still, if any doubt occurs in how to use Impala TRUNCATE TABLE Statement, feel free to ask in the comment section.

If you are Happy with DataFlair, do not forget to make us happy with your positive feedback on Google

follow dataflair on YouTube

Leave a Reply

Your email address will not be published. Required fields are marked *