Impala Alter Table Statement – HDFS Caching and Permissions

Boost your career with Free Big Data Courses!!

Basically, to change the structure or properties of an existing Impala table we use Impala Alter Table Statement. There are much more to know about Impala Alter Table Statement. Like Altering the name of a table,

Adding columns to a table, Dropping columns from a table, Changing the name and type of a column or Altering a Table using Hue. So, in this article on Impala Alter Table Statement, we will discuss all of them. Also, we will include syntax and example to understand it well.

Introduction to Impala Alter Table Statement?

While we need to change the structure or properties of an existing Impala table, we use Impala Alter Table Statement.

Basically, there is a primarily a logical operation that updates the table metadata in the Metastore database that Impala shares with  Hive, in Impala. However, we can say most ALTER TABLE operations do not actually rewrite, move, and so on the actual data files.

Although, the one exception is The RENAME TO clause it can cause HDFS files to be moved to different paths.

Also, we typically need to perform corresponding physical filesystem operations, whenever we do an ALTER TABLE operation. Like rewriting the data files to include extra fields, or converting them to a different file format.

To be very specific, to perform changes on a given table, we use Impala Alter table statement. In addition, we can add, delete, or modify columns in an existing table and we can also rename it, using this statement.

Syntax of Impala Alter Table Statements

So, the syntax for using Impala Alter Table Statement is-

ALTER TABLE [old_db_name.]old_table_name RENAME TO [new_db_name.]new_table_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name DROP [COLUMN] column_name
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name { ADD [IF NOT EXISTS] | DROP [IF EXISTS] } PARTITION (partition_spec) [PURGE]
ALTER TABLE name RECOVER PARTITIONS
ALTER TABLE name [PARTITION (partition_spec)]
 SET { FILEFORMAT file_format
 | LOCATION 'hdfs_path_of_directory'
 | TBLPROPERTIES (table_properties)
 | SERDEPROPERTIES (serde_properties) }
ALTER TABLE name colname
 ('statsKey'='val, ...)
statsKey ::= numDVs | numNulls | avgSize | maxSize
ALTER TABLE name [PARTITION (partition_spec)] SET { CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED }
new_name ::= [new_database.]new_table_name
col_spec ::= col_name type_name
partition_spec ::= partition_col=constant_value
table_properties ::= 'name'='value'[, 'name'='value' ...]
serde_properties ::= 'name'='value'[, 'name'='value' ...]
file_format ::= { PARQUET | TEXTFILE | RCFILE | SEQUENCEFILE | AVRO }

Statement type

Impala Alter Table Statement is of DDL Type.

Usage of Impala Alter Table Statements

Let me now talk about the usage of Impala Alter Table Statement. Make sure to include all the partitioning columns in the specification, while we specify partitions in an ALTER TABLE statement, through the PARTITION (partition_spec) clause.

For both internal as well as External tables, most of the ALTER TABLE operations work the same. However, there is an exception,  the underlying data directory is not renamed or moved to an external table.

Examples – Impala Alter table Statement

Let’s suppose we have a table named Employees in Impala. Here,  is the data −

idnameageaddresssalary
1shubham32delhi20000
2monika25mumbai15000
3kajal27alirajpur40000
4revti25indore35000
5shreyash23pune30000
6mehul22hyderabad32000

Now, see the list of tables in the database my_db, where we  can find the Employees table

[quickstart.cloudera:21000] > show tables;

Query: show tables

name
Employees
Employees 1
Employees 2
Employees 3

i. Altering the name of a table

  • Syntax

ALTER TABLE [old_db_name.]old_table_name RENAME TO [new_db_name.]new_table_name
  • Example

For Example,
So, we are changing the name of the table Employees to Students.

[quickstart.cloudera:21000] > ALTER TABLE my_db.Employees RENAME TO my_db.Students;

Displaying the following message,  Impala changes the name of the table as required just after executing the above query.

Query: alter TABLE my_db.Employees RENAME TO my_db.Students

Now, you will see the table named Students instead of Employees.
Query: show tables

name
Students
Employees 1
Employees 2
Employees 3

Fetched 4 row(s) in 0.10s

ii. Adding columns to a table

  • Syntax

ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])

Example
For Example,
So,  we are adding two columns account_no and phone_number (both are of bigint data type) to the Students table.

[quickstart.cloudera:21000] > ALTER TABLE Students ADD COLUMNS (account_no BIGINT,
phone_no BIGINT);

Now, it will add the specified columns to the table named student, On executing the above query.

Query: alter TABLE Students ADD COLUMNS (account_no BIGINT, phone_no BIGINT)

We can find the newly added columns since we verify the schema of the table Students in it as shown below.

quickstart.cloudera:21000] > describe Students;
Query: describe Students
nametypecomment
idint
namestring
ageint
addressstring
salarybigint
account_nobigint
phone_nobigint

Fetched 7 row(s) in 0.20s

iii. Dropping columns from a table

  • Syntax

ALTER TABLE name DROP [COLUMN] column_name
  • Example

For Example,
So, we are deleting the column named account_no.

[quickstart.cloudera:21000] > ALTER TABLE Students DROP account_no;

On executing the above query, Impala deletes the column named account_no,

Query: alter TABLE Students DROP account_no

However, we cannot find the column named account_no since it was deleted since you verify the schema of the table Students.

[quickstart.cloudera:21000] > describe Students;
Query: describe Students
nametypecomment
idint
namestring
ageint
addressstring
salarybigint
phone_nobigint

Fetched 6 row(s) in 0.11s
Any doubt yet in Impala Alter Statement? Please Comment.

Iv. Changing the name and type of a column

  • Syntax

syntax is- ALTER TABLE name CHANGE column_name new_name new_type
  • Example

For Example,
So,  we are changing the name of the column phone_no to email and it’s data type to string.

[quickstart.cloudera:21000] > ALTER TABLE Students CHANGE phone_no e_mail string;

On executing the above query, Impala does the specified changes.

Query: alter TABLE Students CHANGE phone_no e_mail string

By using the describe statement, we can verify the metadata of the table Students. Further, see the required changes to the specified column has done.

[quickstart.cloudera:21000] > describe Students;
Query: describe Students
nametypecomment
idint
namestring
ageint
addressstring
salarybigint
e_mailstring

Fetched 6 row(s) in 0.11s

V. Altering a Table using Hue

There are several steps you have to follow while using Hue. Such as:

  • At first, type the alter statement in Impala Query editor. Then click on the execute button.
  • Now, it will change the name of the table Employees to Students, on executing the above query. Hence, we can execute all the alter queries, in the same way.

Complex Type Considerations

The ALTER TABLE statement can change the metadata for tables containing complex types, in CDH 5.5 / Impala 2.3 and higher. Such as  ARRAY, STRUCT, and MAP. Basically, to modify the table layout for complex type columns we can use an ADD COLUMNS, DROP COLUMN, or CHANGE clause.

Amazon S3 considerations

To make Impala query data from the Amazon S3 filesystem, we can specify an s3a:// prefix on the LOCATION attribute of a table or partition.

However, whenever we issue ALTER TABLE statements with the ADD PARTITION or DROP PARTITION clauses,  Impala automatically handles creating or removing the associated folders, in CDH 5.8 / Impala 2.6 and higher.

In addition, Impala DDL statements can create or remove folders as needed in the Amazon S3 system in CDH 5.8 / Impala 2.6 and higher. Such as CREATE DATABASE, CREATE TABLE, DROP DATABASE CASCADE, DROP TABLE, and ALTER TABLE [ADD|DROP] PARTITION.

Although, we had to create folders yourself and point Impala database, tables, or partitions at them, and manually remove folders when no longer needed before CDH 5.8 / Impala 2.6.

HDFS caching (CACHED IN clause)

However, any existing or future data files in the table directory or the partition subdirectories are designated to be loaded into memory with the HDFS caching mechanism, since we specify the CACHED IN clause.

There is an optional WITH REPLICATION clause for CREATE TABLE and ALTER TABLE,  in Impala 2.2 / CDH 5.4 and higher. Basically, it lets you specify a replication factor, the number of hosts on which to cache the same data blocks.

Moreover, Impala randomly selects a host that has a cached copy of that data block, when Impala processes a cached data block, also where the cache replication factor is greater than 1.

Basically, we can enable the SYNC_DDL query option to make each DDL statement wait before returning If you connect to different Impala nodes within an impala-shell session for load-balancing purposes until the new or changed metadata has been received by all the Impala nodes.

HDFS permissions

However, Many of the ALTER TABLE clauses do not actually read or write any HDFS files. Hence, they do not depend on specific HDFS permissions.

Cancellation

It is not possible to cancel it. That implies it Cannot be canceled.
This was all in the Impala Alter Table Statement.

Conclusion – Impala Alter Table Statement

As a result, we have seen the whole concept of Impala Alter Table Statement. Still, if any doubt occurs in how to create impala database, feel free to ask in the comment section.

Your opinion matters
Please write your valuable feedback about DataFlair on Google

follow dataflair on YouTube

Leave a Reply

Your email address will not be published. Required fields are marked *