Impala Alter Table Statement – HDFS Caching and Permissions
Basically, to change the structure or properties of an existing Impala table we use Impala Alter Table Statement. There are much more to know about Impala Alter Table Statement. Like Altering the name of a table,
Adding columns to a table, Dropping columns from a table, Changing the name and type of a column or Altering a Table using Hue. So, in this article on Impala Alter Table Statement, we will discuss all of them. Also, we will include syntax and example to understand it well.
Introduction to Impala Alter Table Statement?
While we need to change the structure or properties of an existing Impala table, we use Impala Alter Table Statement.
Basically, there is a primarily a logical operation that updates the table metadata in the Metastore database that Impala shares with  Hive, in Impala. However, we can say most ALTER TABLE operations do not actually rewrite, move, and so on the actual data files.
Although, the one exception is The RENAME TO clause it can cause HDFS files to be moved to different paths.
Also, we typically need to perform corresponding physical filesystem operations, whenever we do an ALTER TABLE operation. Like rewriting the data files to include extra fields, or converting them to a different file format.
To be very specific, to perform changes on a given table, we use Impala Alter table statement. In addition, we can add, delete, or modify columns in an existing table and we can also rename it, using this statement.
Syntax of Impala Alter Table Statements
So, the syntax for using Impala Alter Table Statement is-
ALTER TABLE [old_db_name.]old_table_name RENAME TO [new_db_name.]new_table_name ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...]) ALTER TABLE name DROP [COLUMN] column_name ALTER TABLE name CHANGE column_name new_name new_type ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...]) ALTER TABLE name { ADD [IF NOT EXISTS] | DROP [IF EXISTS] } PARTITION (partition_spec) [PURGE] ALTER TABLE name RECOVER PARTITIONS ALTER TABLE name [PARTITION (partition_spec)] SET { FILEFORMAT file_format | LOCATION 'hdfs_path_of_directory' | TBLPROPERTIES (table_properties) | SERDEPROPERTIES (serde_properties) } ALTER TABLE name colname ('statsKey'='val, ...) statsKey ::= numDVs | numNulls | avgSize | maxSize ALTER TABLE name [PARTITION (partition_spec)] SET { CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED } new_name ::= [new_database.]new_table_name col_spec ::= col_name type_name partition_spec ::= partition_col=constant_value table_properties ::= 'name'='value'[, 'name'='value' ...] serde_properties ::= 'name'='value'[, 'name'='value' ...] file_format ::= { PARQUET | TEXTFILE | RCFILE | SEQUENCEFILE | AVRO }
Statement type
Impala Alter Table Statement is of DDL Type.
Usage of Impala Alter Table Statements
Let me now talk about the usage of Impala Alter Table Statement. Make sure to include all the partitioning columns in the specification, while we specify partitions in an ALTER TABLE statement, through the PARTITION (partition_spec) clause.
For both internal as well as External tables, most of the ALTER TABLE operations work the same. However, there is an exception, Â the underlying data directory is not renamed or moved to an external table.
Examples – Impala Alter table Statement
Let’s suppose we have a table named Employees in Impala. Here,  is the data −
id | name | age | address | salary |
1 | shubham | 32 | delhi | 20000 |
2 | monika | 25 | mumbai | 15000 |
3 | kajal | 27 | alirajpur | 40000 |
4 | revti | 25 | indore | 35000 |
5 | shreyash | 23 | pune | 30000 |
6 | mehul | 22 | hyderabad | 32000 |
Now, see the list of tables in the database my_db, where we  can find the Employees table
[quickstart.cloudera:21000] > show tables;
Query: show tables
name |
Employees |
Employees 1 |
Employees 2 |
Employees 3 |
i. Altering the name of a table
Syntax
ALTER TABLE [old_db_name.]old_table_name RENAME TO [new_db_name.]new_table_name
Example
For Example,
So, we are changing the name of the table Employees to Students.
[quickstart.cloudera:21000] > ALTER TABLE my_db.Employees RENAME TO my_db.Students;
Displaying the following message, Â Impala changes the name of the table as required just after executing the above query.
Query: alter TABLE my_db.Employees RENAME TO my_db.Students
Now, you will see the table named Students instead of Employees.
Query: show tables
name |
Students |
Employees 1 |
Employees 2 |
Employees 3 |
Fetched 4 row(s) in 0.10s
ii. Adding columns to a table
Syntax
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
Example
For Example,
So, Â we are adding two columns account_no and phone_number (both are of bigint data type) to the Students table.
[quickstart.cloudera:21000] > ALTER TABLE Students ADD COLUMNS (account_no BIGINT, phone_no BIGINT);
Now, it will add the specified columns to the table named student, On executing the above query.
Query: alter TABLE Students ADD COLUMNS (account_no BIGINT, phone_no BIGINT)
We can find the newly added columns since we verify the schema of the table Students in it as shown below.
quickstart.cloudera:21000] > describe Students; Query: describe Students
name | type | comment |
id | int | |
name | string | |
age | int | |
address | string | |
salary | bigint | |
account_no | bigint | |
phone_no | bigint |
Fetched 7 row(s) in 0.20s
iii. Dropping columns from a table
Syntax
ALTER TABLE name DROP [COLUMN] column_name
Example
For Example,
So, we are deleting the column named account_no.
[quickstart.cloudera:21000] > ALTER TABLE Students DROP account_no;
On executing the above query, Impala deletes the column named account_no,
Query: alter TABLE Students DROP account_no
However, we cannot find the column named account_no since it was deleted since you verify the schema of the table Students.
[quickstart.cloudera:21000] > describe Students; Query: describe Students
name | type | comment |
id | int | |
name | string | |
age | int | |
address | string | |
salary | bigint | |
phone_no | bigint |
Fetched 6 row(s) in 0.11s
Any doubt yet in Impala Alter Statement? Please Comment.
Iv. Changing the name and type of a column
Syntax
syntax is- ALTER TABLE name CHANGE column_name new_name new_type
Example
For Example,
So, Â we are changing the name of the column phone_no to email and it’s data type to string.
[quickstart.cloudera:21000] > ALTER TABLE Students CHANGE phone_no e_mail string;
On executing the above query, Impala does the specified changes.
Query: alter TABLE Students CHANGE phone_no e_mail string
By using the describe statement, we can verify the metadata of the table Students. Further, see the required changes to the specified column has done.
[quickstart.cloudera:21000] > describe Students; Query: describe Students
name | type | comment |
id | int | |
name | string | |
age | int | |
address | string | |
salary | bigint | |
e_mail | string |
Fetched 6 row(s) in 0.11s
V. Altering a Table using Hue
There are several steps you have to follow while using Hue. Such as:
- At first, type the alter statement in Impala Query editor. Then click on the execute button.
- Now, it will change the name of the table Employees to Students, on executing the above query. Hence, we can execute all the alter queries, in the same way.
Complex Type Considerations
The ALTER TABLE statement can change the metadata for tables containing complex types, in CDH 5.5 / Impala 2.3 and higher. Such as ARRAY, STRUCT, and MAP. Basically, to modify the table layout for complex type columns we can use an ADD COLUMNS, DROP COLUMN, or CHANGE clause.
Amazon S3 considerations
To make Impala query data from the Amazon S3 filesystem, we can specify an s3a:// prefix on the LOCATION attribute of a table or partition.
However, whenever we issue ALTER TABLE statements with the ADD PARTITION or DROP PARTITION clauses, Â Impala automatically handles creating or removing the associated folders, in CDH 5.8 / Impala 2.6 and higher.
In addition, Impala DDL statements can create or remove folders as needed in the Amazon S3 system in CDH 5.8 / Impala 2.6 and higher. Such as CREATE DATABASE, CREATE TABLE, DROP DATABASE CASCADE, DROP TABLE, and ALTER TABLE [ADD|DROP] PARTITION.
Although, we had to create folders yourself and point Impala database, tables, or partitions at them, and manually remove folders when no longer needed before CDH 5.8 / Impala 2.6.
HDFS caching (CACHED IN clause)
However, any existing or future data files in the table directory or the partition subdirectories are designated to be loaded into memory with the HDFS caching mechanism, since we specify the CACHED IN clause.
There is an optional WITH REPLICATION clause for CREATE TABLE and ALTER TABLE, Â in Impala 2.2 / CDH 5.4 and higher. Basically, it lets you specify a replication factor, the number of hosts on which to cache the same data blocks.
Moreover, Impala randomly selects a host that has a cached copy of that data block, when Impala processes a cached data block, also where the cache replication factor is greater than 1.
Basically, we can enable the SYNC_DDL query option to make each DDL statement wait before returning If you connect to different Impala nodes within an impala-shell session for load-balancing purposes until the new or changed metadata has been received by all the Impala nodes.
HDFS permissions
However, Many of the ALTER TABLE clauses do not actually read or write any HDFS files. Hence, they do not depend on specific HDFS permissions.
Cancellation
It is not possible to cancel it. That implies it Cannot be canceled.
This was all in the Impala Alter Table Statement.
Conclusion – Impala Alter Table Statement
As a result, we have seen the whole concept of Impala Alter Table Statement. Still, if any doubt occurs in how to create impala database, feel free to ask in the comment section.
Did we exceed your expectations?
If Yes, share your valuable feedback on Google