Impala Alter Table Statement – HDFS Caching and Permissions

Boost your career with Data Engineering Courses!!

Basically, to change the structure or properties of an existing Impala table we use Impala Alter Table Statement. There are much more to know about Impala Alter Table Statement. Like Altering the name of a table,

Adding columns to a table, Dropping columns from a table, Changing the name and type of a column or Altering a Table using Hue. So, in this article on Impala Alter Table Statement, we will discuss all of them. Also, we will include syntax and example to understand it well.

Introduction to Impala Alter Table Statement?

While we need to change the structure or properties of an existing Impala table, we use Impala Alter Table Statement.

Basically, there is a primarily a logical operation that updates the table metadata in the Metastore database that Impala shares with Hive, in Impala. However, we can say most ALTER TABLE operations do not actually rewrite, move, and so on the actual data files.

Although, the one exception is The RENAME TO clause it can cause HDFS files to be moved to different paths.

Also, we typically need to perform corresponding physical filesystem operations, whenever we do an ALTER TABLE operation. Like rewriting the data files to include extra fields, or converting them to a different file format.

To be very specific, to perform changes on a given table, we use Impala Alter table statement. In addition, we can add, delete, or modify columns in an existing table and we can also rename it, using this statement.

Syntax of Impala Alter Table Statements

So, the syntax for using Impala Alter Table Statement is-

ALTER TABLE [old_db_name.]old_table_name RENAME TO [new_db_name.]new_table_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name DROP [COLUMN] column_name
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name { ADD [IF NOT EXISTS] | DROP [IF EXISTS] } PARTITION (partition_spec) [PURGE]
ALTER TABLE name RECOVER PARTITIONS
ALTER TABLE name [PARTITION (partition_spec)]
 SET { FILEFORMAT file_format
 | LOCATION 'hdfs_path_of_directory'
 | TBLPROPERTIES (table_properties)
 | SERDEPROPERTIES (serde_properties) }
ALTER TABLE name colname
 ('statsKey'='val, ...)
statsKey ::= numDVs | numNulls | avgSize | maxSize
ALTER TABLE name [PARTITION (partition_spec)] SET { CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED }
new_name ::= [new_database.]new_table_name
col_spec ::= col_name type_name
partition_spec ::= partition_col=constant_value
table_properties ::= 'name'='value'[, 'name'='value' ...]
serde_properties ::= 'name'='value'[, 'name'='value' ...]
file_format ::= { PARQUET | TEXTFILE | RCFILE | SEQUENCEFILE | AVRO }

Statement type

Impala Alter Table Statement is of DDL Type.

Usage of Impala Alter Table Statements

Let me now talk about the usage of Impala Alter Table Statement. Make sure to include all the partitioning columns in the specification, while we specify partitions in an ALTER TABLE statement, through the PARTITION (partition_spec) clause.

For both internal as well as External tables, most of the ALTER TABLE operations work the same. However, there is an exception, the underlying data directory is not renamed or moved to an external table.

Examples – Impala Alter table Statement

Let’s suppose we have a table named Employees in Impala. Here, is the data −

id	name	age	address	salary
1	shubham	32	delhi	20000
2	monika	25	mumbai	15000
3	kajal	27	alirajpur	40000
4	revti	25	indore	35000
5	shreyash	23	pune	30000
6	mehul	22	hyderabad	32000

Now, see the list of tables in the database my_db, where we can find the Employees table

[quickstart.cloudera:21000] > show tables;

Query: show tables

name

Employees

Employees 1

Employees 2

Employees 3

i. Altering the name of a table

Syntax

ALTER TABLE [old_db_name.]old_table_name RENAME TO [new_db_name.]new_table_name

Example

For Example,
So, we are changing the name of the table Employees to Students.

[quickstart.cloudera:21000] > ALTER TABLE my_db.Employees RENAME TO my_db.Students;

Displaying the following message, Impala changes the name of the table as required just after executing the above query.

Query: alter TABLE my_db.Employees RENAME TO my_db.Students

Now, you will see the table named Students instead of Employees.
Query: show tables

name

Students

Employees 1

Employees 2

Employees 3

Fetched 4 row(s) in 0.10s

ii. Adding columns to a table

Syntax

ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])

Example
For Example,
So, we are adding two columns account_no and phone_number (both are of bigint data type) to the Students table.

[quickstart.cloudera:21000] > ALTER TABLE Students ADD COLUMNS (account_no BIGINT,
phone_no BIGINT);

Now, it will add the specified columns to the table named student, On executing the above query.

Query: alter TABLE Students ADD COLUMNS (account_no BIGINT, phone_no BIGINT)

We can find the newly added columns since we verify the schema of the table Students in it as shown below.

quickstart.cloudera:21000] > describe Students;
Query: describe Students

name	type	comment
id	int
name	string
age	int
address	string
salary	bigint
account_no	bigint
phone_no	bigint

Fetched 7 row(s) in 0.20s

iii. Dropping columns from a table

Syntax

ALTER TABLE name DROP [COLUMN] column_name

Example

For Example,
So, we are deleting the column named account_no.

[quickstart.cloudera:21000] > ALTER TABLE Students DROP account_no;

On executing the above query, Impala deletes the column named account_no,

Query: alter TABLE Students DROP account_no

However, we cannot find the column named account_no since it was deleted since you verify the schema of the table Students.

[quickstart.cloudera:21000] > describe Students;
Query: describe Students

name	type	comment
id	int
name	string
age	int
address	string
salary	bigint
phone_no	bigint

Fetched 6 row(s) in 0.11s
Any doubt yet in Impala Alter Statement? Please Comment.

Iv. Changing the name and type of a column

Syntax

syntax is- ALTER TABLE name CHANGE column_name new_name new_type

Example

For Example,
So, we are changing the name of the column phone_no to email and it’s data type to string.

[quickstart.cloudera:21000] > ALTER TABLE Students CHANGE phone_no e_mail string;

On executing the above query, Impala does the specified changes.

Query: alter TABLE Students CHANGE phone_no e_mail string

By using the describe statement, we can verify the metadata of the table Students. Further, see the required changes to the specified column has done.

[quickstart.cloudera:21000] > describe Students;
Query: describe Students

name	type	comment
id	int
name	string
age	int
address	string
salary	bigint
e_mail	string

Fetched 6 row(s) in 0.11s

V. Altering a Table using Hue

There are several steps you have to follow while using Hue. Such as:

At first, type the alter statement in Impala Query editor. Then click on the execute button.
Now, it will change the name of the table Employees to Students, on executing the above query. Hence, we can execute all the alter queries, in the same way.

Complex Type Considerations

The ALTER TABLE statement can change the metadata for tables containing complex types, in CDH 5.5 / Impala 2.3 and higher. Such as ARRAY, STRUCT, and MAP. Basically, to modify the table layout for complex type columns we can use an ADD COLUMNS, DROP COLUMN, or CHANGE clause.

Amazon S3 considerations

To make Impala query data from the Amazon S3 filesystem, we can specify an s3a:// prefix on the LOCATION attribute of a table or partition.

However, whenever we issue ALTER TABLE statements with the ADD PARTITION or DROP PARTITION clauses, Impala automatically handles creating or removing the associated folders, in CDH 5.8 / Impala 2.6 and higher.

In addition, Impala DDL statements can create or remove folders as needed in the Amazon S3 system in CDH 5.8 / Impala 2.6 and higher. Such as CREATE DATABASE, CREATE TABLE, DROP DATABASE CASCADE, DROP TABLE, and ALTER TABLE [ADD|DROP] PARTITION.

Although, we had to create folders yourself and point Impala database, tables, or partitions at them, and manually remove folders when no longer needed before CDH 5.8 / Impala 2.6.

HDFS caching (CACHED IN clause)

However, any existing or future data files in the table directory or the partition subdirectories are designated to be loaded into memory with the HDFS caching mechanism, since we specify the CACHED IN clause.

There is an optional WITH REPLICATION clause for CREATE TABLE and ALTER TABLE, in Impala 2.2 / CDH 5.4 and higher. Basically, it lets you specify a replication factor, the number of hosts on which to cache the same data blocks.

Moreover, Impala randomly selects a host that has a cached copy of that data block, when Impala processes a cached data block, also where the cache replication factor is greater than 1.

Basically, we can enable the SYNC_DDL query option to make each DDL statement wait before returning If you connect to different Impala nodes within an impala-shell session for load-balancing purposes until the new or changed metadata has been received by all the Impala nodes.

HDFS permissions

However, Many of the ALTER TABLE clauses do not actually read or write any HDFS files. Hence, they do not depend on specific HDFS permissions.

Cancellation

It is not possible to cancel it. That implies it Cannot be canceled.
This was all in the Impala Alter Table Statement.

Conclusion – Impala Alter Table Statement

As a result, we have seen the whole concept of Impala Alter Table Statement. Still, if any doubt occurs in how to create impala database, feel free to ask in the comment section.

Did you like our efforts? If Yes, please give DataFlair 5 Stars on Google

Impala Alter Table Statement – HDFS Caching and Permissions

Introduction to Impala Alter Table Statement?

Syntax of Impala Alter Table Statements

Statement type

Usage of Impala Alter Table Statements

Examples – Impala Alter table Statement

Syntax

Example

ii. Adding columns to a table

Syntax

iii. Dropping columns from a table

Syntax

Example

Iv. Changing the name and type of a column

Syntax

Example

V. Altering a Table using Hue

Complex Type Considerations

Amazon S3 considerations

HDFS caching (CACHED IN clause)

HDFS permissions

Cancellation

Conclusion – Impala Alter Table Statement

Leave a Reply Cancel reply

About DataFlair

Trending Courses

Trending Data Science Courses

Free Big Data Courses

Trending Programming Courses

Trending Data Science Tutorials

Trending Projects

Trending Programming Tutorials

Trending Tutorials