Impala CREATE TABLE Statement – Advance Tutorial

Boost your career with Free Big Data Courses!!

In our last tutorial, we studied the Create Database and Drop Database. When it comes to creating a new table in the required database, we use several statements in Impala. That statement we call Impala CREATE TABLE Statement. It includes its syntax usage as well as the example to understand it well.

So, let’s start How Impala Create Table Statement.

Introduction – Impala CREATE TABLE Statement

The simple definition, when we create a table, you optionally specify several aspects. Like:

  • Internal or external table.
  • Columns and associated data types.
  • For physically partitioning the data we use the columns.
  • The file format for data files.
  • Also, the HDFS directory where the data files are located.

In order to create a new table in the required database, we use the CREATE TABLE Statement in Impala. Basically, the process of naming the table and defining its columns and each column’s data type is what we call Creating a basic table.

A Syntax of Create Table in Impala

 To create table in impala and specifying its columns, the general syntax is as follows:

I. Explicit column definitions

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
 (col_name data_type [COMMENT 'col_comment'], ...)
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [COMMENT 'table_comment']
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [
  [ROW FORMAT row_format] [STORED AS file_format]
 ]
 [LOCATION 'hdfs_path']
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]

ii. Column definitions inferred from data file

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
 LIKE PARQUET 'hdfs_path_of_parquet_file'
 [COMMENT 'table_comment']
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [
  [ROW FORMAT row_format] [STORED AS file_format]
 ]
 [LOCATION 'hdfs_path']
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
data_type:
   primitive_type
 | array_type
 | map_type
 | struct_type

iii. Create Table as Select

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] db_name.]table_name
 [PARTITIONED BY (col_name[, ...])]
 [COMMENT 'table_comment']
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [
  [ROW FORMAT row_format] [STORED AS ctas_file_format]
 ]
 [LOCATION 'hdfs_path']
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
AS
 select_statement
primitive_type:
   TINYINT
 | SMALLINT
 | INT
 | BIGINT
 | BOOLEAN
 | FLOAT
 | DOUBLE
 | DECIMAL
 | STRING
 | CHAR
 | VARCHAR
 | TIMESTAMP
complex_type:
   struct_type
 | array_type
 | map_type
struct_type: STRUCT < name : primitive_or_complex_type [COMMENT 'comment_string'], ... >
array_type: ARRAY < primitive_or_complex_type >
map_type: MAP < primitive_type, primitive_or_complex_type >
row_format:
 DELIMITED [FIELDS TERMINATED BY 'char' [ESCAPED BY 'char']]
 [LINES TERMINATED BY 'char']
file_format:
   PARQUET
 | TEXTFILE
 | AVRO
 | SEQUENCEFILE
 | RCFILE
ctas_file_format:
   PARQUET
 | TEXTFILE

iv. Statement type

Impala CREATE TABLE Statement is of DDL Type.

Example of Impala Create Table Statement

Let’s understand the Impala CREATE TABLE statement with the example. To create table statement. Here, we have created a table named EMPLOYEE in the database my_db.

[quickstart.cloudera:21000] > CREATE TABLE IF NOT EXISTS my_db.EMPLOYEE
  (name STRING, age INT, contact INT );

Now, a table with the specified name will be created. Further, it displays the following output.

Query: create table EMPLOYEE (name STRING, age INT, phone INT)
Fetched 0 row(s) in 0.48s

Next in Impala CREATE TABLE is it is Verification.

Verification in Impala Create Table Statements

Basically,  in Impala, the show Tables query gives a list of tables in the current database. That implies, using the Show Tables statement, we can verify whether the table is created.

At first, we need to switch the context to the database in which the required table exists.

[quickstart.cloudera:21000] > use my_db;
Query: use my_db

Further, we can observe the table named EMPLOYEE in it,  since we get the list of tables using the show tables query.

[quickstart.cloudera:21000] > show tables;
Query: show tables
+-----------+
| name      |
+-----------+
| EMPLOYEE   |
+-----------+
Fetched 1 row(s) in 0.10s

HDFS path in Impala

Basically, we need to specify the location where the database is to be created, to create a database in HDFS file system. Like below.

CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;

How to Create a Database using Hue Browser

Steps are:

  • At first, type the CREATE Table Statement in impala Query editor. Then, click on the execute button.
  • Afterward, gently move the cursor to the top of the drop-down menu just after executing the query. There is a refresh symbol. Further, the list of databases will be refreshed once you click on the refresh symbol. In this way, the recent changes which are done are applied to it.

I. Verification

  • In order to verify, first Click on the drop down under the heading DATABASE on the left-hand side of the editor. A list of databases will get open. Then Select the database my_db.
  • Further, you can see a list of tables, on selecting the database my_db. There you will find the newly created table EMPLOYEE.

So, this is all about Impala CREATE TABLE Statement. Hope you like our explanation.

Conclusion – Impala Create Table Statements

As a result, we have seen the whole concept of Impala CREATE TABLE Statement. Still, if any query occurs feel free to ask in the comment section.

Did you like our efforts? If Yes, please give DataFlair 5 Stars on Google

follow dataflair on YouTube

Leave a Reply

Your email address will not be published. Required fields are marked *