Impala CREATE DATABASE Statement With Examples

Stay updated with the latest technology trends while you're on the move - Join DataFlair's Telegram Channel

1. Impala CREATE DATABASE

It is possible to create a new database in Impala. So, in this article, we will discuss, how we can create it with Impala CREATE DATABASE Statement. Also, we will cover its syntax, usage as well as Example to understand it well.
So, Let’s Start How Impala Create Database.

Impala CREATE DATABASE Statement with Examples

Impala CREATE DATABASE Statement with Examples

2. How Impala CREATE DATABASE Statement

By using Impala CREATE DATABASE Statement, we create a new database.

Basically, we can define a database in two ways, such as:

  • It can be a logical construct. Especially, for grouping together related tables, views, and functions within their own namespace. However, it is possible to use a separate database for each application, set of related tables, or the round of experimentation.
  • Also, it can be a physical construct represented by a directory tree in HDFS. Moreover, under this directory, all Tables (internal tables), partitions, and data files are located. In addition, it is possible to perform several operations HDFS-level operations with a DROP DATABASE statement. For example, backing it up and measuring space usage, or remove it.

Let’s Learn Impala Architecture – Components of Impala

a. A Syntax for Create Database in Impala

So, the syntax for using Impala CREATE DATABASE Statement is-

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name[COMMENT 'database_comment']
[LOCATION hdfs_path];

b. Statement type

Impala CREATE DATABASE Statement is of DDL Type.

c. Usage

Basically, under the main Impala data directory, a database is physically represented as a directory in HDFS, with a filename extension .db. However, make sure, it is created for you if the associated HDFS directory does not exist. Also, if there is no physical or logical nesting, all databases and their associated directories are of top-level objects.

In addition, we use USE statement, to make it the current database within an impala-shell session after creating a database. Also, it is possible to reference tables in the current database without prepending any qualifier to their names.

Basically, the database we start is named default, while we first connect to Impala through impala-shell.
Let’s Discuss Impala Shell and Impala command
In addition, there is another predefined database in Impala. Such as _impala_builtins.  That serves as the location for the built-in functions. Moreover, we can use below statement in order to see the built-in functions:
show functions in _impala_builtins;
show functions in _impala_builtins like ‘*substring*’;

Impala-shell session or another impala-shell connected to the same node can immediately access that database, after creating one. In order to access the database through the Impala daemon on a different node, we issue the INVALIDATE METADATA statement first while connected to that other node.

Also, it is important to set the LOCATION attribute for a new database. It is a way to work with sets of files in an HDFS directory structure outside the default Impala data directory, as opposed to setting the LOCATION attribute for each individual table.
Do you know How Impala Select a Database using Hue Browser

If these professionals can make a switch to Big Data, so can you:
Rahul Doddamani Story - DataFlair
Rahul Doddamani
Java → Big Data Consultant, JDA
Follow on
Mritunjay Singh Success Story - DataFlair
Mritunjay Singh
PeopleSoft → Big Data Architect, Hexaware
Follow on
Rahul Doddamani Success Story - DataFlair
Rahul Doddamani
Big Data Consultant, JDA
Follow on
I got placed, scored 100% hike, and transformed my career with DataFlair
Enroll now
Deepika Khadri Success Story - DataFlair
Deepika Khadri
SQL → Big Data Engineer, IBM
Follow on
DataFlair Web Services
You could be next!
Enroll now

3. Hive Considerations

Hive Considerations means, Hive can always use a database, whenever we create one in Impala. In addition, to make Impala permanently aware of the new database it issues an INVALIDATE METADATA statement in Impala, whenever we create a database in Hive.
Follow this link to know more Hive

Moreover, to lists all databases, or the databases whose name matches a wildcard pattern, we use the SHOW DATABASES statement. Also, the SHOW DATABASES output includes a second column that displays the associated comment, for each database, if any, in Impala 2.5 and higher.

4. Amazon S3 Considerations

We can include an s3a:// prefix on the LOCATION attribute, in order to specify that any tables created within a database reside on the Amazon S3 system or not. In Impala 2.6 and higher, there is a beautiful feature sustains. That is, it automatically creates any required folders as the databases, tables, and partitions are created. Also, removes them when they are dropped.

Before Impala 2.6, we needed to create folders yourself. Also, we needed to point Impala database, tables, or partitions at them, and manually remove folders when no longer needed. However, In Impala 2.6 and higher, its DDL statements can create or remove folders as needed in the Amazon S3 system. Like CREATE DATABASE, CREATE TABLE, DROP DATABASE CASCADE, DROP TABLE, and ALTER TABLE [ADD|DROP] PARTITION.
Let’s Learn How Impala CREATE VIEW Statement

5. Cancellation

It is not possible to cancel it. That implies it Cannot be canceled.

6. HDFS Permissions

However, it is essential that for the user ID that the impalaD daemon runs under, have to write permission for the parent HDFS directory under which the database is located.

Hadoop Quiz

7. Examples of Create Database

For Example,

create database first_db;
use first_db;
create table t1 (x int);
create database second_db;
use second_db;

— Each database has its own namespace for tables.

— You can reuse the same table names in each database.

create table t1 (s string);
create database temp;

— You can either USE a database after creating it,
— or qualify all references to the table name with the name of the database.
— Here, tables T2 and T3 are both created in the TEMP database.

create table temp.t2 (x int, y int);
use database temp;
create table t3 (s string);

— You cannot drop a database while it is selected by the USE statement.
drop database temp;
ERROR: AnalysisException: Cannot drop current default database: temp
Read About Introduction to Impala Security with Security Guidelines
— The always-available database ‘default’ is a convenient one to USE
— before dropping a database you created.
use default;

— Before dropping a database, first, drop all the tables inside it,
— or in Impala 2.3 and higher use the CASCADE clause.
drop database temp;
ERROR: ImpalaRuntimeException: Error making ‘dropDatabase’ RPC to Hive Metastore:
CAUSED BY: InvalidOperationException: Database temp is not empty
show tables in temp;
+——+
| name |
+——+
| t3   |
+——+
— Impala 2.3 and higher:

drop database temp cascade;

— Earlier releases:

drop table temp.t3;
drop database temp;

This is all on Impala CREATE DATABASE Statements.

8. Conclusion

As a result, we have seen the whole concept of Impala CREATE DATABASE Statement. Still, if any doubt occurs in how to create impala database, feel free to ask in the comment section.
See Also- Impala UNION Clause
For reference

Leave a Reply

Your email address will not be published. Required fields are marked *

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.