Impala CREATE DATABASE Statement With Examples
It is possible to create a new database in Impala. So, in this article, we will discuss, how we can create it with Impala CREATE DATABASE Statement. Also, we will cover its syntax, usage as well as Example to understand it well.
So, Let’s Start How Impala Create Database.
How Impala CREATE DATABASE Statement
By using Impala CREATE DATABASE Statement, we create a new database.
Basically, we can define a database in two ways, such as:
- It can be a logical construct. Especially, for grouping together related tables, views, and functions within their own namespace. However, it is possible to use a separate database for each application, set of related tables, or the round of experimentation.
- Also, it can be a physical construct represented by a directory tree in HDFS. Moreover, under this directory, all Tables (internal tables), partitions, and data files are located. In addition, it is possible to perform several operations HDFS-level operations with a DROP DATABASE statement. For example, backing it up and measuring space usage, or remove it.
Stay updated with latest technology trends
Join DataFlair on Telegram!!
a. A Syntax for Create Database in Impala
So, the syntax for using Impala CREATE DATABASE Statement is-
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name[COMMENT 'database_comment'] [LOCATION hdfs_path];
b. Statement type
Impala CREATE DATABASE Statement is of DDL Type.
Basically, under the main Impala data directory, a database is physically represented as a directory in HDFS, with a filename extension .db. However, make sure, it is created for you if the associated HDFS directory does not exist.
Also, if there is no physical or logical nesting, all databases and their associated directories are of top-level objects.
In addition, we use USE statement, to make it the current database within an impala-shell session after creating a database. Also, it is possible to reference tables in the current database without prepending any qualifier to their names.
Basically, the database we start is named default, while we first connect to Impala through impala-shell.
In addition, there is another predefined database in Impala. Such as _impala_builtins.
That serves as the location for the built-in functions. Moreover, we can use below statement in order to see the built-in functions:
show functions in _impala_builtins;
show functions in _impala_builtins like ‘*substring*’;
Impala-shell session or another impala-shell connected to the same node can immediately access that database, after creating one. In order to access the database through the Impala daemon on a different node, we issue the INVALIDATE METADATA statement first while connected to that other node.
Also, it is important to set the LOCATION attribute for a new database. It is a way to work with sets of files in an HDFS directory structure outside the default Impala data directory, as opposed to setting the LOCATION attribute for each individual table.
Hive Considerations means, Hive can always use a database, whenever we create one in Impala. In addition, to make Impala permanently aware of the new database it issues an INVALIDATE METADATA statement in Impala, whenever we create a database in Hive.
Moreover, to lists all databases, or the databases whose name matches a wildcard pattern, we use the SHOW DATABASES statement. Also, the SHOW DATABASES output includes a second column that displays the associated comment, for each database, if any, in Impala 2.5 and higher.
Amazon S3 Considerations
We can include an s3a:// prefix on the LOCATION attribute, in order to specify that any tables created within a database reside on the Amazon S3 system or not. In Impala 2.6 and higher, there is a beautiful feature sustains.
That is, it automatically creates any required folders as the databases, tables, and partitions are created. Also, removes them when they are dropped.
Before Impala 2.6, we needed to create folders yourself. Also, we needed to point Impala database, tables, or partitions at them, and manually remove folders when no longer needed.
However, In Impala 2.6 and higher, its DDL statements can create or remove folders as needed in the Amazon S3 system. Like CREATE DATABASE, CREATE TABLE, DROP DATABASE CASCADE, DROP TABLE, and ALTER TABLE [ADD|DROP] PARTITION.
It is not possible to cancel it. That implies it Cannot be canceled.
However, it is essential that for the user ID that the impalaD daemon runs under, have to write permission for the parent HDFS directory under which the database is located.
Examples of Create Database
create database first_db; use first_db; create table t1 (x int); create database second_db; use second_db;
— Each database has its own namespace for tables.
— You can reuse the same table names in each database.
create table t1 (s string); create database temp;
— You can either USE a database after creating it,
— or qualify all references to the table name with the name of the database.
— Here, tables T2 and T3 are both created in the TEMP database.
create table temp.t2 (x int, y int); use database temp; create table t3 (s string);
— You cannot drop a database while it is selected by the USE statement.
drop database temp;
ERROR: AnalysisException: Cannot drop current default database: temp
— The always-available database ‘default’ is a convenient one to USE
— before dropping a database you created.
— Before dropping a database, first, drop all the tables inside it,
— or in Impala 2.3 and higher use the CASCADE clause.
drop database temp;
ERROR: ImpalaRuntimeException: Error making ‘dropDatabase’ RPC to Hive Metastore:
CAUSED BY: InvalidOperationException: Database temp is not empty
show tables in temp;
| name |
| t3 |
— Impala 2.3 and higher:
drop database temp cascade;
— Earlier releases:
drop table temp.t3; drop database temp;
This is all on Impala CREATE DATABASE Statements.
As a result, we have seen the whole concept of Impala CREATE DATABASE Statement. Still, if any doubt occurs in how to create impala database, feel free to ask in the comment section.