Top 7 Hive DML Commands with Syntax and Examples
Wishing to load, insert, retrieve, update, or delete data in the Hive tables?
After learning basic Commands in Hive, let us now study Hive DML Commands. Hive Data Manipulation Language commands are used for inserting, retrieving, modifying, deleting, and updating data in the Hive table.
In this article, we will learn Hive DML commands. There are many Hive DML commands like LOAD, INSERT, UPDATE, etc. We will explore each of these DML commands individually, along with their syntax and examples.
Before moving towards the Hive DML commands, let us first see the short introduction to Hive Query Language.
Hive Query Language(HQL)
Hive Query Language is a language used in Hive, similar to SQL, to process and analyze unstructured data.
Hive Query Language is easy to use if you are familiar with SQL. The syntax of Hive QL is very similar to SQL with slight differences.
Hive QL supports DDL, DML, and user-defined functions.
Introduction to Hive DML commands
Hive DML (Data Manipulation Language) commands are used to insert, update, retrieve, and delete data from the Hive table once the table and database schema has been defined using Hive DDL commands.
The various Hive DML commands are:
Technology is evolving rapidly!
Stay updated with DataFlair on WhatsApp!!
Let us now learn each DML command individually.
[ps2id id=’LOAD-command’ target=”/]1. LOAD Command
The LOAD statement in Hive is used to move data files into the locations corresponding to Hive tables.
- If a LOCAL keyword is specified, then the LOAD command will look for the file path in the local filesystem.
- If the LOCAL keyword is not specified, then the Hive will need the absolute URI of the file.
- In case the keyword OVERWRITE is specified, then the contents of the target table/partition will be deleted and replaced by the files referred by filepath.
- If the OVERWRITE keyword is not specified, then the files referred by filepath will be appended to the table.
Want to practice these Hive commands yourself? Follow the Hive-3.1.2 installation guide to install the latest Hive version on your system.
Syntax:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)];
Example:
Here we are trying to load data from the ‘dab’ file in the local filesystem to the ‘emp_data’ table.
[ps2id id=’SELECT-command’ target=”/]2. SELECT COMMAND
The SELECT statement in Hive is similar to the SELECT statement in SQL used for retrieving data from the database.
Syntax:
SELECT col1,col2 FROM tablename;
Example:
[ps2id id=’INSERT-command’ target=”/]3. INSERT Command
The INSERT command in Hive loads the data into a Hive table. We can do insert to both the Hive table or partition.
a. INSERT INTO
The INSERT INTO statement appends the data into existing data in the table or partition. INSERT INTO statement works from Hive version 0.8.
Syntax:
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Example:
Here in this example, we are trying to insert the data of ‘emp_data’ table created above into the table ‘example’.
INSERT statement to load data into table “example”.
b. INSERT OVERWRITE
The INSERT OVERWRITE table overwrites the existing data in the table or partition.
Syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, ..) [IF NOT EXISTS]] select_statement FROM from_statement;
Example:
Here we are overwriting the existing data of the table ‘example’ with the data of table ‘dummy’ using INSERT OVERWRITE statement.
By using the SELECT statement we can verify whether the existing data of the table ‘example’ is overwritten by the data of table ‘dummy’ or not.
c. INSERT .. VALUES
INSERT ..VALUES statement in Hive inserts data into the table directly from SQL. It is available from Hive 0.14.
Syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...];
Example:
Inserting data into the ‘student’ table using INSERT ..VALUES statement.
[ps2id id=’DELETE-command’ target=”/]4. DELETE command
The DELETE statement in Hive deletes the table data. If the WHERE clause is specified, then it deletes the rows that satisfy the condition in where clause.
The DELETE statement can only be used on the hive tables that support ACID.
Syntax:
DELETE FROM tablename [WHERE expression];
Example:
In the below example, we are deleting the data of the student from table ‘student’ whose roll_no is 105.
By using the SELECT statement we can verify whether the data of the student from table ‘student’ whose roll_no is 105 is deleted or not.
[ps2id id=’UPDATE-command’ target=”/]5. UPDATE Command
The update can be performed on the hive tables that support ACID.
The UPDATE statement in Hive deletes the table data. If the WHERE clause is specified, then it updates the column of the rows that satisfy the condition in WHERE clause.
Partitioning and Bucketing columns cannot be updated.
Syntax:
UPDATE tablename SET column = value [, column = value ...] [WHERE expression];
Example:
In this example, we are updating the branch of the student whose roll_no is 103 in the ‘student’ table using an UPDATE statement.
By using the SELECT statement we can verify whether the branch of the student whose roll_no is 103 in the ‘student’ table is updated or not.
[ps2id id=’EXPORT-command’ target=”/]6. EXPORT Command
The Hive EXPORT statement exports the table or partition data along with the metadata to the specified output location in the HDFS.
Metadata is exported in a _metadata file, and data is exported in a subdirectory ‘data.’
Syntax:
EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])] TO 'export_target_path' [ FOR replication('eventid') ];
Example:
Here in this example, we are exporting the student table to the HDFS directory “export_from_hive”.
The table successfully exported. You can check for the _metadata file and data sub-directory using ls command.
[ps2id id=’IMPORT-command’ target=”/]7. IMPORT Command
The Hive IMPORT command imports the data from a specified location to a new table or already existing table.
Syntax:
IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]] FROM 'source_path' [LOCATION 'import_target_path'];
Example:
Here in this example, we are importing the data exported in the above example into a new Hive table ‘imported_table’.
Verifying whether the data is imported or not using hive SELECT statement.
Summary
After reading this article, I hope now you are familiar with the Hive DML commands.
The article explained how to load data into the Hive table, insert data into the Hive table, and delete rows from the hive table. In addition, we have studied how to update the particular row column in a table. We have also seen the Hive IMPORT and EXPORT statement with the help of an example.
Do you love DataFlair efforts? Please review us here.
Any queries while practicing Hive commands?
Ask us freely.
Keep Practicing!!
We work very hard to provide you quality material
Could you take 15 seconds and share your happy experience on Google
What exactly are the differences between load and import commands?
you cant perform overwrite in import command, I guess. However same is possible with “load data”.
Please correct me if I am wrong.