Free Online Certification Courses – Learn Today. Lead Tomorrow. › Forums › Hive › Difference between External table and Managed table in hive?
- This topic has 1 reply, 1 voice, and was last updated 5 years, 6 months ago by DataFlair Team.
-
AuthorPosts
-
-
October 10, 2018 at 3:47 pm #6559DataFlair TeamSpectator
What is the difference between external and managed table in Hive?
Difference between Hive internal tables and external tables?
What is an external table in hive? -
October 10, 2018 at 3:48 pm #6560DataFlair TeamSpectator
In HIVE there are two ways to create tables: Managed Tables and External Tables
when we create a table in HIVE, HIVE by default manages the data and saves it in its own warehouse, where as we can also create an external table, which is at an existing location outside the HIVE warehouse directory.
The main difference between these two types of tables is seen during LOAD and DROP statements.
Eg: MANAGED TABLES (CREATE TABLES)
To create Managed tables, we just use the simple CREATE Statement (see below)
When we load a data into a Managed table, it is moved into HIVE warehouse directory.
CREATE TABLE managed_table(ds string); LOAD DATA INPATH ' /user/ric/data.txt' INTO table managed_table
Now, this LOAD statement will move the file into HIVE warehouse directory for the manages_tables table which is
hdfs://user/hive/warehouse/managed_table
if the table is later dropped,
DROP TABLE managed_table;
the table including its metadata and its data is deleted, which means the data no longer exists anywhere.
Eg: EXTERNAL TABLES
Incase of EXTERNAL TABLES, the location of the external data is specified at table creation time and also uses the Key word EXTERNAL in CREATE STATEMENT (see below)
CREATE EXTERNAL TABLE external_table(ds string); LOCATION '/user/ric/external_table'; LOAD DATA INPATH '/user/ric/external_table INTO TABLE external_table;
Here the data is not moved to HIVE warehouse directory, and it is actually saved in an external location, therefore when you drop the external_table, HIVE will leave the data untouched and only delete the metadata.
So which type of table to use?
Incase we are doing all our processing in HIVE then it is good to use Managed table, where as incase we use HIVE as well as another tool on the same dataset then it is good to use an External Table or incase of we use multiple schemas for the same dataset.
-
-
AuthorPosts
- You must be logged in to reply to this topic.