Difference between External table and Managed table in hive?

Live instructor-led & Self-paced Online Certification Training Courses (Big Data, Hadoop, Spark) Forums Hive Difference between External table and Managed table in hive?

Viewing 1 reply thread
  • Author
    Posts
    • #6559
      DataFlair Team
      Spectator

      What is the difference between external and managed table in Hive?
      Difference between Hive internal tables and external tables?
      What is an external table in hive?

    • #6560
      DataFlair Team
      Spectator

      In HIVE there are two ways to create tables: Managed Tables and External Tables

      when we create a table in HIVE, HIVE by default manages the data and saves it in its own warehouse, where as we can also create an external table, which is at an existing location outside the HIVE warehouse directory.

      The main difference between these two types of tables is seen during LOAD and DROP statements.

      Eg: MANAGED TABLES (CREATE TABLES)

      To create Managed tables, we just use the simple CREATE Statement (see below)

      When we load a data into a Managed table, it is moved into HIVE warehouse directory.

      CREATE TABLE managed_table(ds string);
      LOAD DATA INPATH ' /user/ric/data.txt' INTO table managed_table

      Now, this LOAD statement will move the file into HIVE warehouse directory for the manages_tables table which ishdfs://user/hive/warehouse/managed_table

      if the table is later dropped,

      DROP TABLE managed_table;

      the table including its metadata and its data is deleted, which means the data no longer exists anywhere.

      Eg: EXTERNAL TABLES

      Incase of EXTERNAL TABLES, the location of the external data is specified at table creation time and also uses the Key word EXTERNAL in CREATE STATEMENT (see below)

      CREATE EXTERNAL TABLE external_table(ds string);
      LOCATION '/user/ric/external_table';
      LOAD DATA INPATH '/user/ric/external_table INTO TABLE external_table;

      Here the data is not moved to HIVE warehouse directory, and it is actually saved in an external location, therefore when you drop the external_table, HIVE will leave the data untouched and only delete the metadata.

      So which type of table to use?

      Incase we are doing all our processing in HIVE then it is good to use Managed table, where as incase we use HIVE as well as another tool on the same dataset then it is good to use an External Table or incase of we use multiple schemas for the same dataset.

Viewing 1 reply thread
  • You must be logged in to reply to this topic.