Live instructor-led & Self-paced Online Certification Training Courses (Big Data, Hadoop, Spark) Forums Hive What is the difference between CREATE TABLE AND CREATE EXTERNAL TABLE ?

Viewing 1 reply thread
  • Author
    • #6561

      > In Hive, one can create two types of tables : Internal Table and External Table
      > Internal table can be created using command ‘CREATE TABLE’
      > External table can be created using command ‘CREATE EXTERNAL TABLE’

    • #6562

      In HIVE there are two ways to create tables: Managed Tables and External Tables

      when we create a table in HIVE, HIVE by default manages the data and saves it in its own warehouse, where as we can also create an external table, which is at an existing location outside the HIVE warehouse directory.

      The main difference between these two types of tables is seen during LOAD and DROP statements.


      To create Managed tables, we just use the simple CREATE Statement (see below)

      When we load a data into a Managed table, it is moved into HIVE warehouse directory.

      CREATE TABLE managed_table(ds string);
      LOAD DATA INPATH ' /user/ric/data.txt' INTO table managed_table

      Now, this LOAD statement will move the file into HIVE warehouse directory for the manages_tables table which ishdfs://user/hive/warehouse/managed_table

      if the table is later dropped,

      DROP TABLE managed_table;

      the table including its metadata and its data is deleted, which means the data no longer exists anywhere.


      Incase of EXTERNAL TABLES, the location of the external data is specified at table creation time and also uses the Key word EXTERNAL in CREATE STATEMENT (see below)

      CREATE EXTERNAL TABLE external_table(ds string);
      LOCATION '/user/ric/external_table';
      LOAD DATA INPATH '/user/ric/external_table INTO TABLE external_table;

      Here the data is not moved to HIVE warehouse directory, and it is actually saved in an external location, therefore when you drop the external_table, HIVE will leave the data untouched and only delete the metadata.

      So which type of table to use?

      Incase we are doing all our processing in HIVE then it is good to use Managed table, where as incase we use HIVE as well as another tool on the same dataset then it is good to use an External Table or incase of we use multiple schemas for the same dataset.

Viewing 1 reply thread
  • You must be logged in to reply to this topic.