Comparison between Hive Internal Tables vs External Tables

Boost your career with Free Big Data Courses!!

Curious to know different types of Hive tables and how they are different from each other?

As discussed the basics of Hive tables in Hive Data Models, let us now explore the major difference between hive internal and external tables.

In this article, we are going to discuss the two different types of Hive Table that are Internal table (Managed table) and External table. The article then enlists the differences between Hive Internal tables and External Tables. We will also see different cases where we can use these Hive tables.

So, let’s start with Hive internal tables and external tables.

Introduction to Hive Internal and External tables

 

Fundamentally, Hive knows two different types of tables: Internal table and the External table. The Internal table is also known as the managed table.

We can identify the internal or External tables using the DESCRIBE FORMATTED table_name statement in the Hive, which will display either MANAGED_TABLE or EXTERNAL_TABLE depending on the table type.

1. Hive Internal Table

Hive owns the data for the internal tables.

It is the default table in Hive. When the user creates a table in Hive without specifying it as external, then by default, an internal table gets created in a specific location in HDFS.

By default, an internal table will be created in a folder path similar to /user/hive/warehouse directory of HDFS. We can override the default location by the location property during table creation.

If we drop the managed table or partition, the table data and the metadata associated with that table will be deleted from the HDFS.

2. Hive External Table

Hive does not manage the data of the External table.

We create an external table for external use as when we want to use the data outside the Hive.

External tables are stored outside the warehouse directory. They can access data stored in sources such as remote HDFS locations or Azure Storage Volumes.

Whenever we drop the external table, then only the metadata associated with the table will get deleted, the table data remains untouched by Hive.

We can create the external table by specifying the EXTERNAL keyword in the Hive create table statement.

Difference between Hive Internal and External Table

hive internal and external tableshive internal and external tables

Let us now see the difference between both Hive tables. The major differences in the internal and external tables in Hive are:

1. LOAD semantics

The Load semantics varies in both the tables. Let us see the difference in load semantics between the internal table and the external table.

a. Internal Table

When we load data into an internal table, then Hive moves data into the warehouse directory.

For example:

Here in this example, we are creating a table ‘internaldemo’. When we load data into the ‘internaldemo’ table, then Hive moves the data to the warehouse directory.

create internal table - hive internal and external table

Now, loading data into the internal table created above. load data into internal table - hive internal and external tables

On describing the table, we see that the table data is moved to the Hive warehouse directory.describe internal table - hive internal and external tables

b. External Table

With the EXTERNAL keyword, Hive knows that it is not managing the table data, so it does not move data to its warehouse directory. Hive does not even check whether the external location at the time it is defined exists or not.

For example:

In this example, we are creating an external table, ‘external_demo’ on the specified location that is ‘/home/dataflair/’. On loading the data into the external table, the Hive does not move table data to its warehouse directory.

create external table - hive internal and external tables

Now, loading data to the external table created above. load data into external table - hive internal and external tables

On browsing the table, we can see that the Hive table data is not moved to the Hive warehouse directory. It is stored in the location specified while creating a table.describe external table - hive internal and external tables


2. DROP semantics

Like load semantics, drop semantics also varies in both tables. Let us see the difference in drop semantics between the internal table and the external table.

a. Internal table

Dropping the internal table will delete the table data, as well as the metadata associated with the table.

For example:

In this example, we are dropping the managed table ‘internaldemo’. You can see that it will delete both the table metadata as well as the table data.

drop internal table - hive internal and external tables

locating hive internal table - hive internal and external tables

Table data also gets deleted from the HDFS.

b. External table

Dropping the external table will delete only table metadata. The table content remains untouched.

For example:

In this example, we can see that on deleting the external table ‘external demo’, the content of the table is still present in the HDFS location.

droping external table - hive tables

locating external table - hive tables

3. TRUNCATE support

The TRUNCATE command only works for the internal table.

For example:

In this example, we are trying to use the truncate command with ‘externaldemo’ table. We will get an error message that says ‘Cannot truncate non-managed table externaldemo’.

truncate external table - hive tables

4. ACID Support

ACID/transactional works only for the internal table. They do not work for External Table.

5. Query Result Caching

Query Result Caching that saves the results of an executed Hive query for reuse on subsequent queries only works for the internal table.

Thus there are some differences between the Hive internal table and the external table.

Let us now see when to use which Hive table.

When to use the Internal and External table?

1. Hive Internal Table

We can use the internal table in cases:

  • When generating temporary tables.
  • When required that Hive should manage the lifecycle of the table.
  • And when we don’t want table data after deletion.

2. Hive External Table

We can use the external table in cases:

  • When we are not creating the table based on the existing table.
  • When required to use data outside of Hive. For example, the data files are read and processed by an existing program that does not lock the files.
  • When we don’t want to delete the table data completely even after DROP.
  • When the data should not be own by Hive.

So, this was all in Hive internal and external tables. I hope you like our explanation.

Conclusion

After reading the article, we can conclude that the Hive tables are of two types- internal table, also known as the managed table, and the External table.

The internal table data gets managed by the Hive. Hive is not responsible for managing data of the External table.

On deleting the Hive internal table, the table data and the metadata both get deleted and on deleting the Hive External table, only the table metadata will delete.

Your 15 seconds will encourage us to work even harder
Please share your happy experience on Google

follow dataflair on YouTube

10 Responses

  1. Babu Venkat says:

    Can i retrieve managed table data after deleting?

  2. Mohit Jain says:

    Dear Sir, I have 2 queries which are as follows :-

    If I alter the default Location(/user/hive/warehouse/) of Internal Table by using “Location” keyword with path, then that newly altered table will be treated/behave as Internal Table or External Table?

    • Rahul says:

      If you change the location of internal table (managed table), then also it will be treated as internal table.

  3. Abdulkadir Yigit says:

    Thanks, It helped a lot 🙂 By the way , I am new on Apache Hive. Are we able to create procedure or temporary tables on the system ?

  4. Suneel says:

    “External Tables – External table behaves differently. In this, we can control the creation and deletion of the data. ” – Please correct it . As we cannot control the creation and deletion of the data.

    Please correct me if i am wrong. TIA

  5. Asif Rehman says:

    Dear Author
    I am working with an web application having Oracle database. I have technical knowledge of “Data Ware House” (Cubes and Dimension) .I am just new to Hadoop, I want to use Analytics part of Hadoop. Where should I start?
    I have configured Oracle Big Data Lite VM. Can you help me ? Can you share your expert opinion?

  6. Prathik Kini says:

    Can an external table without specifying location be created and if dropped files will be deleted or not?

  7. NAVNEET SINGH SAJWAN says:

    What does this line about external tables mean?
    ‘They can access data stored in sources such as remote HDFS locations or Azure Storage Volumes.’

  8. Aurora Xue says:

    Thanks for the explanation about internal and external table it is very helpful. I wonder in the drop section, you mentioned that for the external table after table be dropped the data still exist in HDFS. So the overall process is: create external table -> copy data into the HIVE external table from data source -> data will be stored in HDFS -> drop external table -> data still exists in HDFS and original data source(if I did not delete it). Am I understand it correctly?
    And even though I delete the data from original source, since the data already be copied to HDFS so it still exists in HIVE system?

Leave a Reply

Your email address will not be published. Required fields are marked *