Difference between Internal and External table in Hive?

Viewing 2 reply threads
  • Author
    Posts
    • #4677
      DataFlair Team
      Spectator

      What is the difference between internal and external tables in Hive?
      When should we use an internal table?
      When should we use an external table?

    • #4678
      DataFlair Team
      Spectator

      External table

      If we drop an external table in Hive, only the table's metadata is removed from the metastore. The data files remain in the table's directory in HDFS: the path given in the LOCATION clause, or the table's default directory if no location was specified. Files added with the LOAD DATA command also stay there. After the drop, if we create a table with the same column names and data types over the same location, we can query the same data without loading it again. We can also still view the directory that holds the data through a browser. With an external table, the data and the table are loosely coupled.
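
      The drop-and-recreate behaviour described above can be sketched in HiveQL as follows. This is only an illustration: the table name, columns, delimiter, and HDFS path are hypothetical and not taken from the original post.

      ```sql
      -- Hypothetical table, columns, and HDFS path, for illustration only.
      CREATE EXTERNAL TABLE page_views (
        user_id STRING,
        url     STRING
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION '/data/page_views';

      -- Removes only the metastore entry; by default the files under
      -- /data/page_views are left in place.
      DROP TABLE page_views;

      -- Re-creating the table over the same location makes the existing
      -- files queryable again, without any LOAD DATA step.
      CREATE EXTERNAL TABLE page_views (
        user_id STRING,
        url     STRING
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION '/data/page_views';

      SELECT * FROM page_views LIMIT 10;
      ```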

      Internal table

      For internal tables, this does not happen. If we drop an internal table, both the metadata and the directory containing the data files are removed completely. To query the data again, we have to re-create the table and reload the same data.

      For data that is available on the local file system, we can use a Hive internal table. Hive organizes internal tables inside a warehouse directory, which is controlled by the hive.metastore.warehouse.dir property; its default value is /user/hive/warehouse (in HDFS). With an internal table, the data and the table are tightly coupled.
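
      As a rough sketch of the internal (managed) table lifecycle, assuming a hypothetical table name, columns, and local file path:

      ```sql
      -- Hypothetical table, columns, and local path, for illustration only.
      -- Without the EXTERNAL keyword, Hive creates a managed (internal) table.
      CREATE TABLE staging_events (
        event_id   INT,
        event_name STRING
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE;

      -- LOCAL means the file is read from the local file system and copied
      -- into the table's directory under the warehouse directory.
      LOAD DATA LOCAL INPATH '/tmp/events.csv' INTO TABLE staging_events;

      -- Shows the table type (MANAGED_TABLE) and the table's Location.
      DESCRIBE FORMATTED staging_events;

      -- Removes the metastore entry and the table's directory with its data.
      DROP TABLE staging_events;
      ```

      With the default warehouse setting, the Location reported by DESCRIBE FORMATTED would be expected to sit under /user/hive/warehouse.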

      External tables are normally used when the data may be altered by some other tool, or when the data already exists and we want to keep it in its original form while using it in Hive.

    • #4679
      DataFlair Team
      Spectator

      Hive tables can be created as EXTERNAL or MANAGED (also called INTERNAL). This choice affects how data is loaded, controlled, and managed.

      Use EXTERNAL tables when:

      1. The data is also used outside of Hive. For example, the data files are read and processed by an existing program that doesn’t lock the files.
      2. Even after a DROP TABLE, data needs to remain in the underlying location. This can apply if you are pointing multiple schemas (tables or views) at a single data set or if you are iterating through various possible schemas.
      3. You want to use a custom location such as ASV.
      4. Hive should not own the data or control settings, directories, etc., because another program or process handles those things.
      5. You are not creating the table from an existing table (CREATE TABLE ... AS SELECT).
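
      Point 2 above (several schemas over one data set) might look like the following sketch. The table names, columns, delimiter, and HDFS path are hypothetical.

      ```sql
      -- Two external tables pointing at the same hypothetical directory.
      -- With Hive's default field delimiter, the single STRING column
      -- typically holds each whole line of the file.
      CREATE EXTERNAL TABLE logs_raw (
        line STRING
      )
      LOCATION '/data/app_logs';

      -- A second, more structured schema over the same files.
      CREATE EXTERNAL TABLE logs_parsed (
        log_date STRING,
        level    STRING,
        message  STRING
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      LOCATION '/data/app_logs';

      -- Dropping one schema removes only its metadata; the files and the
      -- other table are unaffected.
      DROP TABLE logs_raw;
      ```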

      Use INTERNAL tables when:

      1. The data is temporary.
      2. You want Hive to completely manage the lifecycle of the table and data.
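
      A temporary, Hive-managed result set could be sketched like this. It assumes an existing table named page_views with a url column; both names are hypothetical.

      ```sql
      -- CREATE TABLE ... AS SELECT produces a managed table whose data
      -- lives in Hive's warehouse directory.
      CREATE TABLE tmp_url_hits AS
      SELECT url, COUNT(*) AS hits
      FROM page_views
      GROUP BY url;

      -- When the intermediate result is no longer needed, dropping the
      -- table removes both the metadata and the data files.
      DROP TABLE tmp_url_hits;
      ```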
