Hive Metastore – Different Ways to Configure Hive Metastore

In this tutorial, we introduce the Hive Metastore in detail. The metastore is the central repository of Hive metadata. It stores the metadata for Hive tables and relations, for example their schemas and locations.

This Hive tutorial covers what the Hive Metastore is, how it works, what Derby is in Hive, how to configure the Hive Metastore, and which databases Hive supports. We discuss each of these questions in detail.

So, let’s start the Hive Metastore tutorial.

What is Hive Metastore?

The metastore is the central repository of Apache Hive metadata. It stores metadata for Hive tables (such as their schema and location) and partitions in a relational database, and it gives clients access to this information through the metastore service API.

The Hive metastore consists of two fundamental units:

A service that provides metastore access to other Apache Hive services.
Disk storage for the Hive metadata, which is separate from HDFS storage.

Hive Metastore Modes

There are three modes for Hive Metastore deployment:

Embedded Metastore
Local Metastore
Remote Metastore

Let’s now discuss these three Hive Metastore deployment modes one by one.

i. Embedded Metastore

By default, the metastore service runs in the same JVM as the Hive service. In this mode it uses an embedded Derby database stored on the local file system, so the metastore service, the Hive service, and the Derby database all run in the same JVM.

This mode has a limitation: only one embedded Derby database can access the database files on disk at any one time, so only one Hive session can be open at a time.

Embedded Deployment mode for Hive Metastore

If we try to start a second session, it produces an error when it attempts to open a connection to the metastore. To allow multiple services to connect to the metastore, Derby can be configured as a network server. Embedded mode is good for unit testing, but it is not suitable for practical deployments.
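The difference between embedded Derby and Derby run as a network server shows up mainly in the JDBC connection properties. The following is a minimal sketch, not a definitive setup: the helper name derby_properties, the host name, and the database name metastore_db are illustrative assumptions, and 1527 is assumed to be the Derby network server port in your installation.

```python
# Sketch: Derby connection properties for the Hive metastore.
# The helper name, host, and database name are illustrative assumptions.

def derby_properties(network_host=None, port=1527, db_name="metastore_db"):
    """Return javax.jdo.option.* properties for embedded or network-server Derby."""
    if network_host is None:
        # Embedded mode: Derby runs inside the Hive JVM, so only one
        # process can open the database files at a time.
        url = f"jdbc:derby:;databaseName={db_name};create=true"
        driver = "org.apache.derby.jdbc.EmbeddedDriver"
    else:
        # Network-server mode: clients reach Derby over TCP, so several
        # Hive sessions can share one metastore database.
        url = f"jdbc:derby://{network_host}:{port}/{db_name};create=true"
        driver = "org.apache.derby.jdbc.ClientDriver"
    return {
        "javax.jdo.option.ConnectionURL": url,
        "javax.jdo.option.ConnectionDriverName": driver,
    }


if __name__ == "__main__":
    for mode, props in (("embedded", derby_properties()),
                        ("network", derby_properties("localhost"))):
        print(mode)
        for name, value in props.items():
            print(f"  {name} = {value}")
```

The two property names are the same ones used for any other backend database; only the URL and the driver class change.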

ii. Local Metastore

Hive is a data-warehousing framework, so a single session is rarely enough. The local metastore mode was introduced to overcome this limitation of the embedded metastore. It allows many Hive sessions, i.e. many users can use the metastore at the same time.

We achieve this by using any JDBC-compliant database, such as MySQL, that runs in a separate JVM or on a different machine from the Hive service and metastore service, which still run together in the same JVM.

Local Deployment mode for Hive Metastore

This configuration is called a local metastore because the metastore service still runs in the same process as Hive, but it connects to a database running in a separate process, either on the same machine or on a remote machine.

Before starting the Apache Hive client, add the JDBC driver libraries to the Hive lib folder.

MySQL is a popular choice for the standalone metastore. In this case, the javax.jdo.option.ConnectionURL property is set to jdbc:mysql://host/dbname?createDatabaseIfNotExist=true, and javax.jdo.option.ConnectionDriverName is set to com.mysql.jdbc.Driver. The JDBC driver JAR file for MySQL (Connector/J) must be on Hive’s classpath, which is achieved by placing it in Hive’s lib directory.
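To make the MySQL settings above concrete, here is a small sketch that renders them as hive-site.xml content. The helper render_hive_site, the host dbhost, the database name metastore, and the user name and password values are all placeholders invented for illustration. The ConnectionUserName and ConnectionPassword properties are not mentioned above and are included on the assumption that the database requires a login.

```python
import xml.etree.ElementTree as ET


def render_hive_site(properties):
    """Render a dict of Hive properties as hive-site.xml text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Placeholder values: replace host, database, user, and password with your own.
local_mysql = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://dbhost/metastore?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hiveuser",
    "javax.jdo.option.ConnectionPassword": "change-me",
}

if __name__ == "__main__":
    print(render_hive_site(local_mysql))
```

The printed XML could be saved as hive-site.xml in Hive’s configuration directory. Newer Connector/J releases may expect a different driver class name, so check the one that matches your driver version.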

iii. Remote Metastore

Another metastore configuration is the remote metastore. In this mode, the metastore service runs in its own separate JVM, not in the Hive service JVM. Other processes that want to communicate with the metastore server do so through Thrift network APIs.

We can also run one or more metastore servers in this mode to provide higher availability. This also brings better manageability and security, because the database tier can be completely firewalled off and the database credentials no longer need to be shared with each Hive user to access the metastore database.

Remote deployment mode for Hive Metastore

To use a remote metastore, configure the Hive service by setting hive.metastore.uris to the metastore server URI(s). Metastore server URIs have the form thrift://host:port, where the port corresponds to the one set by METASTORE_PORT when starting the metastore server.
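As a rough illustration of the thrift://host:port form, the sketch below splits a hive.metastore.uris value into host and port pairs. The function name parse_metastore_uris and the example host names are hypothetical. It assumes that multiple URIs are separated by commas and that 9083 is the port to fall back on when none is given, so adjust both if your setup differs.

```python
from urllib.parse import urlparse

# Assumed fallback port; use whatever port your metastore server listens on.
DEFAULT_METASTORE_PORT = 9083


def parse_metastore_uris(value):
    """Split a hive.metastore.uris value into (host, port) pairs."""
    endpoints = []
    for raw in value.split(","):
        uri = raw.strip()
        if not uri:
            continue  # tolerate stray commas or whitespace
        parsed = urlparse(uri)
        if parsed.scheme != "thrift" or not parsed.hostname:
            raise ValueError(f"not a thrift://host:port URI: {uri!r}")
        endpoints.append((parsed.hostname, parsed.port or DEFAULT_METASTORE_PORT))
    return endpoints


if __name__ == "__main__":
    uris = "thrift://ms1.example.com:9083,thrift://ms2.example.com:9083"
    for host, port in parse_metastore_uris(uris):
        print(host, port)
```

A check like this can catch a mistyped scheme or a missing host before the Hive service tries to connect.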

Databases Supported by Hive

Hive supports five backend databases:

Derby
MySQL
MS SQL Server
Oracle
Postgres
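Each backend in the list above needs its own JDBC driver class in javax.jdo.option.ConnectionDriverName. The lookup below is an illustrative sketch only: the names BACKEND_DRIVERS and driver_for are hypothetical, the short keys are chosen for this example, and apart from the MySQL entry the class names do not come from this tutorial. They are the commonly documented driver classes, so verify them against the driver version you install.

```python
# Illustrative lookup table; verify each class name against your driver JAR.
BACKEND_DRIVERS = {
    "derby": "org.apache.derby.jdbc.EmbeddedDriver",
    "mysql": "com.mysql.jdbc.Driver",
    "mssql": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "oracle": "oracle.jdbc.OracleDriver",
    "postgres": "org.postgresql.Driver",
}


def driver_for(db_type):
    """Return the JDBC driver class assumed for a backend database type."""
    key = db_type.strip().lower()
    if key not in BACKEND_DRIVERS:
        supported = ", ".join(sorted(BACKEND_DRIVERS))
        raise ValueError(f"unsupported backend {db_type!r}; expected one of: {supported}")
    return BACKEND_DRIVERS[key]


if __name__ == "__main__":
    for db_type in BACKEND_DRIVERS:
        print(f"{db_type}: {driver_for(db_type)}")
```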

So, that was all about the Hive Metastore. We hope you liked our explanation.

Conclusion – Hive Metastore

In conclusion, the Hive Metastore is a central repository for storing all Hive metadata. This metadata includes information such as the structure of tables and relations. We have also discussed all three metastore modes in detail. You can also learn about other big data technologies such as Apache Hadoop, Spark, and Flink in detail.

Rama krishna P says:
January 6, 2020 at 9:10 pm
Your explanations are simply great. Thank you very much for sharing this much of depth information.
- DataFlair Team says:
  January 7, 2020 at 4:19 pm
  Hey Rama Krishna,
  I am glad that you liked our article. Refer our Hive data models tutorial for further learning.
Akshay Anand says:
April 10, 2020 at 7:56 pm
Hi Team,
How to install mysql connector for java 1.8.0_222 on ubuntu 14.04?
- Akshay Anand says:
  April 10, 2020 at 10:11 pm
  I did it.
  Thanks
Thomas Ajai says:
March 20, 2021 at 6:02 am
Thank you, it is very informative.
Is it possible to have same metastore service and metastore db accessible in multiple hive instances.
Sheetal says:
November 16, 2021 at 5:34 pm
How can I have access to metastore
Deva says:
August 19, 2024 at 7:13 pm
Simple and good explanation. Thanks
Manoj Kumar says:
July 19, 2025 at 10:28 pm
Very good and clear explanation.. Thank you for the post..
