This Hive tutorial describes how to configure Hive Metastore to MySQL. Hive stores its metadata (schema-related information, partitioning information, etc.) into the database, Hive is shipped with Derby database. Derby is an embedded database backed by local disk. Derby is a single threaded database which doesn’t allow multiple connections, it is not production ready. In this tutorial, we will change the Metastore of Hive to MySQL.
2. Hive Introduction
Apache Hive is a data warehouse on the top of Hadoop. Using Hive we can run ad-hoc queries for the analysis of data. Hive saves us from writing complex Map-Reduce jobs, rather than that we can submit merely SQL queries. Hive converts SQL queries into MapReduce job and submits the same to the cluster.
Hive is very fast and scalable and is highly extensible. The hive consists of a huge user base, with the help of Hive thousands of jobs on the cluster can be run by hundreds of users at a time. As Hive is similar to SQL, hence it becomes very easy for the SQL developers to learn and implement Hive Queries.
3. Steps to Configure Hive Metastore to MySQL
Follow the steps given below to easily configure Hive Metastore to MySQL-
3.1. Install MySQL
$sudo apt-get install mysql-server
3.2. Copy MySQL connector to lib directory
Download MySQL connector (mysql-connector-java-5.1.35-bin.jar) and copy it into the $HIVE_HOME/lib directory
Note: $HIVE_HOME refers hive installation directory
3.3. Edit / Create configuration file hive-site.xml
Add following entries in the hive-site.xml (present in $HIVE_HOME/conf)
<property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost/hcatalog?createDatabaseIfNotExist=true</value> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>your_username</value> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>your_password</value> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> </property>
Now start hive terminal, it will be connected to MySQL. Now you can open multiple hive connections, which was not possible with Derby database.