HBase – Soft Introduction and Quickstart
This tutorial explains the basics of HBase and its features. It covers the functionality HBase provides and includes a quick start for beginners. You will learn where HBase fits and in which situations it is useful.
What is HBase?
HBase is an open source, distributed, versioned, column-oriented, NoSQL (non-relational) database management system that runs on top of Hadoop (to install Hadoop, follow this installation guide). It adds random, real-time read/write access to Hadoop, allowing users to update individual data records.
Hadoop is designed for batch processing of large datasets, but with HBase on top of Hadoop we can read and write data in real time.
Source: Apache
In HBase a master node manages the cluster and region servers store portions of the tables and perform the work on the data. An HBase system comprises a set of tables. Each table contains rows and columns, much like a traditional database.
Each row is identified by a row key, which plays the role of a primary key, and all access to HBase tables goes through this row key. An HBase column represents an attribute of an object.
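To make the idea of region servers holding portions of a table more concrete, here is a minimal Python sketch of how a row key could be routed to the region that owns it. The split points and server names are made up for illustration, and the lookup is a simplification: real HBase compares row keys as raw bytes and keeps region locations in its own catalog.

```python
import bisect

# Hypothetical split points: each region holds the row keys in [start, next_start).
REGION_START_KEYS = ["", "g", "n", "t"]        # "" means "from the very first key"
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs4"]  # made-up server names


def locate_region(row_key):
    """Return the index of the region whose key range contains row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before that position owns the key.
    return bisect.bisect_right(REGION_START_KEYS, row_key) - 1


def server_for(row_key):
    """Return the (hypothetical) region server that would serve row_key."""
    return REGION_SERVERS[locate_region(row_key)]
```

With these assumed split points, `server_for("apple")` returns `"rs1"` and `server_for("zebra")` returns `"rs4"`. Because the key space is ordered, neighbouring row keys usually land in the same region.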
Benefits / Functionalities HBase offers:
- Open source project (Apache)
- A sparse, three-dimensional array of cells, indexed by: RowKey, ColumnKey, Timestamp/Version
- Distributed, Reliable, large-scale data store
- Efficient at random reads/writes
- Sharded into regions along an ordered RowKey space
- Within each region: Data is grouped into column families
- Sort order within each column family: Row Key (asc), Column Key (asc), Timestamp (desc)
- Store large amounts of data
- High write throughput
- Efficient random access within large data sets
- Scale gracefully with data
- For structured and semi-structured data
- Does not provide full RDBMS capabilities (cross-table transactions, joins, etc.)
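The "sparse, three-dimensional array" and the sort order from the list above can be sketched in a few lines of Python. This is only a toy model under my own assumptions (the class name `SparseTable` and its methods are invented here, and timestamps are plain integers); it is not how HBase stores data internally.

```python
class SparseTable:
    """Toy model of cells indexed by (row key, column key, timestamp)."""

    def __init__(self):
        # Only cells that were actually written are stored ("sparse").
        self.cells = {}

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        self.cells[(row, column, timestamp)] = value

    def scan(self):
        """Return cells sorted by row (asc), column (asc), timestamp (desc)."""
        ordered = sorted(self.cells, key=lambda k: (k[0], k[1], -k[2]))
        return [(key, self.cells[key]) for key in ordered]

    def get(self, row, column):
        """Return the newest value of a cell, or None if it was never written."""
        versions = [(ts, value) for (r, c, ts), value in self.cells.items()
                    if r == row and c == column]
        return max(versions, key=lambda pair: pair[0])[1] if versions else None


table = SparseTable()
table.put("row1", "cf:a", 1, "old")
table.put("row1", "cf:a", 2, "new")
table.put("row2", "cf:b", 1, "value_2")
```

Here `table.get("row1", "cf:a")` returns `"new"`, because the newest timestamp sorts first, and `table.get("row2", "cf:a")` returns `None`, because that cell was never written and takes no space.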
Now that we have covered the basics of HBase, let’s deploy HBase on a single node (in pseudo-distributed mode).
Pre-requisites:
- Install Java
[php]$ sudo apt-get install openjdk-6-jdk[/php]
- Install Hadoop (you can refer to this tutorial to install Hadoop)
Install / Setup HBase on Ubuntu:
1. Download HBase
Download a stable version of HBase either from Apache or Cloudera
2. Extract the tarball
[php]$ tar xzf hbase-*.tar.gz[/php]
3. Set JAVA_HOME in hbase-env.sh
[php]$ pico conf/hbase-env.sh[/php]
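4. Add the following entries to conf/hbase-site.xml
[php]<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
<property>
<name>hbase.master</name>
<value>localhost:60000</value>
</property>[/php]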
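HBase reads hbase-site.xml in the usual Hadoop layout, where each setting is a name/value pair inside a property element under a configuration root. If you prefer to generate or double-check the file from a script, a minimal Python sketch could look like this. The function names are my own, and the values are simply the ones from this step; adjust host and port to your setup.

```python
import xml.etree.ElementTree as ET

# Values taken from the step above; change them to match your own cluster.
SETTINGS = {
    "hbase.rootdir": "hdfs://localhost:8020/hbase",
    "hbase.master": "localhost:60000",
}


def build_hbase_site(settings):
    """Build hbase-site.xml content as a string from a dict of settings."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def read_hbase_site(xml_text):
    """Parse hbase-site.xml content back into a dict of name -> value."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.findall("property")}
```

Reading the generated text back with `read_hbase_site(build_hbase_site(SETTINGS))` should give the same dict, which is a cheap way to catch a typo before starting HBase.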
5. Comment out the following line in /etc/hosts
[php]#127.0.0.1 hostname[/php]
6. Start HBase
[php]$ bin/start-hbase.sh[/php]
7. Start HBase Shell
[php]$ bin/hbase shell[/php]
HBase is now installed on your machine.
Run the jps command to check that the required daemons are running:
[php]$ jps
3669 HMaster[/php]
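If you want to automate this check, the jps output is easy to parse, since each line is a process id followed by a name. Below is a small Python sketch; the helper names are my own, and the list of required daemons defaults to HMaster only because that is what the output above shows. Your setup may need other names as well.

```python
def running_daemons(jps_output):
    """Return the set of process names listed in jps output."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(None, 1)
        # Each useful line looks like "<pid> <name>"; skip anything else.
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1].strip())
    return names


def missing_daemons(jps_output, required=("HMaster",)):
    """Return the required daemon names that do not appear in jps output."""
    running = running_daemons(jps_output)
    return [name for name in required if name not in running]


sample = "3669 HMaster\n3801 Jps\n"
```

For the sample text, `missing_daemons(sample)` returns an empty list. On a real machine the text could come from `subprocess.run(["jps"], capture_output=True, text=True).stdout`.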
Some basic commands of HBase
1. Create table
Create a table named ‘test’ with column family ‘cf’
[php]create 'test', 'cf'[/php]
2. List tables
List all the tables
[php]list[/php]
3. Insert data
Add data to the ‘test’ table
[php]put 'test', 'row1', 'cf:a', 'value_1'
put 'test', 'row2', 'cf:b', 'value_2'[/php]
4. Read data
Read the data from the ‘test’ table
[php]scan 'test'[/php]
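Typing put commands one by one gets tedious once you have more than a few rows. As a rough sketch, the commands can be generated from a list of (row key, column, value) triples in Python and then pasted into the HBase shell. The helper names are hypothetical, and the quoting assumes the shell accepts Ruby-style single-quoted strings, which is my understanding since the shell is JRuby based.

```python
def shell_quote(text):
    """Wrap text in straight single quotes, escaping backslashes and quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def put_commands(table, rows):
    """Yield one put command per (row key, column, value) triple."""
    for row_key, column, value in rows:
        args = ", ".join(shell_quote(a) for a in (table, row_key, column, value))
        yield "put " + args


rows = [("row1", "cf:a", "value_1"), ("row2", "cf:b", "value_2")]
commands = list(put_commands("test", rows))
```

For the two rows used in this tutorial, `commands` holds `put 'test', 'row1', 'cf:a', 'value_1'` and `put 'test', 'row2', 'cf:b', 'value_2'`. Note the straight quotes: curly quotes copied from a web page will not work in the shell.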