HBase Security: Kerberos Authentication & Authorization
1. Objective – HBase Security
Today, we will learn about HBase security. In this article, “HBase Security: Authentication & Authorization”, we will see how Kerberos is used with Hadoop and HBase to provide user authentication (HBase Kerberos authentication).
We will also look at how HBase authorization grants users permissions for particular actions on a specified set of data, and we will cover the HBase shell commands used for security. Along with this, we will discuss HDFS and ZooKeeper SASL as well as HBase ACLs. Finally, we will touch on SASL (Simple Authentication and Security Layer) and HBase client authentication.
So, let’s explore the HBase Security tutorial.
2. HBase Security: Authentication & Authorization
By “HBase security” we mean protecting HBase against sniffers, unauthenticated or unauthorized users, and network-based attacks. However, it cannot protect against authorized users, for example one who accidentally deletes all the data.
HBase can be configured to provide user authentication.
This ensures that only authenticated users can communicate with HBase. Authentication is implemented at the RPC level on top of the Simple Authentication and Security Layer (SASL), which supports Kerberos. SASL allows authentication, encryption negotiation, and/or message integrity verification on a per-connection basis.
After enabling user authentication, the next step is to give an admin the ability to define a series of user authorization rules that allow or deny particular actions. This authorization system, known as the Access Controller coprocessor or Access Control List (ACL), is available from HBase 0.92 (CDH4) onward. It lets an admin define an authorization policy (Read/Write/Create/Admin) for a specified user, with table, column family, or column qualifier granularity.
3. Kerberos in HBase Security
Kerberos is a network authentication protocol. It uses secret-key cryptography to provide strong authentication for client/server applications. The Kerberos protocol relies on strong cryptography (AES, 3DES, …) so that a client can prove its identity to a server (and vice versa) across an insecure network connection.
Once a client and server have used Kerberos to prove their identities, they can also encrypt all of their communications to ensure privacy and data integrity as they go about their business.
i. Ticket exchange protocol
At a high level, accessing a service with Kerberos takes three steps:
a. Kerberos Authentication
First, the HBase client authenticates itself to the Kerberos Authentication Server and receives a Ticket Granting Ticket (TGT).
b. Kerberos Authorization
Next, the client requests a service ticket from the Ticket Granting Server and sends its TGT with the request. If the TGT is valid, the Ticket Granting Server issues a service ticket and a session key.
c. Service Request
Finally, the client uses the service ticket to authenticate itself to the server that provides the service it wants to use (e.g. HDFS, HBase, …).
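The three steps can be sketched as a toy Python model. It shows which party can verify which message. It is a simplified sketch under loud assumptions. "Sealing" is an HMAC tag rather than encryption, and the session key is handed back in the clear. The TGS step skips the client authenticator. The principals, the password-to-key hash and the class names `ToyKDC` and `ToyService` are all hypothetical. It is not how a real KDC is implemented and it is not secure.

```python
import hashlib
import hmac
import json
import secrets
import time

# Toy model of the three-step Kerberos exchange. "Sealing" is an HMAC tag,
# not encryption, and the session key is handed back in the clear, so this
# only shows which party can verify which message. It is NOT secure.


def seal(key: bytes, payload: dict) -> dict:
    body = json.dumps(payload, sort_keys=True)
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def unseal(key: bytes, sealed: dict) -> dict:
    expected = hmac.new(key, sealed["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sealed["tag"]):
        raise PermissionError("message was not sealed with this key")
    return json.loads(sealed["body"])


class ToyKDC:
    """Plays both the Authentication Server and the Ticket Granting Server."""

    def __init__(self, lifetime_s: float = 3600.0):
        self.tgs_key = secrets.token_bytes(32)  # known only to the KDC
        self.keys = {}                          # principal -> long-term key
        self.lifetime_s = lifetime_s

    def register(self, principal: str, key: bytes) -> None:
        self.keys[principal] = key

    # a. Authentication: the client proves it knows its key by sealing a
    #    timestamp (the password itself never crosses the network) and gets a TGT.
    def authenticate(self, principal: str, preauth: dict) -> dict:
        claims = unseal(self.keys[principal], preauth)
        if claims["client"] != principal or abs(time.time() - claims["ts"]) > 300:
            raise PermissionError("stale or mismatched pre-authentication")
        return seal(self.tgs_key, {"client": principal,
                                   "expires": time.time() + self.lifetime_s})

    # b. Authorization: a valid TGT buys a service ticket plus a session key.
    def service_ticket(self, tgt: dict, service: str):
        claims = unseal(self.tgs_key, tgt)
        if claims["expires"] < time.time():
            raise PermissionError("TGT expired")
        session_key = secrets.token_hex(16)
        ticket = seal(self.keys[service],
                      {"client": claims["client"], "session_key": session_key})
        return ticket, session_key


class ToyService:
    def __init__(self, key: bytes):
        self.key = key  # a real server would load this from its keytab

    # c. Service request: open the ticket with the service's own key, then check
    #    that the client's authenticator was sealed with the session key inside it.
    def accept(self, ticket: dict, authenticator: dict) -> str:
        claims = unseal(self.key, ticket)
        auth = unseal(claims["session_key"].encode(), authenticator)
        if auth["client"] != claims["client"]:
            raise PermissionError("authenticator does not match ticket")
        return claims["client"]


# Usage with hypothetical principals.
kdc = ToyKDC()
alice_key = hashlib.sha256(b"alice-password").digest()  # stand-in for string-to-key
kdc.register("alice@EXAMPLE.COM", alice_key)
rs_key = secrets.token_bytes(32)
kdc.register("hbase/rs1.example.com@EXAMPLE.COM", rs_key)
region_server = ToyService(rs_key)

tgt = kdc.authenticate("alice@EXAMPLE.COM",
                       seal(alice_key, {"client": "alice@EXAMPLE.COM", "ts": time.time()}))
ticket, session_key = kdc.service_ticket(tgt, "hbase/rs1.example.com@EXAMPLE.COM")
authenticator = seal(session_key.encode(), {"client": "alice@EXAMPLE.COM", "ts": time.time()})
who = region_server.accept(ticket, authenticator)
print(who)  # alice@EXAMPLE.COM
```

The design point the model makes is that the service never contacts the KDC. It trusts the ticket because only the KDC and the service share the service key.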
4. HBase, HDFS, ZooKeeper SASL
Because HBase depends on HDFS and ZooKeeper, a secure HBase relies on a secure HDFS and a secure ZooKeeper. This means the HBase servers need to create secure service sessions to communicate with HDFS and ZooKeeper.
All the files written by HBase are stored in HDFS. The access control provided by HDFS is based on users, groups, and permissions, as in Unix filesystems.
Similarly, ZooKeeper keeps an Access Control List (ACL) on each znode, which grants read/write access to users based on their user information.
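The Unix-style check that HDFS applies can be sketched in a few lines of Python. The owner bits apply if the user owns the path. Otherwise the group bits apply if the user belongs to the path's group, and otherwise the "other" bits apply. This sketch models only that class selection. Real HDFS also has a superuser, optional ACLs and checks on parent directories, all omitted here. The `permitted` function and the user and group names are hypothetical.

```python
import stat

# Permission bits per class, in the same octal layout Unix and HDFS use.
BITS = {
    "owner": {"r": stat.S_IRUSR, "w": stat.S_IWUSR, "x": stat.S_IXUSR},
    "group": {"r": stat.S_IRGRP, "w": stat.S_IWGRP, "x": stat.S_IXGRP},
    "other": {"r": stat.S_IROTH, "w": stat.S_IWOTH, "x": stat.S_IXOTH},
}


def permitted(mode: int, owner: str, group: str,
              user: str, user_groups: set, want: str) -> bool:
    """True if `user` may perform every action in `want` (e.g. "rw")."""
    if user == owner:                 # owner bits take precedence
        cls = "owner"
    elif group in user_groups:        # then the path's group
        cls = "group"
    else:                             # everyone else
        cls = "other"
    return all(mode & BITS[cls][action] for action in want)


# Hypothetical HBase root directory owned by hbase:hadoop with mode 0o750.
print(permitted(0o750, "hbase", "hadoop", "hbase", {"hadoop"}, "rwx"))  # True
print(permitted(0o750, "hbase", "hadoop", "alice", {"hadoop"}, "w"))    # False
print(permitted(0o750, "hbase", "hadoop", "bob", {"users"}, "r"))       # False
```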
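ZooKeeper evaluates a znode's ACL as a list of (scheme, id, permissions) entries. It allows an operation if any entry that matches the caller grants the permission. The Python sketch below models that rule. The permission bit values follow ZooKeeper's READ/WRITE/CREATE/DELETE/ADMIN constants. The example ACL is an illustrative assumption, not the exact ACL HBase sets. It gives full rights to a SASL-authenticated `hbase` user and read-only access to everyone else. The `allowed` function is hypothetical.

```python
from enum import IntFlag


class Perm(IntFlag):
    # Bit values follow ZooKeeper's ZooDefs.Perms constants.
    READ = 1
    WRITE = 2
    CREATE = 4
    DELETE = 8
    ADMIN = 16


ALL = Perm.READ | Perm.WRITE | Perm.CREATE | Perm.DELETE | Perm.ADMIN


def allowed(acl, session_ids, want: Perm) -> bool:
    """acl: list of (scheme, id, perms) entries on one znode.
    session_ids: set of (scheme, id) pairs the client has authenticated as."""
    for scheme, ident, perms in acl:
        matches = (scheme, ident) == ("world", "anyone") or (scheme, ident) in session_ids
        if matches and (perms & want) == want:
            return True
    return False


# Illustrative ACL: full rights for the SASL-authenticated "hbase" user,
# read-only for everyone else.
znode_acl = [("sasl", "hbase", ALL), ("world", "anyone", Perm.READ)]

hbase_session = {("sasl", "hbase")}
anonymous = set()
print(allowed(znode_acl, hbase_session, Perm.WRITE))  # True
print(allowed(znode_acl, anonymous, Perm.READ))       # True
print(allowed(znode_acl, anonymous, Perm.WRITE))      # False
```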
5. HBase ACL
If our users are authenticated via Kerberos, we can be sure that the username we receive belongs to one of our trusted users. However, sometimes this is not granular enough, for example when we want to control whether a specified user can read or write a table. For that, HBase offers an authorization mechanism that restricts access for specific users.
To enable this feature, we must enable the Access Controller coprocessor by adding it to hbase-site.xml under the master and region server coprocessor classes. A coprocessor is code that runs inside each HBase Region Server and/or Master.
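As a minimal sketch, the relevant hbase-site.xml properties are shown below, rendered with Python's standard `xml.etree.ElementTree` so the output can be checked. The property names and class names follow the Apache HBase reference guide's secure-cluster setup as we understand it. Treat them as assumptions and verify them against the guide for your HBase version. For example, some releases also list the Access Controller under `hbase.coprocessor.regionserver.classes`. The `render_site_xml` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

ACCESS_CONTROLLER = "org.apache.hadoop.hbase.security.access.AccessController"
TOKEN_PROVIDER = "org.apache.hadoop.hbase.security.token.TokenProvider"

# Assumed properties for a Kerberos-secured cluster with the Access Controller
# loaded on the master and on every region.
SECURE_SITE = {
    "hbase.security.authentication": "kerberos",
    "hbase.security.authorization": "true",
    "hbase.coprocessor.master.classes": ACCESS_CONTROLLER,
    "hbase.coprocessor.region.classes": f"{TOKEN_PROVIDER},{ACCESS_CONTROLLER}",
}


def render_site_xml(props: dict) -> str:
    """Render properties in the <configuration><property> layout of hbase-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


xml_text = render_site_xml(SECURE_SITE)
print(xml_text)
```

The printed `<property>` elements would go inside the existing `<configuration>` element of hbase-site.xml on the master and region servers.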
- Rights management and _acl_ table
The HBase shell has a couple of commands that let an admin manage user rights:
grant <user> <permissions> [<table> [<column family> [<column qualifier>]]]
revoke <user> [<table> [<column family> [<column qualifier>]]]
An admin can also restrict user access based on the table schema:
- Give User-W read-only rights on Table-X/Family-Y
(grant 'User-W', 'R', 'Table-X', 'Family-Y')
- Give User-W full read/write rights on Qualifier-Z
(grant 'User-W', 'RW', 'Table-X', 'Family-Y', 'Qualifier-Z')
To operate at the cluster level, an admin can grant global rights, for example for balancing regions, creating tables, or shutting down the cluster:
- To grant User-W the ability to create tables
(grant 'User-W', 'C')
- To give User-W the ability to manage the cluster
(grant 'User-W', 'A')
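To show how grants at different scopes combine, here is a Python sketch of a scope-based check. A right granted at the global, table, or column-family level also covers everything beneath it. The dictionary format and the `authorize` function are hypothetical. The real Access Controller has further rules that this sketch omits, such as namespace-level and, in newer releases, cell-level permissions.

```python
from typing import Optional


def authorize(grants: dict, user: str, action: str, table: str,
              family: Optional[str] = None, qualifier: Optional[str] = None) -> bool:
    """grants maps scopes to permission strings:
       (user,) global, (user, table), (user, table, family),
       (user, table, family, qualifier)."""
    scopes = [(user,), (user, table)]
    if family is not None:
        scopes.append((user, table, family))
        if qualifier is not None:
            scopes.append((user, table, family, qualifier))
    # A right granted at a broader scope also covers every narrower scope.
    return any(action in grants.get(scope, "") for scope in scopes)


# Grants modelled on the examples above, in the hypothetical format.
grants = {
    ("User-W", "Table-X", "Family-Y"): "R",
    ("User-W", "Table-X", "Family-Y", "Qualifier-Z"): "RW",
    ("User-W",): "C",
}

print(authorize(grants, "User-W", "R", "Table-X", "Family-Y", "Qualifier-Q"))  # True
print(authorize(grants, "User-W", "W", "Table-X", "Family-Y", "Qualifier-Z"))  # True
print(authorize(grants, "User-W", "W", "Table-X", "Family-Y", "Qualifier-Q"))  # False
print(authorize(grants, "User-V", "R", "Table-X"))                            # False
```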
All the permissions are stored in a table created by the Access Controller coprocessor, called _acl_ (renamed hbase:acl in HBase 0.96 and later). The table name specified in the grant command is the row key of this table. The _acl_ table has only one column family, and each qualifier describes the granularity of rights for a particular table/user.
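This layout can be sketched as a Python dictionary of rows. The row key is the table name, all cells live in a single column family, and the qualifier encodes the user plus the optional family and qualifier the grant is scoped to. Three details are assumptions modelled loosely on HBase's internals, not a documented format: the family name `l`, the comma-joined qualifier, and storing global grants under the `_acl_` row itself. The `store_grant` helper is hypothetical.

```python
from collections import defaultdict

ACL_TABLE = "_acl_"
ACL_FAMILY = "l"  # assumed name of the table's single column family

# row key -> {"family:qualifier" column -> permission string}
acl_rows = defaultdict(dict)


def store_grant(user, perms, table=None, family=None, qualifier=None):
    """Write one grant into the toy _acl_ layout described above."""
    row = table if table is not None else ACL_TABLE   # global rights: assumed _acl_ row
    scope = [user]
    if family is not None:
        scope.append(family)
        if qualifier is not None:
            scope.append(qualifier)
    column = f"{ACL_FAMILY}:{','.join(scope)}"       # e.g. "l:User-W,Family-Y"
    acl_rows[row][column] = perms


store_grant("User-W", "R", "Table-X", "Family-Y")
store_grant("User-W", "RW", "Table-X", "Family-Y", "Qualifier-Z")
store_grant("User-W", "C")

for row, cells in acl_rows.items():
    for column, value in cells.items():
        print(row, column, value)
# Table-X l:User-W,Family-Y R
# Table-X l:User-W,Family-Y,Qualifier-Z RW
# _acl_ l:User-W C
```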
- Access Controller under the hood
The Access Controller coprocessor intercepts each user request and checks whether the user has the rights to execute the operation. To do so, it needs to look up the user's rights in the _acl_ table for each operation.
Querying a table on every operation could hurt performance. To avoid this, the _acl_ table is used only for persistence, while ZooKeeper is used to speed up rights lookups: permission changes are propagated through ZooKeeper so that each server can keep the rights cached in memory.
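The persistence-plus-notification idea can be sketched in Python. Each server keeps an in-memory cache, a grant is written to the `_acl_` store, and a change to a ZooKeeper-like node pushes the new rights to every cache. `ToyZooKeeper`, `PermissionCache`, `grant` and the `/hbase/acl/<table>` path are hypothetical stand-ins. Unlike real ZooKeeper watches, which fire once and must be re-registered, the toy watches stay registered.

```python
class ToyZooKeeper:
    """Stand-in for ZooKeeper: stores znode data and notifies watchers on change.
    (Real ZooKeeper watches fire once and must be re-registered.)"""

    def __init__(self):
        self.znodes, self.watchers = {}, []

    def watch(self, callback):
        self.watchers.append(callback)

    def set_data(self, path, value):
        self.znodes[path] = value
        for callback in self.watchers:
            callback(path, value)


class PermissionCache:
    """Per-server in-memory copy of permissions; counts reads of the _acl_ table."""

    def __init__(self, acl_table, zk):
        self.acl_table, self.cache, self.table_reads = acl_table, {}, 0
        zk.watch(self.on_change)

    def lookup(self, table):
        if table not in self.cache:              # cold miss: read _acl_ once
            self.table_reads += 1
            self.cache[table] = dict(self.acl_table.get(table, {}))
        return self.cache[table]

    def on_change(self, path, value):            # refresh from the notification
        self.cache[path.rsplit("/", 1)[-1]] = dict(value)


def grant(acl_table, zk, table, user, perms):
    acl_table.setdefault(table, {})[user] = perms         # persist in _acl_
    zk.set_data(f"/hbase/acl/{table}", acl_table[table])  # hypothetical znode path


acl_table = {}
zk = ToyZooKeeper()
servers = [PermissionCache(acl_table, zk) for _ in range(3)]
grant(acl_table, zk, "emp", "Dataflair", "RW")

for _ in range(1000):                      # many operations on one server...
    servers[0].lookup("emp")
print(servers[0].table_reads)              # 0: the cache was filled by the notification
print(servers[2].lookup("emp"))            # {'Dataflair': 'RW'}
```

The trade-off in this sketch is that a revoke only takes effect on a server once its notification arrives. Persistence in the table still means a restarted server can rebuild its cache.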
6. HBase Shell Commands for Security
The grant command gives a certain user specific rights on a table, for example read, write, execute, create, and admin.
The syntax for grant:
hbase> grant <user> <permissions> [<table> [<column family> [<column qualifier>]]]
From the set of RWXCA, we can grant zero or more privileges to a user. RWXCA refers to:
R – read privilege
W – write privilege
X – execute privilege
C – create privilege
A – admin privilege
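A small Python sketch of how a permission string maps onto the five privileges listed above follows. The `parse_privileges` function is hypothetical and is not the HBase shell's own parser. It only illustrates that a grant string is a set of letters drawn from RWXCA.

```python
PRIVILEGES = {"R": "read", "W": "write", "X": "execute", "C": "create", "A": "admin"}


def parse_privileges(spec: str) -> list:
    """Turn a permission string such as 'RW' or 'RWXCA' into privilege names."""
    unknown = set(spec) - set(PRIVILEGES)
    if unknown:
        raise ValueError(f"unknown privilege code(s): {''.join(sorted(unknown))}")
    return [PRIVILEGES[code] for code in "RWXCA" if code in spec]


print(parse_privileges("RWXCA"))  # ['read', 'write', 'execute', 'create', 'admin']
print(parse_privileges("WR"))     # ['read', 'write']
```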
Here we are granting all the privileges to a user named ‘Dataflair’.
hbase(main):018:0> grant 'Dataflair', 'RWXCA'
To revoke a user’s access rights, we use the revoke command.
The syntax for revoke:
hbase> revoke <user> [<table> [<column family> [<column qualifier>]]]
The code below revokes all the permissions from the user named ‘Dataflair’.
hbase(main):006:0> revoke 'Dataflair'
To list all the permissions for a particular table, we use the user_permission command.
The syntax for user_permission:
hbase> user_permission <table>
The code below lists all the user permissions on the ‘emp’ table.
hbase(main):013:0> user_permission 'emp'
So, this was all about HBase security. We hope you liked our explanation.
7. Conclusion: HBase Security
Hence, in this HBase security tutorial, we have seen how to use Kerberos to authenticate users and encrypt communications between services. Security in HBase adds two features, authentication and authorization, which together protect our data against unauthorized access, sniffers, and other network attacks. We have also seen the main HBase shell commands used for security. If you have any doubts regarding HBase security, ask in the comment section.