How is security achieved in Apache Hadoop?


    • #5943
      DataFlair Team
      Spectator

      How is security achieved in Hadoop?

    • #5945
      DataFlair Team
      Spectator

      Security in Apache Hadoop can be achieved in the following ways:

      1. To secure communication in Hadoop, enable SASL on RPC connections. Hadoop uses SASL/GSSAPI to implement Kerberos and to mutually authenticate users, their processes, and Hadoop services on RPC connections. Setting hadoop.rpc.protection to privacy in core-site.xml additionally encrypts the RPC traffic:

      <property>
        <name>hadoop.rpc.protection</name>
        <value>privacy</value>
      </property>

      2. Data is secured through access control: HDFS file permissions granted to users and groups determine who can read and write each file.

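      3. Enable the security module (Kerberos authentication and service-level authorization) in core-site.xml as follows:

      <property>
        <name>hadoop.security.authentication</name>
        <value>kerberos</value>
      </property>
      <property>
        <name>hadoop.security.authorization</name>
        <value>true</value>
      </property>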

      4. DataNodes have no concept of files or permissions. When a client needs access to data blocks, the NameNode makes the access control decision based on HDFS file permissions and issues block access tokens (signed using HMAC-SHA1). The client sends such a token to the DataNode with each block access request. In this way HDFS permissions are tied to access to the actual blocks of data.

      5. Other measures are used as a third layer of protection:

      • Existing IT security including network firewalls, logging and monitoring, and configuration management
      • Apache Knox used for perimeter security
      • Apache Argus (since renamed Apache Ranger) for security monitoring and management
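
The block access token flow from point 4 can be sketched in a few lines of Python. This is a simplified illustration, not Hadoop's actual token format or code: the payload layout, the function names `issue_block_token` and `verify_block_token`, and the shared key are all assumptions made for the example. It only shows the idea that the NameNode signs an access decision with HMAC-SHA1, and a DataNode holding the same key can verify it without knowing anything about file permissions.

```python
import hashlib
import hmac
import time

# Hypothetical secret shared by the NameNode and the DataNodes (assumption for this sketch).
SHARED_KEY = b"example-shared-secret"


def issue_block_token(user, block_id, mode, ttl_seconds=600, now=None):
    """NameNode side: sign an access decision already made from HDFS file permissions."""
    now = time.time() if now is None else now
    expiry = int(now) + ttl_seconds
    # Illustrative payload layout; the user name must not contain "|".
    payload = f"{user}|{block_id}|{mode}|{expiry}".encode("utf-8")
    signature = hmac.new(SHARED_KEY, payload, hashlib.sha1).hexdigest()
    return payload, signature


def verify_block_token(payload, signature, block_id, mode, now=None):
    """DataNode side: check signature, block, mode and expiry; no file permissions involved."""
    now = time.time() if now is None else now
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha1).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return False  # token was forged or altered
    _user, token_block, token_mode, expiry = payload.decode("utf-8").split("|")
    return token_block == str(block_id) and token_mode == mode and now <= int(expiry)


if __name__ == "__main__":
    payload, sig = issue_block_token("alice", 1073741825, "READ", now=1000)
    print(verify_block_token(payload, sig, 1073741825, "READ", now=1100))   # True
    print(verify_block_token(payload, sig, 1073741825, "WRITE", now=1100))  # False
```

The token carries an expiry time so that a leaked token is only useful for a limited period; the 600 second default here is an arbitrary choice for the sketch.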
    • #5946
      DataFlair Team
      Spectator

      Security in Apache Hadoop can be achieved in the following ways:

      • Enforcement of HDFS file permissions: the NameNode enforces access control to files in HDFS based on file permissions and Access Control Lists (ACLs) for users and groups.
      • Job tokens are created by the JobTracker and passed on to the TaskTrackers, ensuring that tasks can only work on the jobs they are assigned to.
      • Tasks can also be configured to run as the user who submitted the job, which makes access control checks simpler.
      • Network encryption can be configured by using a Quality of Protection that provides confidentiality (hadoop.rpc.protection set to privacy), which enforces encryption at the network level. This covers connections using Kerberos RPC and subsequent authentication using delegation tokens.
      • Some components of the Hadoop ecosystem apply their own security as a layer over Hadoop. For example, Apache Accumulo provides cell-level authorization, and HBase provides access controls at the column and column-family level.
      • Some common security settings should be present in the core-site.xml of all the nodes in the cluster:
      Parameter                         Value
      hadoop.security.authentication    kerberos
      hadoop.security.authorization     true (enables RPC service-level authorization)
      hadoop.rpc.protection             authentication
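
To check that a node's core-site.xml really carries the settings listed above, a small script can help. The sketch below is only an illustration and not a Hadoop tool: the function names `read_properties` and `find_problems` and the sample XML are made up for the example, and it looks at nothing beyond the three properties mentioned in this thread. When a property is missing it falls back to the Hadoop defaults (simple, false, authentication).

```python
import xml.etree.ElementTree as ET

# Accepted values for a secured cluster, per this thread.
EXPECTED = {
    "hadoop.security.authentication": {"kerberos"},
    "hadoop.security.authorization": {"true"},
    "hadoop.rpc.protection": {"authentication", "integrity", "privacy"},
}

# Values Hadoop uses when a property is absent from core-site.xml.
DEFAULTS = {
    "hadoop.security.authentication": "simple",
    "hadoop.security.authorization": "false",
    "hadoop.rpc.protection": "authentication",
}

# Made-up sample configuration used only to demonstrate the functions.
SAMPLE = """
<configuration>
  <property><name>hadoop.security.authentication</name><value>kerberos</value></property>
  <property><name>hadoop.security.authorization</name><value>true</value></property>
  <property><name>hadoop.rpc.protection</name><value>privacy</value></property>
</configuration>
"""


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the configuration."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def find_problems(props):
    """List every checked property whose effective value is not an accepted one."""
    problems = []
    for name, allowed in EXPECTED.items():
        value = props.get(name, DEFAULTS[name])
        # hadoop.rpc.protection may hold a comma-separated list of values.
        parts = [part.strip().lower() for part in value.split(",")]
        if not all(part in allowed for part in parts):
            problems.append(f"{name} = {value!r}, expected one of {sorted(allowed)}")
    return problems


if __name__ == "__main__":
    print(find_problems(read_properties(SAMPLE)))  # []
```

To run it against a real node, the text of that node's core-site.xml would be passed to `read_properties` in place of `SAMPLE`.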