What are the limitations of Hadoop?

    • #5169
      DataFlair Team
      Spectator

      What are the disadvantages of Hadoop?
      What are the Shortcomings in Hadoop?

    • #5170
      DataFlair Team
      Spectator

      Hadoop emerged as a solution to “Big Data” problems. It is an open-source software framework for distributed storage and distributed processing of large data sets. Open source means it is freely available, and we can even modify its source code as per our requirements.

      Limitations of Hadoop are:

      1) Issue with small files- Hadoop is not suited for small files, which are a major problem in HDFS. A small file is one significantly smaller than the HDFS block size (default 128 MB). HDFS is designed to store data sets as a small number of large files rather than a large number of small files. Storing a huge number of small files overloads the namenode, since the namenode keeps the entire HDFS namespace (metadata for every file and block) in memory.
      HAR files, Sequence files, and HBase help overcome the small-files issue.
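      The namenode pressure above can be sketched with a back-of-the-envelope calculation. The 150-bytes-per-object figure below is a commonly cited rule of thumb for namenode heap cost, not an exact number, and the file counts are made up for illustration:

```python
# Rough illustration of why many small files overload the namenode.
# Assumption: each file object and each block object costs roughly
# 150 bytes of namenode heap (a rule of thumb, not an exact figure).

BYTES_PER_OBJECT = 150          # assumed heap cost per namenode object
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size (128 MB)

def namenode_heap_bytes(num_files: int, file_size: int) -> int:
    """Estimate namenode heap used by the file and block objects."""
    blocks_per_file = max(1, -(-file_size // BLOCK_SIZE))  # ceil division
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

# Ten million 1 MB files vs. roughly the same data in 128 MB files:
small = namenode_heap_bytes(10_000_000, 1 * 1024 * 1024)
large = namenode_heap_bytes(78_125, 128 * 1024 * 1024)

print(f"10M small files: {small / 1024**2:.0f} MB of namenode heap")
print(f"78K large files: {large / 1024**2:.0f} MB of namenode heap")
```

      Both layouts hold about the same amount of data, but the small-file layout costs the namenode over a hundred times more memory, which is why HDFS degrades long before the datanodes run out of disk.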

      2) Processing Speed- MapReduce processes large data sets with a parallel, distributed algorithm, performing each job in two phases: Map and Reduce. MapReduce requires a lot of time to perform these tasks, thereby increasing latency, because the data is distributed and processed over the cluster and intermediate results are written to disk between the phases. This increases the time taken and reduces processing speed.
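      The two-phase pattern can be sketched in a few lines. This is a minimal in-memory word count, not Hadoop code; on a real cluster the shuffle step moves data over the network and materializes it on disk, which is where much of the latency comes from:

```python
# Minimal in-memory sketch of the MapReduce word-count pattern.
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group values by key (on a real cluster this goes
    # over the network and is spilled to disk).
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big cluster", "big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 3, 'data': 2, 'cluster': 1}
```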

      3) Supports only Batch Processing- Hadoop supports only batch processing; it does not process streamed data, so overall performance is slower. The MapReduce framework does not leverage the memory of the cluster to the maximum.

      4) Iterative Processing- Hadoop is not efficient for iterative processing, as it does not support cyclic data flow, i.e. a chain of stages in which the input to each stage is the output of the previous stage. Every iteration must run as a separate job, reading its input from HDFS and writing its output back to HDFS.
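      The cost of that job-per-iteration model can be sketched as follows. This toy simulation uses a local temp directory to stand in for HDFS; each "job" does a trivial computation but still pays the full serialize-write-read cycle that a real iterative MapReduce pipeline pays on every pass:

```python
# Sketch of iterative processing on Hadoop: every iteration is a
# separate job, and state is passed between iterations by writing
# to and re-reading from HDFS (simulated here with temp files).
import json
import os
import tempfile

def run_job(input_path, output_path):
    # One "job": read the previous state, do one step, write it back.
    with open(input_path) as f:
        value = json.load(f)
    with open(output_path, "w") as f:
        json.dump(value + 1, f)  # trivial per-iteration computation

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "iter_0.json")
with open(path, "w") as f:
    json.dump(0, f)

# Five iterations = five jobs, each paying full read/write overhead.
for i in range(5):
    next_path = os.path.join(workdir, f"iter_{i + 1}.json")
    run_job(path, next_path)
    path = next_path

with open(path) as f:
    final = json.load(f)
print(final)  # 5
```

      Frameworks built for iterative workloads (e.g. Apache Spark) avoid this by keeping the working set in cluster memory between iterations instead of round-tripping through storage.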

      5) Vulnerable by nature- Apache Hadoop is written entirely in Java, one of the most widely used languages and therefore one of the most heavily targeted by cyber-criminals. This has implicated Hadoop in numerous security breaches.

      6) Security- Hadoop can be challenging to manage for complex applications. It lacks encryption at the storage and network levels, which is a major point of concern. Hadoop supports Kerberos authentication, which is hard to manage.
